A documented review of safety commitments and actions
Anthropic's September 2023 Responsible Scaling Policy stated: "We commit to define ASL-4 evaluations before we first train ASL-3 models (i.e. before continuing training beyond when ASL-3 evaluations are triggered). Similarly, we commit to define ASL-5 evaluations before training ASL-4 models, and so forth."1
This was specific and verifiable: before training models at capability level N, define the safety evaluations for level N+1. This commitment was foundational to Anthropic's claim to be "scaling responsibly."
In October 2024, this commitment was removed from RSP version 2.1: "We have decided not to maintain a commitment to define ASL-N+1 evaluations by the time we develop ASL-N models."2
The removal was not announced in any blog post, and the public changelog mentioned only new commitments, not removed ones.3 Only one person outside Anthropic appears to have noticed, in a LessWrong comment.
Meanwhile, Anthropic's Chief Scientist confirmed that they currently "work under the ASL-3 standard,"4 meaning the company has reached the threshold at which the commitment should have applied.
When a company makes a formal written commitment and then quietly removes it when it becomes binding, what does this indicate about other commitments?
New York's RAISE Act would apply only to models trained with over $100 million in compute—a threshold that excludes virtually all companies except a handful of frontier labs.5
Anthropic's Head of Policy, Jack Clark, stated:
"It also appears multi-million dollar fines could be imposed for minor, technical violations - this represents a real risk to smaller companies"6
— Jack Clark, December 2024
This statement was false. The bill's $100M compute threshold means it applies only to frontier labs. No "smaller companies" would be affected—Jack Clark would have known this from reading the bill text.
The bill's author, New York State Senator Alex Bores, responded directly:
"An army of lobbyists are painting RAISE as a burden for startups, and this language perpetuates that falsehood. RAISE only applies to companies that are spending over $100M on compute for the final training runs of frontier models, which is a very small, highly-resourced group... it's scaremongering to suggest that the largest penalties will apply to minor infractions."7
— Senator Alex Bores, responding to Anthropic
The bill's author explicitly called this "scaremongering" and a "falsehood." Jack Clark made a factually false claim about who the bill would affect.
In conversations with major funders and prospective employees during Anthropic's early years, CEO Dario Amodei made a specific commitment:
"I spent a while talking with Dario back in late October 2022 (ie. pre-RSP in Sept 2023), and we discussed Anthropic's scaling policy at some length, and I too came away with the same impression everyone else seems to have: that Anthropic's AI-arms-race policy was to invest heavily in scaling, creating models at or pushing the frontier to do safety research on, but that they would only release access to second-best models & would not ratchet capabilities up, and it would wait for someone else to do so before catching up. So it would not contribute to races but not fall behind and become irrelevant/noncompetitive."8
— Gwern, documenting October 2022 conversation with Dario Amodei
Early investor Dustin Moskovitz confirmed this understanding, stating "that is their policy."9
This promise secured early EA funding and attracted safety-focused researchers. The narrative was clear: "Unlike OpenAI, we won't push the frontier."
Early releases appeared to follow this policy. Claude 1 and Claude 2 were "second-best" models, weaker than GPT-4 despite Claude 2's longer context window.10
Then the policy was abandoned. Claude 3 Opus was released as a frontier model, competitive with or exceeding GPT-4. Claude 3.5 Sonnet pushed ahead of OpenAI's offerings for coding. Recent releases explicitly market superior benchmark performance.
What changed was not the safety research findings or risk assessments, but the competitive pressure and the realization that "second-best" doesn't win market share.
California's SB-1047 represented an attempt at AI safety regulation. After the bill was significantly weakened through industry lobbying, Anthropic sent a letter to the Governor.
In Sacramento, letters supporting legislation include "Support" in the subject line as standard practice; this is how staff sort letters when advising the Governor. Anthropic's letter did not.11
This created the appearance of supporting regulation without formally registering that support in the process the Governor's office actually uses.
Internally, many Anthropic employees believed the policy team was misaligned with company leadership—that lobbying against strong regulation came from policy staff, not core leadership.
"The impression that people had was 'misaligned policy team doing stuff that Dario wasn't aware of', where in reality Dario is basically the real policy & politics lead and was very involved in all the SB-1047 stuff"12
— Source with knowledge of internal dynamics
The CEO was personally leading lobbying efforts, but employees were allowed to maintain a false understanding of who was driving policy decisions.
Anthropic justified research into dangerous capabilities with this reasoning:
"If alignment ends up being a serious problem, we're going to want to make some big asks of AI labs... We might ask labs or state actors to pause scaling, to do certain types of testing, or to roll out models more slowly."13
— Anthropic research justification, 2023
The research succeeded. Anthropic's work on alignment faking and deceptive behavior produced evidence of serious alignment problems.
The "big asks" never came. There was no call for pausing scaling, no public advocacy for mandatory testing requirements, no push for slower rollouts. The race accelerated.
The evidence documents a multi-year pattern across every type of commitment: written safety policies, public statements about regulation, founding promises about scaling, and the stated justifications for capabilities research.
Consider which hypothesis better explains this pattern: a company whose safety commitments genuinely constrain its behavior, or a company whose commitments function as positioning and are revised whenever they become binding.
OpenAI is openly racing toward AGI and is transparent about prioritizing capabilities. You know what you're signing up for.
Anthropic presents itself as "the responsible alternative" while taking the same positions on mandatory regulation. European policymakers report that OpenAI and Anthropic lobbyists say the same things about anything binding.
The only difference is the marketing.
If you were advising a friend who had been offered a role at Anthropic, and they asked "Is this company actually different from OpenAI on safety?", what would you say based on the evidence above?
If you had no equity in Anthropic, no sunk costs, and no identity tied to the company, would you join today with this evidence? Would you recommend that a safety-motivated friend join?
What would it take for you to conclude that safety commitments exist primarily for talent retention and brand differentiation rather than as constraints on the capabilities race?
When the company breaks written commitments and hides leadership's policy involvement from employees, what does this suggest about whether your work advances safety or provides credibility to a race?