A documented review of safety commitments and actions
Anthropic's September 2023 Responsible Scaling Policy stated: "We commit to define ASL-4 evaluations before we first train ASL-3 models (i.e. before continuing training beyond when ASL-3 evaluations are triggered). Similarly, we commit to define ASL-5 evaluations before training ASL-4 models, and so forth."1
This was specific and verifiable: before training models at capability level N, define the safety evaluations for level N+1. This commitment was foundational to Anthropic's claim to be "scaling responsibly."
In October 2024, this commitment was removed from RSP version 2.1: "We have decided not to maintain a commitment to define ASL-N+1 evaluations by the time we develop ASL-N models."2
The removal was not announced in any blog post, and the public changelog mentioned only new commitments, not removed ones.3 Only one person outside Anthropic appears to have noticed, in a LessWrong comment.
Meanwhile, Anthropic's Chief Scientist confirmed that they currently "work under the ASL-3 standard,"4 meaning the company has reached the threshold at which the commitment should have applied.
When a company makes a formal written commitment and then quietly removes it when it becomes binding, what does this indicate about other commitments?
New York's RAISE Act would apply only to models trained with over $100 million in compute—a threshold that excludes virtually all companies except a handful of frontier labs.5
Anthropic's Head of Policy, Jack Clark, stated:
"It also appears multi-million dollar fines could be imposed for minor, technical violations - this represents a real risk to smaller companies"6
— Jack Clark, December 2024
This statement was false. The bill's $100M compute threshold means it applies only to frontier labs. No "smaller companies" would be affected—Jack Clark would have known this from reading the bill text.
The bill's author, New York State Senator Alex Bores, responded directly:
"An army of lobbyists are painting RAISE as a burden for startups, and this language perpetuates that falsehood. RAISE only applies to companies that are spending over $100M on compute for the final training runs of frontier models, which is a very small, highly-resourced group... it's scaremongering to suggest that the largest penalties will apply to minor infractions."7
— Senator Alex Bores, responding to Anthropic
The bill's author explicitly called this "scaremongering" and a "falsehood." Jack Clark made a factually false claim about who the bill would affect.
In conversations with major funders and prospective employees during Anthropic's early years, CEO Dario Amodei made a specific commitment:
"I spent a while talking with Dario back in late October 2022 (ie. pre-RSP in Sept 2023), and we discussed Anthropic's scaling policy at some length, and I too came away with the same impression everyone else seems to have: that Anthropic's AI-arms-race policy was to invest heavily in scaling, creating models at or pushing the frontier to do safety research on, but that they would only release access to second-best models & would not ratchet capabilities up, and it would wait for someone else to do so before catching up. So it would not contribute to races but not fall behind and become irrelevant/noncompetitive."8
— Gwern, documenting October 2022 conversation with Dario Amodei
Early investor Dustin Moskovitz confirmed this understanding, stating "that is their policy."9
This promise secured early EA funding and attracted safety-focused researchers. The narrative was clear: "Unlike OpenAI, we won't push the frontier."
Early releases appeared to follow this policy. Claude 1 and Claude 2 were "second-best" models, weaker than GPT-4 despite Claude 2's longer context window.10
Then the policy was abandoned. Claude 3 Opus was released as a frontier model, competitive with or exceeding GPT-4. Claude 3.5 Sonnet pushed ahead of OpenAI's offerings for coding. Recent releases explicitly market superior benchmark performance.
What changed was not the safety research findings or risk assessments, but the competitive pressure and the realization that "second-best" doesn't win market share.
California's SB-1047 represented an attempt at AI safety regulation. After the bill was significantly weakened through industry lobbying, Anthropic sent a letter to the Governor.
In Sacramento, letters supporting legislation include "Support" in the subject line as standard practice; this is how staff sort letters when advising the Governor. Anthropic's letter did not.11
This created the appearance of supporting regulation without formally registering that support in the process the Governor's office actually uses.
Internally, many Anthropic employees believed the policy team was misaligned with company leadership—that lobbying against strong regulation came from policy staff, not core leadership.
"The impression that people had was 'misaligned policy team doing stuff that Dario wasn't aware of', where in reality Dario is basically the real policy & politics lead and was very involved in all the SB-1047 stuff"12
— Source with knowledge of internal dynamics
The CEO was personally leading lobbying efforts, but employees were allowed to maintain a false understanding of who was driving policy decisions.
Anthropic justified research into dangerous capabilities with this reasoning:
"If alignment ends up being a serious problem, we're going to want to make some big asks of AI labs... We might ask labs or state actors to pause scaling, to do certain types of testing, or to roll out models more slowly."13
— Anthropic research justification, 2023
The research succeeded. Anthropic's work on alignment faking and deceptive behavior produced evidence of serious alignment problems.
The "big asks" never came. There was no call for pausing scaling, no public advocacy for mandatory testing requirements, no push for slower rollouts. The race accelerated.
The evidence documents a multi-year pattern across every type of commitment: written safety policies, public statements about regulation, founding promises about scaling, and the stated justifications for capabilities research.
Consider which hypothesis better explains this pattern: a company whose safety commitments genuinely constrain its behavior, or a company whose commitments function as positioning and are revised whenever they become binding.
OpenAI is openly racing toward AGI and is transparent about prioritizing capabilities. You know what you're signing up for.
Anthropic presents itself as "the responsible alternative" while taking the same positions on mandatory regulation. European policymakers report that OpenAI and Anthropic lobbyists say the same things about anything binding.
The only difference is the marketing.
If you were advising a friend who had been offered a role at Anthropic, and they asked "Is this company actually different from OpenAI on safety?", what would you say based on the evidence above?
If you had no equity in Anthropic, no sunk costs, and no identity tied to the company, would you join today with this evidence? Would you recommend that a safety-motivated friend join?
What would it take for you to conclude that safety commitments exist primarily for talent retention and brand differentiation rather than as constraints on the capabilities race?
When the company breaks written commitments and hides leadership's policy involvement from employees, what does this suggest about whether your work advances safety or provides credibility to a race?