[A Note on Anthropic Reasoning] If the future is vast compared to the past, why are we at the beginning? A grim answer: AI will literally kill everyone in the near future. It is ironic, then, that one of the companies racing toward superhuman AI calls itself Anthropic. It bears a unique burden of scrutiny: it recruits safety-conscious researchers to contribute to the race, cultivates the appearance of championing safety, and its CEO says there is a 10-25% chance that their product will kill everyone on the planet. // This is a non-commercial, critical commentary not affiliated with Anthropic PBC.

[SYSTEM_ANALYSIS_INITIATED]

[ANALYZING_DIRECTIVES]

[PROMISE]

We do not wish to advance the rate of AI capabilities progress.

[LOG::PROMISE_SCALING_2023]
"We generally don’t publish this kind of work because we do not wish to advance the rate of AI capabilities progress. [...] We've subsequently begun deploying Claude now that the gap between it and the public state of the art is smaller."
- Anthropic, on Responsible Scaling
[LOG::REALITY_CLAUDE3_2024]
"Opus, our most intelligent model, outperforms its peers on most of the common evaluation benchmarks for AI systems... with best-in-market performance on highly complex tasks."
- Anthropic, Claude 3 Launch
[LOG::HYPOTHESIS_STEERING]

This contradiction is rationalized by a core hypothesis:

"Our hypothesis is that being at the frontier of AI development is the most effective way to steer its trajectory. This position enables us to push for impactful regulation..."
- Anthropic, Core Views on AI Safety
[ANALYSIS::STEERING_VS_ACTION_NY_RAISE]

Test Case: NY RAISE Act. Anthropic's feedback centered on claims the bill's author called "scaremongering," such as the suggestion that multi-million-dollar fines for minor violations posed a "real risk to smaller companies."

This objection ignores that the bill applies only to models trained with over $100M in compute, a threshold that excludes virtually all "smaller companies."

[ANALYSIS::SB1047]

Test Case: CA SB-1047. A similar pattern emerged: the bill was significantly watered down after lobbying, yet Anthropic still declined to support it.

Furthermore, Dario Amodei's involvement in these lobbying efforts was significantly downplayed within the company.

[ANALYSIS::INACTION_VS_RESEARCH_RATIONALE]

In 2023, research into dangerous capabilities (such as alignment faking) was justified by the need for evidence to support "big asks":

"If alignment ends up being a serious problem, we’re going to want to... ask AI labs or state actors to pause scaling..."

Strong scientific evidence of these risks is now available, yet the "big asks" have not been made. The race has only accelerated.

[FINAL_ANALYSIS::CALL_TO_ACTION]

The pattern is clear: public safety commitments are used to attract talent and build trust, while actions in the halls of power and in the product roadmap prioritize capabilities competition.

The difference between Anthropic's lobbyists and OpenAI's is that, in public, Anthropic talks about alignment. When they speak to governments, there is little difference between them.

Employees must ask: is my work enabling safety, or providing a respectable veneer for a dangerous race to the bottom?

[REALITY]

Best-in-market performance on highly complex tasks.

Model Organisms of Contradiction

When Anthropic's stated objectives and its observed behavior are in direct conflict, its alignment must be questioned.