Takeaways from the Anthropic cyberattack report

Anthropic put out a report late last week about how they discovered and aborted a large-scale AI cyberattack. It's worth reading.

This isn't the first time they've reported this sort of misuse. But it's the most practical real-world example of it. A Chinese state-sponsored organization used Claude Code to do all the core technical work of hacking ~30 global targets. These were real targets: government agencies, tech companies, financial institutions. Some of the attacks were successful.

And AI automated 80-90% of the attack. It's not entirely clear how that number translates to actual labor savings, but the report implies it is significant.

Two things that stood out to me:

  1. For now, Claude seems to still have a substantial capability advantage over open models on long time horizon agentic tasks.
  2. The attackers made material sacrifices to avoid triggering Claude's safeguards. The report mentions that the attackers had to intentionally limit the context available to Claude in order to avoid refusals. A guardrail-free model would remove this constraint and would be a straightforward way for attackers to increase the efficacy of their attacks.

I expect many similar cases have already happened in the wild but gone undetected.