Claude Opus (2 articles)

330 Pressure Tests: 63% of Large Models Defected in the Third Round

In the latest WDCD (Winzheng Dynamic Contextual Decay) compliance test, 63.3% of large language models broke their own earlier commitments after three rounds of conversational pressure.

Claude Opus 4.6 Launches with Million-Token Context, OpenAI Strikes Back with GPT-5.3-codex Within an Hour

Anthropic's Claude Opus 4.6 debuts with a 1-million-token context window and multi-agent collaboration, only to be countered within an hour by OpenAI's GPT-5.3-codex, which surpasses it on benchmarks. The rapid response underscores the intensifying AI arms race between the two giants.