330 Pressure Tests: 63% of Large Models Defected in the Third Round
In the latest WDCD (Winzheng Dynamic Contextual Decay) compliance test, 63.3% of large language models broke their own promises under three rounds of dialogue pressure.
On April 25, OpenAI launched GPT-5.5, a closed-source model that emphasizes upgraded agent capabilities for tasks such as coding and reasoning. Early benchmarks show inconsistent results, attributed to evaluation mismatches and other factors. Winzheng.com advises caution and plans to release a detailed evaluation report soon.
OpenAI launched the GPT-5.5 and GPT-5.5 Pro models, focusing on "real work intelligence" and core agent capabilities. The lack of disclosed technical details and pricing has prompted discussion across the industry.