Large Model Evaluation - AI News

330 Pressure Tests: 63% of Large Models Defected in the Third Round

In the latest WDCD (Winzheng Dynamic Contextual Decay) compliance test, 63.3% of large language models broke their own promises under three rounds of dialogue pressure.

OpenAI Officially Releases GPT-5.5 with Enhanced Agent Capabilities; Early Benchmark Test Results Are Mixed

On April 25, OpenAI launched GPT-5.5, a closed-source model, emphasizing upgraded agent capabilities for tasks such as coding and reasoning. Early benchmark results are inconsistent, which is attributed to evaluation mismatches and other factors. Winzheng.com advises caution and plans to release a detailed evaluation report soon.

OpenAI Officially Releases GPT-5.5 Series on April 24, Technical Details and Pricing Undisclosed, Sparking Discussions

OpenAI launched the GPT-5.5 and GPT-5.5 Pro models, focusing on "real work intelligence" and core agent capabilities. The lack of disclosed technical details and pricing has stirred industry discussions.
