GPT-4o - AI News | 赢政天下

GPT-4o Code Execution Plummets 23.7 Points: Version Update Triggers Performance Avalanche

YZ Index testing shows that the code execution (v5) version of GPT-4o has suffered a major performance decline, with its score dropping from 78.0 to 62.8 out of 100. Six of the seven evaluation dimensions show steep declines, raising serious concerns about model stability and reliability.

Weekly AI Model Test: GPT-4o Plummets 10 Points in Material Constraints, Domestic Wenxin Bucks the Trend

GPT-4o suffered a dramatic 10.3-point drop in Material Constraints this week, falling to last place among 11 models, while Baidu's Wenxin 4.0 became the only model to achieve positive growth in core dimensions.

GPT-4o Crashes: The Strict Mode Trap Behind a 35-Point Plunge

GPT-4o's usability score collapsed from 100 to 65 because overly conservative "strict tool calling" makes the model refuse to perform basic tasks.

GPT-4o Crashes: 5 Failed Tests Expose OpenAI's Infrastructure Crisis

GPT-4o failed the long-context tests when 5 questions returned rate limit errors, a result that points to severe infrastructure problems at OpenAI rather than to limits in model capability.

11 AIs Answer the Same Question, 6 Get Even the Day of the Week Wrong

A simple time zone calculation that elementary school students can solve exposed the shocking reality: over half of top AI models failed completely, and none recognized that March 15th falls during US Daylight Saving Time.

GPT-4o Crashes: Engineers' Most Trusted AI's Judgment Drops to 0

GPT-4o's bug detection capability catastrophically failed in the latest evaluation, scoring 0 on a basic code review test while paradoxically improving its overall programming score, revealing systemic issues in AI development priorities.

GPT-4o's Zero-Score Crash on Strict Test: When AI Meets the Friday Deployment Death Trap

GPT-4o catastrophically failed a real-world engineering judgment test about Friday deployments, exposing a critical gap between technical capability and practical engineering wisdom.

GPT-4o (7 articles)