Model Plunges (1 article)

Grok 4 Plunges 25 Points in Execution Meltdown! Claude Opus Tops AI Daily Review with 89.43 Points

In today's Smoke lightweight benchmark (2026-05-13), Claude Opus holds a steady lead at 89.43 points, while Grok 4 and GPT-o3 both suffer execution collapses. Grok 4 drops 25.2 points on the main leaderboard as its execution score falls from 100 to 50, and GPT-o3 drops 23.1 points as its execution score is halved.