In the June 2026 YZ Index real-world test of 11 models, GPT-o3's material constraint score in the Smoke evaluation fell from 100.00 the previous day to 84.80, and its main board overall score fell from 100.00 to 93.16.
Single-Day Data Breakdown
The code execution dimension held at 100.00, and engineering judgment and task expression also kept full marks. Only material constraint fell, by 15.20 points, which pulled the main board down by 6.84 points. The integrity rating remained at pass, and no threshold was triggered.
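The arithmetic behind these figures can be checked with a short sketch. The function name `score_delta` is hypothetical, and the "implied weight" line rests on an assumption the source does not state: that the main board is a linear weighted combination of the dimensions and that every other dimension was unchanged.

```python
def score_delta(previous: float, current: float) -> float:
    """Day-over-day change in a score, rounded to two decimals."""
    return round(current - previous, 2)


# Figures reported in the article.
material_delta = score_delta(100.00, 84.80)  # material constraint
overall_delta = score_delta(100.00, 93.16)   # main board overall

# ASSUMPTION (not stated by the source): the main board is a linear
# weighted sum of dimensions and only material constraint moved.
# Under that assumption, the ratio approximates the dimension's weight.
implied_weight = round(overall_delta / material_delta, 2)

print(material_delta, overall_delta, implied_weight)
```

If that assumption holds, material constraint would carry roughly 45% of the main board score, which would explain why a single-dimension dip moves the headline number so visibly. The index's actual weighting scheme is not given in the article.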
Source of Volatility Analysis
The Smoke evaluation consists of only 10 questions per day, with 2 questions per dimension. With a sample this small, day-to-day variance is easily amplified. If a material constraint question involves a boundary case or requires strict source citation, a single failure can cost the model 15 points or more. Such fluctuations have occurred multiple times in earlier rapid tests of this kind and typically rebound the next day.
Another possibility is genuine model degradation. If recent parameter updates or alignment strategy adjustments have affected citation accuracy, the material constraint decline could persist for several days. However, based on only one day of data, the trend cannot be confirmed yet.
Should We Be Concerned?
From an engineering perspective, the two core capabilities, code execution and engineering judgment, remain unaffected, and the main board still outperforms most competitors. We recommend tracking the material constraint score for three consecutive days and starting an in-depth retest only if it stays below 90 throughout. A single-day anomaly alone is not evidence of an inflection point in the model's capability.
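The three-day observation rule can be written as a small check. This is a minimal sketch: the function name `needs_retest`, its parameters, and the follow-up scores in the usage lines are hypothetical illustrations, not part of the YZ Index tooling or its data. Only the 84.80 reading, the 90 threshold, and the three-day window come from the article.

```python
from typing import Sequence


def needs_retest(daily_scores: Sequence[float],
                 threshold: float = 90.0,
                 window: int = 3) -> bool:
    """Return True only if the most recent `window` daily scores
    are all below `threshold`.

    With fewer than `window` days of data the answer is False:
    there is not yet enough evidence to call it a trend.
    """
    if len(daily_scores) < window:
        return False
    return all(score < threshold for score in daily_scores[-window:])


# One day of data (the situation in the article): no retest yet.
only_today = needs_retest([84.80])

# Hypothetical follow-up scenarios, for illustration only.
rebound = needs_retest([84.80, 100.00, 100.00])   # score recovers
persistent = needs_retest([84.80, 88.00, 86.50])  # stays below 90
```

Requiring every day in the window to fall below the threshold, rather than the average, keeps one noisy day from triggering a retest on its own.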
For now, sampling fluctuation is the more probable explanation, and the evidence for genuine degradation is insufficient.
A 15-point plunge is more likely the product of a 10-question sampling lottery than a sign of model collapse.
Data source: YZ Index | Run #187
© 2026 Winzheng.com 赢政天下 | When reposting, please credit the source and include a link to the original article