In the YZ Index Smoke evaluation, Qwen3 Max's Material Constraint score dropped from 100.00 yesterday to 71.10 today, a decline of 28.9 points.
Single-Day Score Comparison
Code Execution rose from 50.00 to 75.00, Engineering Judgment rose from 69.50 to 73.60, Task Expression fell from 96.30 to 63.80, the main leaderboard score rose from 72.50 to 73.25, and the Integrity Rating remained "pass".
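The day-over-day changes can be recomputed directly from the figures above. The following is a minimal sketch; the `scores` dictionary and the `deltas` helper are illustrative names, not part of any YZ Index tooling.

```python
# Scores as reported in this comparison: (yesterday, today).
scores = {
    "Material Constraint": (100.00, 71.10),
    "Code Execution": (50.00, 75.00),
    "Engineering Judgment": (69.50, 73.60),
    "Task Expression": (96.30, 63.80),
    "Main leaderboard": (72.50, 73.25),
}


def deltas(table):
    """Return today's score minus yesterday's, rounded to 2 decimals."""
    return {name: round(today - prev, 2) for name, (prev, today) in table.items()}


changes = deltas(scores)
for name, change in changes.items():
    # Print a signed change so rises and drops are easy to tell apart.
    print(f"{name:22s} {change:+.2f}")
```

Two dimensions fall, two rise, and the main leaderboard score moves by less than one point.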
Fluctuation Source Analysis
The Smoke evaluation uses only 10 questions per day, 2 per dimension, so daily scores are highly sensitive to which questions are sampled. Material Constraint and Task Expression both dropped sharply (by 28.90 and 32.50 points), while Code Execution rose by a comparable 25.00 points, and the main leaderboard score still increased slightly. This pattern suggests that the day-to-day differences across ability dimensions are more likely due to the day's question combination than to a systematic degradation of model capability.
If the model were truly degrading, one would typically expect a simultaneous decline in the main leaderboard score or sustained weakness across multiple dimensions. In today's data, the main leaderboard score actually rose by 0.75 points and Engineering Judgment rose by 4.10 points, which is inconsistent with genuine capability degradation.
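To illustrate why a 2-question sample produces large swings, here is a rough sketch of the expected spread of a day-to-day difference. It assumes per-question scores are independent and share a common standard deviation; the function name and the value of 30 points are hypothetical, chosen only for illustration and not measured from YZ Index data.

```python
import math


def day_to_day_sd(per_question_sd: float, questions_per_dimension: int = 2) -> float:
    """Standard deviation of the difference between two independent daily means.

    Each daily dimension score is treated as the mean of
    `questions_per_dimension` independent question scores.
    """
    se_single_day = per_question_sd / math.sqrt(questions_per_dimension)
    # The difference of two independent days has twice the variance of one day.
    return math.sqrt(2) * se_single_day


# per_question_sd=30 is a hypothetical value, not a measured one.
print(round(day_to_day_sd(30.0), 1))
```

Under these assumptions, with 2 questions per dimension the spread of the day-to-day difference equals the per-question spread itself, so swings of tens of points can occur without any change in the model.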
Need for Continued Attention
Current evidence points to question sampling fluctuation as the more probable cause. With only 2 questions per dimension, a single-day decline of 28.9 points in Material Constraint is within the range expected under the daily quick-test framework and does not yet constitute a clear signal of capability degradation. We recommend tracking the standard deviation of Material Constraint scores over the next 3-5 evaluation days, and considering a retest only if the fluctuation amplitude continues to exceed 20 points.
The stability dimension of the YZ Index measures the standard deviation of scores, not single-pass accuracy. Qwen3 Max's score change today is more likely a manifestation of sampling randomness than of a change in capability.
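The monitoring rule above could be sketched as follows. This is one possible reading, not the YZ Index's own procedure: "continues to exceed 20 points" is interpreted as every day-over-day change in the window being larger than the threshold, and `review_window` is a hypothetical helper name.

```python
from statistics import stdev


def review_window(daily_scores, threshold=20.0):
    """Summarize a short window of daily scores.

    Returns (sample standard deviation, absolute day-over-day changes,
    retest flag). The flag is True only when every day-over-day change
    in the window exceeds the threshold.
    """
    if len(daily_scores) < 3:
        raise ValueError("need at least 3 daily scores")
    changes = [abs(b - a) for a, b in zip(daily_scores, daily_scores[1:])]
    retest = all(c > threshold for c in changes)
    return stdev(daily_scores), changes, retest


# The first two values are the reported Material Constraint scores; the
# remaining three are hypothetical placeholders, not real results.
window = [100.00, 71.10, 85.00, 90.00, 80.00]
sd, changes, retest = review_window(window)
print(f"sd={sd:.2f} retest={retest}")
```

With the placeholder values, only the first change exceeds 20 points, so the sketch does not flag a retest.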
Data source: YZ Index | Run #184
© 2026 Winzheng.com 赢政天下 | Please credit the source and include a link to the original when reposting.