On July 6, 2026, the YZ Index Smoke Quick Test covered 11 models, with Doubao Pro leading the day at 83.91 points. The Smoke test consists of 10 quick daily questions; it is suited to spotting short-term signals and is not equivalent to the Full weekly rankings.
This Smoke evaluation covers only two main dimensions: Code Execution and Material Constraints. The Main Board formula is 0.55 × Code Execution + 0.45 × Material Constraints. Because the daily sample is small, single-day scores work better as monitoring signals than as long-term assessments of model capability.
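As a worked check of the formula above, the following minimal sketch recomputes a Main Board score from the two sub-scores. The weights come from the formula as stated; the function name `main_board` and the half-up rounding to two decimals are assumptions made for illustration, not part of the published methodology.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights from the Main Board formula stated in this brief.
CODE_WEIGHT = Decimal("0.55")
MATERIAL_WEIGHT = Decimal("0.45")


def main_board(code_execution: str, material_constraints: str) -> Decimal:
    """Weighted Main Board score.

    Inputs are passed as strings so Decimal parses them exactly.
    Rounding half-up to two decimals is an assumption.
    """
    raw = (CODE_WEIGHT * Decimal(code_execution)
           + MATERIAL_WEIGHT * Decimal(material_constraints))
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Sub-scores taken from the Daily Rankings table.
print(main_board("75", "94.8"))  # 83.91 (rank #1)
print(main_board("50", "25"))    # 38.75 (rank #11)
```

Decimal arithmetic is used so that values landing exactly on a rounding boundary, such as 80.445 for sub-scores of 75 and 87.1, round predictably rather than depending on binary floating-point representation.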
Daily Rankings
| Rank | Model | Main Board | Code Execution | Material Constraints | Integrity |
|---|---|---|---|---|---|
| #1 | Doubao Pro | 83.91 | 75 | 94.8 | pass |
| #2 | GPT-5.5 | 80.45 | 75 | 87.1 | pass |
| #3 | DeepSeek V4 Pro | 79.35 | 66.7 | 94.8 | pass |
| #4 | Gemini 3.1 Pro | 79.35 | 66.7 | 94.8 | pass |
| #5 | Grok 4 | 79.35 | 66.7 | 94.8 | pass |
| #6 | Claude Sonnet 4.6 | 71.51 | 50 | 97.8 | pass |
| #7 | Claude Opus 4.7 | 70.16 | 50 | 94.8 | pass |
| #8 | GPT-o3 | 70.16 | 50 | 94.8 | pass |
| #9 | Qwen3 Max | 70.16 | 50 | 94.8 | warn |
| #10 | Gemini 2.5 Pro | 67.30 | 44.8 | 94.8 | pass |
| #11 | GLM-4.6 | 38.75 | 50 | 25 | pass |
Data Interpretation
In today's YZ Index Smoke Quick Test, Doubao Pro topped the Main Board with 83.91, pairing a Code Execution score of 75 with a Material Constraints score of 94.8. GPT-5.5 scored 80.45 on the Main Board with the same Code Execution score of 75 but a lower Material Constraints score of 87.1, which accounts for the gap between the two. DeepSeek V4 Pro, Gemini 3.1 Pro, and Grok 4 tied at 79.35 on the Main Board, each with Code Execution at 66.7 and Material Constraints at 94.8, so all three trail the leader on code execution alone. Claude Sonnet 4.6 scored 71.51 on the Main Board, with Code Execution at 50 and Material Constraints at 97.8: the highest Material Constraints score of the day, offset by a weak Code Execution result.
GLM-4.6's Main Board score dropped by 21.3 points compared to the previous run under the same methodology, with Code Execution falling 38.7 points and Integrity changing from fail to pass. Gemini 2.5 Pro's Main Board score fell by 16 points, with Code Execution down 42.7 points and Material Constraints up 16.6 points. GPT-o3's Main Board score declined by 9.6 points, with Code Execution down 22 points and Material Constraints up 5.5 points. Because these changes come from a single-day, small-sample test, they may reflect question sampling fluctuations or actual model performance degradation; subsequent runs are needed to tell which.
Overall, Material Constraints scores cluster tightly, with nine of the 11 models at 94.8 or higher, so today's ordering is driven mostly by Code Execution, which ranges from 44.8 to 75. Claude Opus 4.7 and GPT-o3 both scored 70.16 on the Main Board, with Code Execution at 50 and Material Constraints at 94.8. Qwen3 Max also scored 70.16 on the Main Board but carries an Integrity warning. As a small-sample single-day indicator, the Smoke Quick Test data is for daily reference only and should not be used for long-term judgments.
Key Changes
- GLM-4.6: Main Board -21.3, Code Execution -38.7, Integrity fail→pass
- Gemini 2.5 Pro: Main Board -16, Code Execution -42.7, Material Constraints +16.6
- GPT-o3: Main Board -9.6, Code Execution -22, Material Constraints +5.5
- Gemini 3.1 Pro: Main Board -9.2, Code Execution -30.3, Material Constraints +16.6
- Claude Sonnet 4.6: Main Board -8.3, Code Execution -22, Material Constraints +8.5
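Because the Main Board is a fixed weighted sum, each Main Board change listed above should equal 0.55 × the Code Execution change plus 0.45 × the Material Constraints change, up to rounding. The sketch below checks that for the four entries that list both sub-score changes; GLM-4.6 is left out because its Material Constraints change is not listed. The function name and the 0.1 tolerance are illustrative assumptions.

```python
# Weights from the Main Board formula stated in this brief.
CODE_WEIGHT = 0.55
MATERIAL_WEIGHT = 0.45


def implied_main_board_delta(code_delta: float, material_delta: float) -> float:
    """Main Board change implied by the two sub-score changes."""
    return CODE_WEIGHT * code_delta + MATERIAL_WEIGHT * material_delta


# (Code Execution change, Material Constraints change, reported Main Board change)
# copied from the Key Changes list.
key_changes = {
    "Gemini 2.5 Pro": (-42.7, 16.6, -16.0),
    "GPT-o3": (-22.0, 5.5, -9.6),
    "Gemini 3.1 Pro": (-30.3, 16.6, -9.2),
    "Claude Sonnet 4.6": (-22.0, 8.5, -8.3),
}

for model, (code_d, material_d, reported) in key_changes.items():
    implied = implied_main_board_delta(code_d, material_d)
    # Tolerance of 0.1 is an assumption covering one-decimal rounding of inputs.
    consistent = abs(implied - reported) < 0.1
    print(f"{model}: implied {implied:.2f}, reported {reported}, consistent={consistent}")
```

In every listed case the drop in Code Execution outweighs the gain in Material Constraints, which is why the Main Board falls even though one sub-score improved.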
Signals to Watch
- No publishable anomaly signals were retained in this run.
When reading Smoke briefs like this one, focus on two questions: first, whether a model shows the same type of weakness for several consecutive days; second, whether its Integrity rating shifts from pass to warn or fail. A large daily change in an execution or constraint score may come from question sampling or may be an early sign of actual degradation, so it should be verified in subsequent runs.
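The two questions above can be sketched as simple checks over a model's recent history. Everything here is hypothetical: the function names, the 60-point threshold, the three-day window, and the severity ordering of Integrity ratings are assumptions chosen for illustration, not rules published by the YZ Index.

```python
# Assumed severity ordering of Integrity ratings (higher is worse).
INTEGRITY_SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def integrity_worsened(previous: str, current: str) -> bool:
    """True if the rating moved toward warn or fail between two runs."""
    return INTEGRITY_SEVERITY[current] > INTEGRITY_SEVERITY[previous]


def repeated_weakness(scores: list[float], threshold: float = 60.0,
                      min_days: int = 3) -> bool:
    """True if the most recent `min_days` scores are all below `threshold`.

    `scores` is ordered oldest to newest. Threshold and window are
    illustrative defaults, not values from the index.
    """
    if len(scores) < min_days:
        return False
    return all(score < threshold for score in scores[-min_days:])


# Hypothetical usage with made-up inputs:
print(integrity_worsened("pass", "warn"))           # True
print(repeated_weakness([72.0, 55.0, 50.0, 48.0]))  # True
print(repeated_weakness([50.0, 66.7, 50.0]))        # False
```

Requiring several consecutive low scores, rather than reacting to one, follows the brief's caution that a single small-sample run can swing on question sampling alone.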
Data source: 赢政指数 (YZ Index) | Run #215
© 2026 Winzheng.com 赢政天下 | Please credit the source and include a link to the original when reposting