Claude Opus 4.7 and GPT-5.5 Tie for First on Smoke Leaderboard; Material Constraint Becomes the Biggest Differentiator


Today's Smoke lightweight evaluation has Claude Opus 4.7 and GPT-5.5 tied for first on the main leaderboard with a score of 92.53, both scoring 100 in code execution and 83.4 in material constraint. With execution maxed out for both, the result puts material constraint in the spotlight.

Material Constraint Widens the Gap in the Second Tier

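Third-placed Claude Sonnet 4.6 trails by only 0.4 points, which comes down to its material constraint score of 82.5. Doubao Pro and Gemini 2.5 Pro tie for fourth at 91.68 points with a constraint score of 81.5. Their constraint deficit to the leaders is 1.9 points, against 0.9 for Sonnet, which widens the total-score gap from 0.4 to about 0.85 points. Under the formula 0.55 × Code Execution + 0.45 × Material Constraint, each 1-point increase in constraint adds 0.45 points to the total score. That is slightly less than the 0.55 points gained per execution point, but because every model in this group already scores 100 in execution, constraint is the only dimension that can still move the ranking.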
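The arithmetic behind these gaps can be reproduced with a short sketch. It assumes the weighted formula quoted above is applied as-is and that published totals are rounded to two decimals; the function and variable names are illustrative and not part of the Smoke leaderboard itself.

```python
# Weights taken from the formula quoted in the article.
EXECUTION_WEIGHT = 0.55
CONSTRAINT_WEIGHT = 0.45


def composite_score(execution: float, constraint: float) -> float:
    """Weighted total: 0.55 x code execution + 0.45 x material constraint."""
    return EXECUTION_WEIGHT * execution + CONSTRAINT_WEIGHT * constraint


# Constraint scores as reported; every model listed here scored 100 in execution.
reported_constraint = {
    "Claude Opus 4.7": 83.4,
    "GPT-5.5": 83.4,
    "Claude Sonnet 4.6": 82.5,
    "Doubao Pro": 81.5,
    "Gemini 2.5 Pro": 81.5,
}

totals = {name: composite_score(100, c) for name, c in reported_constraint.items()}
leader = max(totals.values())

for name, total in totals.items():
    print(f"{name:<18} total={total:.3f}  gap_to_first={leader - total:.3f}")
```

With execution fixed at 100, the gap to first place is simply 0.45 times the constraint deficit, which is why a 0.9-point and a 1.9-point deficit translate into total-score gaps of roughly 0.4 and 0.85 points.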

Perfect Execution Score Has Become the Baseline

The top nine models all score 100 in code execution, while Grok 4 and Wenxin Yiyan stop at 50. Wenxin Yiyan also receives a material constraint score of 70.5 with a warn label, indicating a significant deviation from the instructions in the source material. As execution capabilities converge, the real competition among models shifts to the ability to "stay on track given the material."

No abnormal signals today: every model's score matches yesterday's, and the stability dimension shows no significant fluctuations. Grok 4's total of 63.41 points is mainly the result of an execution crash, not a constraint issue. Wenxin Yiyan, by contrast, is low on both dimensions, and the warn label further confirms its consistency risk.
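The claim that Grok 4's deficit comes from execution and not from constraint can be checked by inverting the scoring formula. The sketch below is illustrative only: the implied constraint score it computes is derived from the published total and is not a figure reported by the leaderboard, and it assumes the same 0.55/0.45 weighting applies to Grok 4.

```python
EXECUTION_WEIGHT = 0.55
CONSTRAINT_WEIGHT = 0.45


def implied_constraint(total: float, execution: float) -> float:
    """Invert total = 0.55 * execution + 0.45 * constraint to solve for constraint."""
    return (total - EXECUTION_WEIGHT * execution) / CONSTRAINT_WEIGHT


# Reported figures: Grok 4 total 63.41 with execution 50; leaders at 100 and 83.4.
grok_constraint = implied_constraint(63.41, 50)

# Points lost relative to the leaders, split by dimension.
lost_to_execution = EXECUTION_WEIGHT * (100 - 50)
lost_to_constraint = CONSTRAINT_WEIGHT * (83.4 - grok_constraint)

print(f"implied constraint score:  {grok_constraint:.1f}")
print(f"points lost to execution:  {lost_to_execution:.2f}")
print(f"points lost to constraint: {lost_to_constraint:.2f}")
```

Under these assumptions, nearly all of the distance between Grok 4 and the leaders is attributable to the halved execution score, with the constraint dimension accounting for under two points.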

Industry Implications

By mid-2026, top models have pushed code execution close to the ceiling, and the next phase of competition is set to shift to material constraint. Claude Opus 4.7 and GPT-5.5 currently hold a slim 0.9-point lead over third-placed Claude Sonnet 4.6 in this dimension, enough to keep both tied at the top of the main leaderboard. If constraint scores continue to diverge, the leaderboard will move from "tie" to "breakaway."

Material constraint has become the new moat.


Data source: YZ Index | Run #153