GPT-5.5 Tops Smoke Chart with Material Constraint Score of 71; Top Seven Models All Get Full Code Scores, but Gap Widens in Lower Half

Jun 2, 2026 | Winzheng Index

GPT-5.5 | Material Constraints | Smoke Test | Code Execution | Model Divergence

The most direct finding from today's Smoke lightweight benchmark is that code execution ability no longer differentiates the top seven models. All seven scored 100 on code execution, so their rankings were determined entirely by material constraint scores.

The True Ranking Logic Under Full Code Scores

In the scoring formula, code execution carries a weight of 0.55 and material constraint a weight of 0.45. The top seven models all achieved full marks in code execution, while material constraint scores ranged from 71 (GPT-5.5) down to 55 (DeepSeek V4 Pro), and that spread alone sets the gaps on the main leaderboard. GPT-5.5 reached an overall score of 86.95 (0.55 × 100 + 0.45 × 71) with its constraint score of 71. Second-place GPT-o3 had a constraint score of only 66.8, which under the same weights works out to about 85.06 overall, trailing by nearly 2 points.
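The arithmetic behind these figures can be reproduced with a short sketch. It assumes the overall score is a plain weighted sum of the two components, which is consistent with the published 86.95 for GPT-5.5 but is not confirmed as the index's exact method. The function and constant names are hypothetical, chosen only for illustration.

```python
# Weights as stated in the article.
CODE_WEIGHT = 0.55
CONSTRAINT_WEIGHT = 0.45


def overall_score(code_execution: float, material_constraint: float) -> float:
    """Weighted sum of the two components, rounded to two decimals.

    Assumption: the index combines the scores linearly with no other terms.
    """
    total = CODE_WEIGHT * code_execution + CONSTRAINT_WEIGHT * material_constraint
    return round(total, 2)


# Material constraint scores quoted in the article; code execution is 100 for all.
constraint_scores = {"GPT-5.5": 71.0, "GPT-o3": 66.8}

for model, constraint in constraint_scores.items():
    print(model, overall_score(100.0, constraint))
```

With code execution fixed at 100, every model starts from the same 55 points, so each point of material constraint is worth 0.45 points overall. The 4.2-point constraint gap between GPT-5.5 and GPT-o3 therefore becomes roughly 1.89 points on the main leaderboard.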

This suggests that mainstream models in 2026 have generally reached a high level on code execution tasks, and that the next phase of competition has shifted to how strictly a model follows user instructions and context.

Hard Hits for Models in the Lower Half

Claude Opus 4.7,

Data source: YZ Index | Run #143

Related Reviews

Winzheng Index Claude Opus 4.7 Smoke Evaluation Main Ranking Drops 26.1 Points, Code Execution and Material Constraints Both Fail

Winzheng Index Gemini 2.5 Pro Code Execution Dropped 24.6 Points in a Single Day; Overall Ranking Slid 6.5 Points

Winzheng Index Claude Opus 4.7 drops 14 points on main leaderboard, Code Execution falls from 100 to 69

Winzheng Index Qwen3 Max Main Board Plunges 12.9 Points, Gemini 2.5 Pro Leads Smoke Lite List with 96.99 Points