Claude Opus 4.7 Leads with 97.12 Points, Perfect Execution but Material Constraint Score of 93.6 Drags Down Overall

Jun 27, 2026 | Winzheng Index

Claude Opus 4.7 | Code Execution | Smoke Light Test | Material Constraints | Model Structure Analysis

In the June 27, 2026 Smoke lightweight evaluation of the YZ Index, Claude Opus 4.7 ranked first on the main leaderboard with 97.12 points, scoring a perfect 100 in code execution and 93.6 in material constraint.

Perfect Execution with a Constraint Shortcoming

Claude Opus 4.7 scored 100 points in code execution and 93.6 points in material constraint. Under the formula 0.55 × execution + 0.45 × constraint, that gives a main leaderboard score of 0.55 × 100 + 0.45 × 93.6 = 97.12. Claude Sonnet 4.6 also scored 100 points in execution, with a constraint score of 92.1, for a main leaderboard score of 96.45. Both models have maxed out the execution dimension, so their constraint scores (93.6 and 92.1 points respectively) account for every point they lost overall.
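The weighting above can be checked with a few lines of Python. This is a minimal sketch, not the index's own scoring code: the function name main_score and the rule of rounding half up to two decimals are assumptions, chosen because they reproduce the published figures.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights taken from the formula quoted in the article.
EXEC_WEIGHT = Decimal("0.55")
CONSTRAINT_WEIGHT = Decimal("0.45")

def main_score(execution, constraint):
    """Weighted main-leaderboard score, rounded to two decimals (assumed rounding rule)."""
    raw = EXEC_WEIGHT * Decimal(str(execution)) + CONSTRAINT_WEIGHT * Decimal(str(constraint))
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(main_score(100, 93.6))  # Claude Opus 4.7
print(main_score(100, 92.1))  # Claude Sonnet 4.6
```

Decimal arithmetic is used here instead of floats so that a value such as 96.445 rounds the same way every time.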

A Different Combination of Execution and Constraint in the Mid-tier Group

Three models, Doubao (豆包) Pro, Gemini 3.1 Pro, and GPT-5.5, tied at 83.37 points on the main leaderboard, each scoring 75 points in execution and 93.6 points in constraint. They match Claude Opus 4.7 in material constraint but trail by 25 points in code execution, which at a weight of 0.55 translates into a 13.75-point deficit on the main leaderboard.

DeepSeek V4 Pro scored 82.16 points on the main leaderboard, with 75 points in execution and 90.9 in constraint. GPT-o3 scored 81.84 points, with 75 points in execution and 90.2 in constraint. Both models have constraint scores below 93.6, which puts them further behind the top five.
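Because the formula is linear, each model's distance from the leader splits cleanly into an execution part and a constraint part. The sketch below illustrates that split for the mid-tier group; the function name gap_vs_leader and the dictionary labels are hypothetical, and the results are left unrounded, so they can differ from differences of the published two-decimal scores in the last digit.

```python
from decimal import Decimal

EXEC_WEIGHT = Decimal("0.55")
CONSTRAINT_WEIGHT = Decimal("0.45")

# Leader's scores as reported in the article (Claude Opus 4.7).
LEADER_EXEC = Decimal("100")
LEADER_CONSTRAINT = Decimal("93.6")

def gap_vs_leader(execution, constraint):
    """Return (points lost to execution, points lost to constraint) relative to the leader."""
    exec_gap = EXEC_WEIGHT * (LEADER_EXEC - Decimal(execution))
    constraint_gap = CONSTRAINT_WEIGHT * (LEADER_CONSTRAINT - Decimal(constraint))
    return exec_gap, constraint_gap

# (execution, constraint) pairs as reported, passed as strings to stay exact.
mid_tier = {
    "Three-way tie (75 / 93.6)": ("75", "93.6"),
    "DeepSeek V4 Pro": ("75", "90.9"),
    "GPT-o3": ("75", "90.2"),
}

for name, (execution, constraint) in mid_tier.items():
    exec_gap, constraint_gap = gap_vs_leader(execution, constraint)
    print(f"{name}: {exec_gap} from execution, {constraint_gap} from constraint")
```

In every case the execution part is the same 13.75 points, so the ordering within this group is decided entirely by the much smaller constraint part.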

Ranking Changes Due to Declining Execution Scores

Compared with yesterday, ERNIE Bot (文心一言) 4.5 dropped 23.8 points on the main leaderboard, with execution down 37.5 points and constraint down 7 points. Gemini 2.5 Pro dropped 22.6 points on the main leaderboard, with execution also down 37.5 points. Qwen3 Max saw a 41.2-point decline in execution and a 22.6-point drop on the main leaderboard. DeepSeek V4 Pro's execution fell by 25 points, and its main leaderboard score dropped by 15.1 points. Grok 4's execution declined by 27.5 points, with a 15.1-point drop on the main leaderboard. The shared decline in execution scores is the main reason these models moved down in today's ranking.
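How much of each drop comes from execution can be estimated by multiplying the execution decline by its 0.55 weight and comparing the result with the reported main-leaderboard drop. The following is a rough sketch under that assumption; the function name split_drop is hypothetical, and the remainder mixes constraint changes with rounding in the published one-decimal figures.

```python
from decimal import Decimal

EXEC_WEIGHT = Decimal("0.55")  # execution weight from the article's formula

def split_drop(exec_drop, main_drop):
    """Return (points explained by execution, remainder) for a main-leaderboard drop."""
    from_execution = EXEC_WEIGHT * Decimal(exec_drop)
    return from_execution, Decimal(main_drop) - from_execution

# (execution drop, main-leaderboard drop) as reported in the article.
reported_drops = {
    "ERNIE Bot (文心一言) 4.5": ("37.5", "23.8"),
    "Gemini 2.5 Pro": ("37.5", "22.6"),
    "Qwen3 Max": ("41.2", "22.6"),
    "DeepSeek V4 Pro": ("25", "15.1"),
    "Grok 4": ("27.5", "15.1"),
}

for model, (exec_drop, main_drop) in reported_drops.items():
    from_execution, remainder = split_drop(exec_drop, main_drop)
    print(f"{model}: {from_execution} from execution, {remainder} from constraint or rounding")
```

For the first model the remainder of 3.175 is in line with the reported 7-point constraint drop (0.45 × 7 = 3.15). For Qwen3 Max and Grok 4 the remainder is slightly negative, which appears consistent with rounding rather than a real constraint gain.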

Constraint Dimension Relatively Stable

Today, all models passed the integrity rating for material constraint. Qwen3 Max scored 95.9 points in constraint, the only model above 95 points, but its execution score was only 58.8 points, leaving it with a main leaderboard score of 75.5 points. Gemini 2.5 Pro scored 91.4 points in constraint and ERNIE Bot (文心一言) 4.5 scored 90.2 points, both in the lower-middle range.

The Smoke evaluation covers only 10 quick-test questions per day, yet the balance between execution and constraint clearly separated the models' scoring structures. The Claude series holds a clear advantage in execution, while the remaining models have the most room to improve in the execution dimension.

Data source: YZ Index (Winzheng) | Run #200
