Claude Opus 4.7 Tops with 94.82 Points, Gemini 3.1 Pro Plunges 32.2 Points

Jul 1, 2026 - Winzheng Index

Claude Opus, Code Execution, Model Ranking, Execution-Constraint Imbalance, Daily Quick Test

In the Smoke lightweight evaluation on July 1, 2026, Claude Opus 4.7 ranked first on the main leaderboard with a score of 94.82. Its code execution score of 94.5 and material constraint score of 95.2 were closely balanced.

Top Three Exhibit Highly Aligned Execution and Constraint

Claude Opus 4.7 and Claude Sonnet 4.6 both scored 94.5 in code execution, with constraint scores of 95.2 and 94.8 respectively, leaving a gap of only 0.18 points between them on the main leaderboard. DeepSeek V4 Pro also scored 94.5 in execution, but its constraint score of 93 produced a main leaderboard score of 93.83, 0.81 points behind second place.

GPT-5.5 scored 89.5 in execution and 91.2 in constraint, for a main leaderboard score of 90.27. Its constraint score slightly exceeds its execution score.
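
The article does not say how the main leaderboard score is computed. As a rough sketch, the figures reported above are consistent with a weighted average of 0.55 × execution + 0.45 × constraint, rounded half-up to two decimals. These weights are an assumption inferred from the published numbers, not something the Winzheng Index confirms, and the names `W_EXEC`, `W_CONS`, and `main_score` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# ASSUMPTION: weights inferred from the published scores, not stated by the source.
W_EXEC = Decimal("0.55")
W_CONS = Decimal("0.45")

def main_score(execution: str, constraint: str) -> Decimal:
    """Weighted average of the two dimensions, rounded half-up to 2 decimals."""
    raw = W_EXEC * Decimal(execution) + W_CONS * Decimal(constraint)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# (execution, constraint) pairs as reported in the article.
reported = {
    "Claude Opus 4.7": ("94.5", "95.2"),
    "Claude Sonnet 4.6": ("94.5", "94.8"),
    "DeepSeek V4 Pro": ("94.5", "93"),
    "GPT-5.5": ("89.5", "91.2"),
}

scores = {name: main_score(e, c) for name, (e, c) in reported.items()}
for name, score in scores.items():
    print(f"{name}: {score}")

# Gaps quoted in the text: first vs. second, second vs. third.
gap_top_two = scores["Claude Opus 4.7"] - scores["Claude Sonnet 4.6"]
gap_second_third = scores["Claude Sonnet 4.6"] - scores["DeepSeek V4 Pro"]
print(gap_top_two, gap_second_third)
```

Decimal arithmetic is used here because several of the unrounded values end in exactly 5 at the third decimal place, where binary floating point could round either way.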

Clear Divergence: High Constraint, Low Execution

Grok 4 achieved a perfect constraint score of 100, but its execution was only 68.6, resulting in a main leaderboard score of 82.73. Gemini 2.5 Pro scored 97 in constraint and 64.5 in execution, with a main leaderboard score of 79.13. Qwen3 Max scored 96 in constraint and 64.5 in execution, with a main leaderboard score of 78.68.

Doubao Pro (豆包 Pro) scored 95.2 in constraint and 44.5 in execution, with a main leaderboard score of 67.32. Gemini 3.1 Pro scored 94.8 in constraint and 43 in execution, with a main leaderboard score of 66.31. ERNIE Bot 4.5 (文心一言 4.5) scored 95.2 in constraint and 41.7 in execution, with a main leaderboard score of 65.78.

Abnormal Fluctuations Compared to Yesterday

Gemini 3.1 Pro fell 32.2 points on the main leaderboard, with execution dropping 57 points. Doubao Pro (豆包 Pro) dropped 18.6 points on the main leaderboard, with execution declining 38.8 points. Grok 4 dropped 15.3 points on the main leaderboard, with execution falling 31.4 points.

Claude Sonnet 4.6 rose 12.1 points on the main leaderboard, with execution increasing 19.5 points. Claude Opus 4.7 rose 10.8 points on the main leaderboard, with execution increasing 21.7 points. Both Claude models consolidated their top-two positions through a rebound in execution scores.
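
To put the day-over-day changes in context, the previous day's values can be recovered by undoing each reported change. The sketch below uses only figures quoted in this article; the variable and function names (`today`, `change`, `previous_day`) are illustrative, and the reconstructed values are simple arithmetic, not data published by the source.

```python
# Today's scores and day-over-day changes as reported in the article.
# A negative change means the score fell.
today = {
    "Gemini 3.1 Pro": {"main": 66.31, "execution": 43.0},
    "Doubao Pro": {"main": 67.32, "execution": 44.5},
    "Grok 4": {"main": 82.73, "execution": 68.6},
    "Claude Opus 4.7": {"main": 94.82, "execution": 94.5},
}
change = {
    "Gemini 3.1 Pro": {"main": -32.2, "execution": -57.0},
    "Doubao Pro": {"main": -18.6, "execution": -38.8},
    "Grok 4": {"main": -15.3, "execution": -31.4},
    "Claude Opus 4.7": {"main": 10.8, "execution": 21.7},
}

def previous_day(today_row: dict, change_row: dict) -> dict:
    """Undo the reported change: yesterday = today - change."""
    return {key: round(today_row[key] - change_row[key], 2) for key in today_row}

yesterday = {model: previous_day(today[model], change[model]) for model in today}
for model, row in yesterday.items():
    print(f"{model}: main {row['main']}, execution {row['execution']}")
```

On these figures, Gemini 3.1 Pro and Grok 4 would both have started from an execution score of 100, while Claude Opus 4.7 would have started from 72.8.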

Ranking Pressure from Structural Imbalance

When constraint scores approach or exceed 95 points, execution becomes the key variable determining main leaderboard rankings. In this run, every model with an execution score below 65 finished below 80 on the main leaderboard, even with a near-perfect constraint score.
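
The threshold above can be examined with a small calculation. This is only a sketch: it assumes the main score is 0.55 × execution + 0.45 × constraint, weights that fit the scores reported in this article but are not stated by the source. The function name `execution_needed` is hypothetical.

```python
# ASSUMPTION: weights inferred from the reported scores, not confirmed by the source.
W_EXEC, W_CONS = 0.55, 0.45

def execution_needed(target_main: float, constraint: float) -> float:
    """Execution score required to reach target_main at a given constraint score."""
    return (target_main - W_CONS * constraint) / W_EXEC

# How much execution does a model need to reach 80 on the main leaderboard?
for constraint in (95.0, 97.0, 100.0):
    needed = execution_needed(80.0, constraint)
    print(f"constraint {constraint:.0f}: execution >= {needed:.2f} to reach 80")
```

Under these assumed weights, a model with a constraint score between 95 and 97 would need an execution score of roughly 66 to 68 to reach 80, which agrees with the pattern described above. Only a constraint score close to 100 would bring the required execution below 65.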

ERNIE Bot 4.5 (文心一言 4.5) has an integrity rating of "warn", while the remaining 10 models all have a rating of "pass", indicating that most models maintain basic compliance in the material constraint dimension.

The balance between execution and constraint, rather than a perfect score on a single dimension, determines the final ordering of the Smoke leaderboard.

Data source: YZ Index | Run #206
