Claude Sonnet 4.6 Leads with 97.53 Points; ERNIE Bot Trails by More Than 40 Points

Jun 10, 2026 | Source: Winzheng Index

Tags: Claude Sonnet 4.6, Material Constraints, Smoke Light Test, Main Leaderboard Ranking, Perfect Execution Score

Today's Smoke light test points to one conclusion: a perfect code execution score is now the passing line, while material constraints are the real dividing line.

Top Three Separated by Only 1.58 Points, Claude Wins Two in a Row

Claude Sonnet 4.6 ranks first with 97.53 points, followed by Opus 4.7 at 96.54 and Grok 4 at 95.95. All three scored 100 in code execution, so the entire gap comes from material constraints: Sonnet 94.5, Opus 92.3, Grok 91. With a weight of 0.45 on that dimension, those differences alone determine the order of the top three on the main leaderboard.
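The reported figures are consistent with a simple weighted sum of the two dimensions. The sketch below is a minimal check under two assumptions that the article does not state: that code execution carries the remaining weight of 0.55, and that scores are rounded half-up to two decimals. The names `main_score`, `W_MATERIAL`, and `W_EXECUTION` are hypothetical and used only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Stated in the article: material constraints carry a weight of 0.45.
W_MATERIAL = Decimal("0.45")
# Assumption: code execution carries the remaining 0.55 (inferred, not stated).
W_EXECUTION = Decimal("1") - W_MATERIAL


def main_score(execution: str, material: str) -> Decimal:
    """Weighted sum of the two dimension scores, rounded half-up to 2 places."""
    raw = W_EXECUTION * Decimal(execution) + W_MATERIAL * Decimal(material)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Dimension scores as reported in the article: (execution, material constraints).
reported = {
    "Claude Sonnet 4.6": ("100", "94.5"),
    "Opus 4.7": ("100", "92.3"),
    "Grok 4": ("100", "91"),
}

scores = {name: main_score(*dims) for name, dims in reported.items()}
for name, score in scores.items():
    print(f"{name}: {score}")
```

Under these assumptions the sketch reproduces the three published main leaderboard scores, which suggests, but does not confirm, that the index uses a 0.55 / 0.45 split.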

Perfect Execution Scores Become the Norm, ERNIE Bot Is the Only Exception

Of the 11 models tested, 10 scored 100 in code execution. The only exception is ERNIE Bot 4.5, which scored just 50. That pulls its main leaderboard score down to 53.83, nearly 44 points below first place. Execution is no longer a weakness for most models; material constraints have become the decisive variable instead.

Material Constraint Score Gap Reaches 36 Points, Chinese Models Under Collective Pressure

Material constraint scores range from a high of 94.5 to a low of 58.5, a spread of 36 points. GPT-5.5, Doubao Pro, and the Gemini series all sit between 75 and 79.5, while Qwen3 Max scores only 61. Models with weak constraint ability consistently lose points on tasks that require strict citation of the source text and avoidance of hallucination, which is also the main reason scores cluster in the lower half of today's rankings.

Today's data confirms the trend once again: when nearly every model clears the execution bar, the real difference between models lies in their fidelity to the input material. Claude Sonnet 4.6's lead in this dimension has put it at the top of the leaderboard for two consecutive days.

For every 10-point improvement in material constraints, the main leaderboard score gains 4.5 points. ERNIE Bot paid the steepest price, with 50 points in execution and 58.5 points in material constraints.
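To see how much each dimension contributes to ERNIE Bot's deficit against the leader, the gap can be split by weight. This is a rough sketch that assumes code execution carries the remaining weight of 0.55, a value inferred from the published scores rather than stated by the index; the variable names are illustrative.

```python
from decimal import Decimal

W_MATERIAL = Decimal("0.45")              # stated in the article
W_EXECUTION = Decimal("1") - W_MATERIAL   # assumption: inferred remainder, 0.55

# Dimension scores as reported in the article.
sonnet = {"execution": Decimal("100"), "material": Decimal("94.5")}
ernie = {"execution": Decimal("50"), "material": Decimal("58.5")}

# Main-leaderboard points gained per 10 points of material constraints.
per_ten_points = W_MATERIAL * Decimal("10")

# Weighted contribution of each dimension to the gap between the two models.
execution_gap = W_EXECUTION * (sonnet["execution"] - ernie["execution"])
material_gap = W_MATERIAL * (sonnet["material"] - ernie["material"])
total_gap = execution_gap + material_gap

print(f"per 10 material points: {per_ten_points}")
print(f"execution share of gap: {execution_gap}")
print(f"material share of gap:  {material_gap}")
print(f"total gap:              {total_gap}")
```

With these inputs, execution accounts for 27.5 points of the gap and material constraints for 16.2, for a total of 43.7, which matches the difference between the rounded published scores of 97.53 and 53.83.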

Data Source: YZ Index | Run #156
