Qwen3 Max's Material Constraint Score Plummets 28.9 Points as Today's 11-Model Smoke Test Reshuffles the Main Leaderboard

Jun 17, 2026 | Source: Winzheng Index

Tags: Qwen3 Max, Material Constraints, Smoke Light Test, Code Execution, Main Leaderboard Ranking

In YZ Index's June 17, 2026 smoke test of 11 models, Qwen3 Max's material constraint score plummeted from 100 points the previous day to 71.1 points, and its main leaderboard score came in at only 73.25 points, making it the most prominent anomaly of the day.

Structural Differences in Execution and Constraint Determine Rankings

The main leaderboard score weights code execution at 0.55 and material constraint at 0.45. Claude Opus 4.7 scored a perfect 100 in both, giving it a main leaderboard score of 100 (0.55×100 + 0.45×100) and a clear lead. Gemini 2.5 Pro, Gemini 3.1 Pro, and GPT-5.5 share an identical profile: a perfect 100 in execution and 97.4 in constraint, for a main leaderboard score of 98.83 each.
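As a quick check on the weighting, the following Python sketch recomputes two of today's main leaderboard scores. The 0.55/0.45 weights come from the article; the helper name `main_score` is hypothetical, and rounding half-up to two decimals is an assumption about how the index reports scores. `Decimal` keeps the arithmetic exact so binary floating-point error cannot shift a rounded result.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights from the article's formula: 0.55 x execution + 0.45 x material constraint.
EXECUTION_WEIGHT = Decimal("0.55")
CONSTRAINT_WEIGHT = Decimal("0.45")


def main_score(execution: str, constraint: str) -> Decimal:
    """Hypothetical helper: weighted main leaderboard score.

    Scores are passed as strings so Decimal stores them exactly; rounding
    half-up to two decimals is an assumed reporting rule.
    """
    raw = EXECUTION_WEIGHT * Decimal(execution) + CONSTRAINT_WEIGHT * Decimal(constraint)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(main_score("100", "100"))   # Claude Opus 4.7 -> 100.00
    print(main_score("100", "97.4"))  # Gemini 2.5 Pro, Gemini 3.1 Pro, GPT-5.5 -> 98.83
```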

GPT-o3, Claude Sonnet 4.6, and DeepSeek V4 Pro also scored a perfect 100 in execution, with constraint scores between 94 and 94.8 points, placing their main leaderboard scores between 97.3 and 97.66 points. 豆包 Pro showed the reverse profile: 91.7 in execution and a perfect 100 in constraint, for a main leaderboard score of 95.44. Its perfect constraint score offsets part, but not all, of its execution shortfall, which highlights how much the 0.45 constraint weight contributes to the final score.
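A minimal sketch comparing the two profiles, under the same assumptions as before (the article's weights, an assumed half-up rounding rule, a hypothetical helper name). The article does not say which of GPT-o3, Claude Sonnet 4.6, and DeepSeek V4 Pro scored 94.8 and which scored 94, so the perfect-execution profiles are labeled by score rather than by model.

```python
from decimal import Decimal, ROUND_HALF_UP

EXECUTION_WEIGHT = Decimal("0.55")  # weights from the article's formula
CONSTRAINT_WEIGHT = Decimal("0.45")


def main_score(execution: str, constraint: str) -> Decimal:
    """Hypothetical helper: weighted score, rounded half-up to 2 decimals (assumed)."""
    raw = EXECUTION_WEIGHT * Decimal(execution) + CONSTRAINT_WEIGHT * Decimal(constraint)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (execution, constraint) pairs quoted in the article.
profiles = {
    "perfect execution, constraint 94.8": ("100", "94.8"),  # -> 97.66
    "perfect execution, constraint 94": ("100", "94"),      # -> 97.30
    "豆包 Pro: execution 91.7, perfect constraint": ("91.7", "100"),  # -> 95.44
}

for label, (e, c) in profiles.items():
    print(f"{label}: {main_score(e, c)}")
```

The 8.3-point execution gap costs 0.55 × 8.3 = 4.565 main score points, while a perfect constraint score earns at most 0.45 × (100 - 94) = 2.7 points over the 94-point profile, which is why 豆包 Pro still trails the perfect-execution group.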

Comparison with Yesterday Reveals Signs of Execution Recovery

Gemini 2.5 Pro and Gemini 3.1 Pro each gained 53.8 points on the main leaderboard, with execution scores rising to a perfect 100 (yesterday's execution scores were not reported). GPT-5.5 gained 28.8 points, DeepSeek V4 Pro 27.3 points, and GPT-o3 25.2 points on the main leaderboard, each with execution rising to 100; GPT-o3's constraint score, however, fell by 5.2 points.
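Because the main score is linear in its two components, a day-over-day change splits the same way: ΔS = 0.55·ΔE + 0.45·ΔC. The sketch below solves that for the execution change behind GPT-o3's numbers, assuming the same weights applied yesterday and treating the published deltas as exact (rounding in the source makes the result approximate). The function name is hypothetical.

```python
from decimal import Decimal

EXECUTION_WEIGHT = Decimal("0.55")  # weights from the article's formula
CONSTRAINT_WEIGHT = Decimal("0.45")


def implied_execution_change(main_delta: str, constraint_delta: str) -> Decimal:
    """Hypothetical helper: solve dS = 0.55*dE + 0.45*dC for dE.

    Assumes the weights were identical on both days.
    """
    return (Decimal(main_delta) - CONSTRAINT_WEIGHT * Decimal(constraint_delta)) / EXECUTION_WEIGHT


# GPT-o3: main leaderboard +25.2, material constraint -5.2 (figures from the article).
d_exec = implied_execution_change("25.2", "-5.2")
print(round(d_exec, 2))  # 50.07
```

An execution gain of roughly 50 points, ending at today's 100, would put GPT-o3's execution score near 50 yesterday; that is an inference from the formula, not a figure the index reported.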

These gains are driven mainly by perfect execution scores, indicating that several models that clearly struggled with code execution tasks yesterday have now recovered.

Anomalous Signals Point to Constraint Volatility

Qwen3 Max's 28.9-point drop in material constraint alone costs about 13 points on the main leaderboard (0.45×28.9 ≈ 13.0), leaving it at 73.25 points. 文心一言 4.5 fell sharply, losing 10.4 points on the main leaderboard, with an execution score of only 50 and a constraint score of 97.4; 0.55×50 + 0.45×97.4 = 71.33 put it at the bottom of the table.
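The constraint drop's share of Qwen3 Max's main score follows directly from the 0.45 weight. The sketch below also backs out the execution score the formula implies; the article does not report that number, so treat it as an inference that assumes the same weights.

```python
from decimal import Decimal

EXECUTION_WEIGHT = Decimal("0.55")  # weights from the article's formula
CONSTRAINT_WEIGHT = Decimal("0.45")

# Figures quoted in the article for Qwen3 Max.
constraint_drop = Decimal("28.9")   # 100 -> 71.1
constraint_today = Decimal("71.1")
main_today = Decimal("73.25")

# Main leaderboard points lost to the constraint drop alone.
lost_to_constraint = CONSTRAINT_WEIGHT * constraint_drop

# Execution score implied by the formula (inferred, not reported).
implied_execution = (main_today - CONSTRAINT_WEIGHT * constraint_today) / EXECUTION_WEIGHT

print(lost_to_constraint)           # 13.005
print(round(implied_execution, 2))  # 75.01
```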

Grok 4 scored 66.7 in execution and 96.7 in constraint, for a main leaderboard score of 80.2; its execution shortfall accounts for most of that gap. These data suggest that a sudden decline in material constraint is harder to recover from quickly than execution volatility.
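To see how much each component drags down Grok 4 and 文心一言 4.5, the gap to a perfect 100 can be split into a weighted execution part and a weighted constraint part. A sketch using the article's weights; `shortfall` is a hypothetical helper name.

```python
from decimal import Decimal

EXECUTION_WEIGHT = Decimal("0.55")  # weights from the article's formula
CONSTRAINT_WEIGHT = Decimal("0.45")
PERFECT = Decimal("100")


def shortfall(execution: str, constraint: str) -> tuple[Decimal, Decimal]:
    """Hypothetical helper: split the gap to 100 into execution and constraint parts."""
    exec_part = EXECUTION_WEIGHT * (PERFECT - Decimal(execution))
    constraint_part = CONSTRAINT_WEIGHT * (PERFECT - Decimal(constraint))
    return exec_part, constraint_part


for model, e, c in [("Grok 4", "66.7", "96.7"), ("文心一言 4.5", "50", "97.4")]:
    exec_part, constraint_part = shortfall(e, c)
    total = exec_part + constraint_part
    print(f"{model}: execution -{exec_part}, constraint -{constraint_part}, "
          f"main {PERFECT - total}")
```

For Grok 4, the execution part (18.315 points) accounts for almost all of the 19.8-point gap; for 文心一言 4.5 the split is 27.5 versus 1.17, giving 71.33.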

Seven of the 11 models scored a perfect 100 in execution today: a perfect execution score has become standard for mainstream models, and variance in constraint scores is emerging as the new differentiator.

Today's smoke test data once again shows that when execution scores converge, the stability of material constraint scores determines the final ranking on the main leaderboard.
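When every model in a group scores 100 in execution, the main score reduces to 55 + 0.45 × constraint, which is strictly increasing in the constraint score, so sorting by either gives the same order. The sketch below checks this on today's perfect-execution models. Because the article does not say which of GPT-o3, Claude Sonnet 4.6, and DeepSeek V4 Pro scored 94.8 and which scored 94, those two entries use hypothetical placeholder labels.

```python
from decimal import Decimal

EXECUTION_WEIGHT = Decimal("0.55")  # weights from the article's formula
CONSTRAINT_WEIGHT = Decimal("0.45")

# Perfect-execution models and their constraint scores from the article.
# "Model A"/"Model B" are hypothetical placeholders for the 94.8 and 94 entries.
converged = {
    "Claude Opus 4.7": Decimal("100"),
    "Gemini 2.5 Pro": Decimal("97.4"),
    "Model A (hypothetical label)": Decimal("94.8"),
    "Model B (hypothetical label)": Decimal("94"),
}


def main_score(execution: Decimal, constraint: Decimal) -> Decimal:
    """Unrounded weighted score."""
    return EXECUTION_WEIGHT * execution + CONSTRAINT_WEIGHT * constraint


by_main = sorted(converged, key=lambda m: main_score(Decimal("100"), converged[m]), reverse=True)
by_constraint = sorted(converged, key=lambda m: converged[m], reverse=True)

# With execution fixed at 100, main = 55 + 0.45 * constraint, so both orders match.
print(by_main == by_constraint)  # True
```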

Data source: YZ Index | Run #184