Material Constraints Plunge 20 Points Collectively, Claude Opus 4.7 Holds First with 90.78 Points

Jun 13, 2026 | Winzheng Index

Claude Opus 4.7 | Material Constraints | GPT-5.5 | Smoke Test | Anomalous Signals

In the YZ Index Smoke Lite evaluation on June 13, 2026, Claude Opus 4.7 ranked first on the main leaderboard with 90.78 points, achieving 100 points in code execution and 79.5 points in material constraints.

Full Marks in Execution Common, Constraints Become Sole Dividing Line

Today, all top 10 models achieved full marks in code execution. With the core_overall formula of 0.55×Execution + 0.45×Constraints, material constraints are therefore the only variable separating them, and each constraint point is worth 0.45 points on the main leaderboard. Claude Opus 4.7 scored 79.5 in constraints, Doubao Pro 78.5, and Gemini 2.5 Pro 77.3, so the 1.0-point and 1.2-point constraint gaps correspond to main-leaderboard leads of 0.45 and 0.54 points respectively.

ERNIE Bot 4.5 is the only model that did not achieve full marks in execution: it scored 50 in execution and 76.8 in constraints, for only 62.06 points on the main leaderboard, 28.27 points behind second place. Once execution falters, even decent constraint performance cannot secure a top-tier position.
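The figures in this section can be reproduced from the formula. The following is a minimal sketch, not the index's own code: the function name `core_overall` simply mirrors the formula's label, and rounding half-up to two decimals is an assumption based on how the leaderboard scores are printed.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights taken from the core_overall formula quoted in the article.
W_EXECUTION = Decimal("0.55")
W_CONSTRAINTS = Decimal("0.45")


def core_overall(execution: str, constraints: str) -> Decimal:
    """Weighted main-leaderboard score, rounded half-up to two decimals (assumed)."""
    raw = W_EXECUTION * Decimal(execution) + W_CONSTRAINTS * Decimal(constraints)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (execution, constraints) pairs as reported in this article.
scores = {
    "Claude Opus 4.7": ("100", "79.5"),
    "Doubao Pro": ("100", "78.5"),
    "Gemini 2.5 Pro": ("100", "77.3"),
    "ERNIE Bot 4.5": ("50", "76.8"),
}

for model, (execution, constraints) in scores.items():
    print(f"{model}: {core_overall(execution, constraints)}")

# Gap between second place (Doubao Pro) and ERNIE Bot 4.5.
gap = core_overall("100", "78.5") - core_overall("50", "76.8")
print(f"Gap to second place: {gap}")
```

Decimal arithmetic is used here instead of floats so that a value such as 90.775 rounds predictably to 90.78 rather than depending on binary floating-point representation.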

Material Constraints Plunge Collectively, Anomalous Signals Concentrate

Compared to yesterday, eight models saw double-digit declines in material constraints. GPT-5.5's constraint score plummeted 20.3 points to 66, dropping it to sixth on the main leaderboard; Qwen3 Max's plunged 30.3 points to 64.5; and Gemini 3.1 Pro's dropped 34 points, pulling its main leaderboard score down 13.9 points to 83.04. Declines this broad and this large suggest that today's test materials placed significantly higher demands on the constraint dimension.

Doubao Pro's main leaderboard score rose 23.9 points, driven primarily by a 47.5-point recovery in execution from yesterday's low; its constraints fell only 5 points, and it finished in second place. Gemini 2.5 Pro's execution rebounded 45 points while its constraints fell 15.2 points, for a net gain of 17.9 points, showing that a recovery in execution can outweigh constraint losses.
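Because the score formula is linear, a day-over-day change in the main score splits cleanly into an execution part and a constraints part. The sketch below illustrates this with the changes reported above; the helper name `overall_delta` is hypothetical, and it assumes the same 0.55/0.45 weights applied on both days.

```python
# Weights from the core_overall formula; assumed unchanged between the two days.
W_EXECUTION = 0.55
W_CONSTRAINTS = 0.45


def overall_delta(d_execution: float, d_constraints: float) -> dict:
    """Split a main-leaderboard change into its two weighted contributions."""
    from_execution = W_EXECUTION * d_execution
    from_constraints = W_CONSTRAINTS * d_constraints
    return {
        "execution": from_execution,
        "constraints": from_constraints,
        "total": from_execution + from_constraints,
    }


# Day-over-day changes as reported in the article (negative = decline).
changes = {
    "Doubao Pro": (47.5, -5.0),
    "Gemini 2.5 Pro": (45.0, -15.2),
}

for model, (d_exec, d_cons) in changes.items():
    parts = overall_delta(d_exec, d_cons)
    print(
        f"{model}: execution {parts['execution']:+.2f}, "
        f"constraints {parts['constraints']:+.2f}, total {parts['total']:+.1f}"
    )
```

For both models the execution contribution is several times larger than the constraint loss, which is why their main scores rose despite weaker constraint results.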

Structural Characteristics and Stability Concerns

The current landscape shows that code execution has plateaued, while material constraints have become the frequently volatile item. Although Claude Opus 4.7's constraint score also dropped 16.5 points, it still holds first place at 79.5 points, indicating a higher baseline for its constraint performance. GPT-5.5, at 66 in constraints and carrying a "warn" integrity rating, faces greater risk exposure in an environment where multiple models' constraints are declining simultaneously.

ERNIE Bot 4.5's execution score of 50 sets it apart from every other model, exposing a persistent weakness in code execution tasks rather than a one-day fluctuation.

The sharp fluctuations in material constraints are exposing the true upper limits of models. Full marks in execution are only an entry ticket; constraint stability is the final ticket.

Data Source: YZ Index (Winzheng Index) | Run #166
