Qwen3 Max Main Board Plunges 12.9 Points, Gemini 2.5 Pro Leads Smoke Lite List with 96.99 Points

Jul 4, 2026 | Source: Winzheng Index

Tags: Gemini 2.5 Pro, Qwen3 Max, Smoke Test, Code Execution, Material Constraints

In the YZ Index Smoke Lite evaluation of 11 models on July 4, 2026, Gemini 2.5 Pro ranked first with a Main Board score of 96.99 (Code Execution 100, Material Constraints 93.3), while Qwen3 Max's Main Board score fell 12.9 points to 72.02.

Structural Differentiation of Execution and Constraints

Today's top three, Gemini 2.5 Pro, Grok 4, and Claude Opus 4.7, share one feature: all have Code Execution scores above 97, and all three sit at 93.3 in Material Constraints. With a perfect Execution score and a Constraint score of 93.3, Gemini 2.5 Pro's core_overall formula (0.55×100 + 0.45×93.3) yields 96.99. Grok 4, with Execution 99.2 and Constraint 93.3, trails Gemini by only 0.44 points; because the two are level on Material Constraints, the entire gap comes from the 0.8-point difference in Execution (0.55×0.8 = 0.44).

DeepSeek V4 Pro's Execution (80.3) and Constraint (80.1) scores are the closest to each other, a balanced structure but at low absolute levels, which gives a Main Board score of only 80.21. Qwen3 Max's Execution 69.5 and Constraint 75.1 place it in the mid-to-lower range of the list, and its 12.9-point drop further widens the gap with the top five.
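The Main Board figures above can be reproduced from the quoted core_overall formula. The following is a minimal sketch, assuming the 0.55 and 0.45 weights apply uniformly to every model; the function and variable names are illustrative and not part of the YZ Index tooling. Values are printed unrounded because the article does not state the rounding rule used on the leaderboard.

```python
# Weights as quoted in the article's core_overall formula.
EXECUTION_WEIGHT = 0.55
CONSTRAINT_WEIGHT = 0.45


def core_overall(execution: float, constraints: float) -> float:
    """Weighted Main Board score, before any rounding."""
    return EXECUTION_WEIGHT * execution + CONSTRAINT_WEIGHT * constraints


# (Execution, Constraints) pairs cited in the text above.
scores = {
    "Gemini 2.5 Pro": (100.0, 93.3),
    "Grok 4": (99.2, 93.3),
    "DeepSeek V4 Pro": (80.3, 80.1),
    "Qwen3 Max": (69.5, 75.1),
}

main_board = {name: core_overall(e, c) for name, (e, c) in scores.items()}

for name, score in main_board.items():
    print(f"{name}: {score:.3f}")

# Gap between the top two, which comes entirely from Execution
# because their Constraint scores are identical.
gap = main_board["Gemini 2.5 Pro"] - main_board["Grok 4"]
print(f"Gemini 2.5 Pro minus Grok 4: {gap:.3f}")
```

The unrounded results (96.985, 96.545, 80.21, and 72.02) agree with the published scores to two decimal places, up to the rounding direction of the final digit.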

Dimensional Contributions Behind Single-Day Gains

GPT-o3's Main Board rose by 24 points, with Execution up 24.5 points and Constraints up 23.4 points, a synchronized improvement in both dimensions. In Gemini 2.5 Pro's 22.4-point rise, Execution increased by 25.7 points, more than the 18.3-point increase in Constraints, so the larger share of today's improvement came from Execution. DeepSeek V4 Pro's Execution rose by 30.3 points in a single day, while Constraints rose by only 10.1 points; the improvement in Execution was the main source of its 21.2-point Main Board increase.

Grok 4's Constraint gain of 30 points far exceeded its Execution gain of 7.1 points; the rapid rebound in Material Constraints drove its Main Board up by 17.4 points. Doubao Pro's Execution rose by 22 points while Constraints only rose by 6.6 points, indicating a structure more dependent on Execution-side drivers.
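Because the core_overall formula is linear, a single-day Main Board change can be split into an Execution part and a Constraints part. The sketch below does this for the models whose dimension gains are quoted above. It assumes the 0.55 and 0.45 weights were the same on both days, which the article does not state explicitly; the helper name `decompose_change` is hypothetical.

```python
EXECUTION_WEIGHT = 0.55
CONSTRAINT_WEIGHT = 0.45


def decompose_change(d_execution: float, d_constraints: float):
    """Split a Main Board change into its weighted Execution and Constraints parts."""
    exec_part = EXECUTION_WEIGHT * d_execution
    cons_part = CONSTRAINT_WEIGHT * d_constraints
    return exec_part, cons_part, exec_part + cons_part


# (Execution gain, Constraints gain, reported Main Board gain) from the text.
reported = {
    "GPT-o3": (24.5, 23.4, 24.0),
    "Gemini 2.5 Pro": (25.7, 18.3, 22.4),
    "DeepSeek V4 Pro": (30.3, 10.1, 21.2),
    "Grok 4": (7.1, 30.0, 17.4),
}

results = {}
for name, (d_exec, d_cons, reported_total) in reported.items():
    exec_part, cons_part, total = decompose_change(d_exec, d_cons)
    results[name] = (exec_part, cons_part, total)
    print(
        f"{name}: Execution part {exec_part:.2f}, "
        f"Constraints part {cons_part:.2f}, "
        f"total {total:.2f} (reported {reported_total})"
    )
```

Under that assumption, each computed total lands within a few hundredths of the reported Main Board gain. For Grok 4 the weighted Constraints part is the larger of the two, while for DeepSeek V4 Pro the Execution part dominates.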

Anomalous Signals and Possible Causes

Qwen3 Max's 12.9-point plunge in Main Board score is the only obvious negative anomaly today, with declines in both Execution and Constraints. GLM-4.6 shows zero in every dimension on the list, possibly because no valid results were returned in that day's evaluation. Claude Sonnet 4.6 scored 97 in Execution but only 60.1 in Constraints; this large gap between the two dimensions held its Main Board score to 80.4 points.

Execution scores are generally higher than Constraint scores among the 11 models today, although the size of the gap varies widely. Among the models whose scores are cited above, it ranges from 0.2 points for DeepSeek V4 Pro to 36.9 points for Claude Sonnet 4.6, and Qwen3 Max is an exception, with Execution (69.5) below Constraints (75.1). Gemini 2.5 Pro, Grok 4, and Claude Opus 4.7 are tied for the lead in Constraints at 93.3, so the order among them is decided by Execution.
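As a quick check of the Execution-versus-Constraints pattern, the sketch below computes the gap for the five models whose two dimension scores are both quoted in this article. The other six models are left out because their scores are not given here, so this is a partial view rather than the full list.

```python
# (Execution, Constraints) pairs quoted in the article.
scores = {
    "Gemini 2.5 Pro": (100.0, 93.3),
    "Grok 4": (99.2, 93.3),
    "DeepSeek V4 Pro": (80.3, 80.1),
    "Claude Sonnet 4.6": (97.0, 60.1),
    "Qwen3 Max": (69.5, 75.1),
}

# Positive gap: Execution is higher; negative gap: Constraints is higher.
gaps = {name: round(e - c, 1) for name, (e, c) in scores.items()}

# List the models from the largest Execution lead to the smallest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: Execution minus Constraints = {gap:+.1f}")
```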

A perfect or near-perfect Execution score combined with a tie for first in Constraints has become the standard for the top tier of the Smoke Lite List.

Data source: YZ Index | Run #213
