Claude Opus 4.7's Material Constraint Score Plunges 16.5 Points; Main Ranking Score Drops from 96.83 to 90.78

In the YZ Index's June 2026 Smoke Evaluation, Claude Opus 4.7's Material Constraint score dropped from 96.00 to 79.50, and its main ranking score fell from 96.83 to 90.78.

Single-Day Data Comparison

Code Execution: 97.50 → 100.00 (+2.50)
Material Constraint: 96.00 → 79.50 (-16.50)
Engineering Judgment: 94.00 → 76.50 (-17.50)
Task Expression: 92.50 → 97.50 (+5.00)
Integrity Rating: pass (unchanged)

Sampling Fluctuation or Model Degradation?

The Smoke Evaluation includes only 10 questions per day, 2 per dimension, so single-day scores carry a large standard deviation. Even so, Material Constraint and Engineering Judgment fell by 16.5 and 17.5 points respectively, more than typical sampling fluctuation would explain. Data from the next three days is needed to determine whether this reflects a genuine capability degradation.
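
To make the sample-size argument concrete, the Python sketch below computes the standard error of a dimension score that averages n questions, plus a rough z-score for a day-over-day change. It is an illustration under stated assumptions, not the YZ Index's own method: it treats questions as independent draws with a common per-question standard deviation, and `PER_QUESTION_SD`, `standard_error`, and `change_z_score` are hypothetical names. The YZ Index does not publish a per-question standard deviation, so the 15.0 value is only a placeholder.

```python
import math

# Placeholder per-question standard deviation, in points. This is an assumption:
# the YZ Index does not publish it, so replace it with an estimate from past runs.
PER_QUESTION_SD = 15.0

# From the Smoke Evaluation design: 10 questions per day, 2 per dimension.
QUESTIONS_PER_DIMENSION = 2


def standard_error(sd: float, n: int) -> float:
    """Standard error of a dimension score that averages n independent questions."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)


def change_z_score(before: float, after: float, sd: float, n: int) -> float:
    """Day-over-day change divided by the standard error of the difference
    between two independent n-question means with equal per-question SD."""
    se_diff = math.sqrt(2.0) * standard_error(sd, n)
    return (after - before) / se_diff


if __name__ == "__main__":
    # Compare the noise in a 2-question score with a hypothetical 10-question score.
    for n in (QUESTIONS_PER_DIMENSION, 10):
        se = standard_error(PER_QUESTION_SD, n)
        print(f"n={n}: standard error = {se:.2f} points")
```

Whatever the true per-question spread is, a 2-question score has a standard error √5 ≈ 2.24 times larger than a 10-question score would. Substituting a per-question estimate taken from historical runs and passing the observed before/after scores to `change_z_score` gives one rough way to judge whether a drop of this size falls outside normal noise.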

By contrast, Code Execution and Task Expression improved, indicating that the model still performs at a high level on some tasks and that the problems are concentrated in scenarios requiring strict citation of source material.

Does This Warrant Special Attention?

Only one day of data is available so far, which is not enough to conclude that Claude Opus 4.7 has undergone systemic degradation. However, the simultaneous sharp declines in Material Constraint and Engineering Judgment have already pulled its main ranking score down by 6.05 points. This model should be added to the daily tracking list.

If the Material Constraint score fails to recover above 90 points within the next three days, that would support a preliminary conclusion that the model's stability on constraint tasks is problematic.
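
The follow-up rule above can be written as a small check. The Python sketch below is hypothetical (the function name and its three status labels are not part of the YZ Index), and it assumes one reading of an ambiguous phrase: the score counts as recovered if any single day in the three-day window exceeds 90. A stricter reading, such as requiring the final day or every day to exceed 90, would change the `any(...)` test.

```python
from typing import Sequence

# Both values come from the article's follow-up rule.
RECOVERY_THRESHOLD = 90.0  # "recover above 90 points"
WINDOW_DAYS = 3            # "the next three days"


def constraint_stability_status(next_scores: Sequence[float],
                                threshold: float = RECOVERY_THRESHOLD,
                                window: int = WINDOW_DAYS) -> str:
    """Classify the follow-up window as 'recovered', 'pending', or 'unstable'.

    Assumption: "recover above 90" means at least one of the next `window`
    daily Material Constraint scores is strictly greater than `threshold`.
    Scores beyond the window are ignored.
    """
    observed = list(next_scores)[:window]
    if any(score > threshold for score in observed):
        return "recovered"
    if len(observed) < window:
        return "pending"  # window not finished and no recovery yet
    return "unstable"     # full window observed, never above threshold


if __name__ == "__main__":
    # Hypothetical follow-up scores, not real YZ Index data.
    print(constraint_stability_status([82.0, 88.5]))        # pending
    print(constraint_stability_status([82.0, 91.0]))        # recovered
    print(constraint_stability_status([82.0, 85.0, 87.5]))  # unstable
```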

A single Smoke anomaly does not mean the model has collapsed, but two consecutive days of low Material Constraint scores would directly threaten Claude Opus 4.7's position on the main ranking.

Data source: YZ Index | Run #166