Doubao Pro's Material Constraint Score Drops 15.9 Points: What Explains the Single-Day Smoke Test Anomaly

In the June 2026 YZ Index testing of 11 models, Doubao Pro's material constraint score in today's Smoke evaluation fell from yesterday's 100.00 to 84.10, a decline of 15.90 points, which pulled its main-ranking total score down from 100.00 to 92.85.

Score Change Breakdown

The code execution dimension held at 100.00. The two side-ranking dimensions, engineering judgment and task expression, also stayed at 100.00, and the integrity rating remained "pass". Material constraint was the only dimension that declined, and it accounts for the entire 7.15-point loss in the main ranking (100.00 to 92.85).
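The relationship between the two drops can be checked with a little arithmetic. The sketch below assumes the main-ranking total is a linear weighted sum of its dimensions and that only material constraint moved; the source does not state the actual weighting formula, so the resulting weight is an inference, and the function name `implied_weight` is hypothetical.

```python
def implied_weight(dim_before, dim_after, total_before, total_after):
    """Weight one dimension would need in order to explain the total's change.

    Assumption (not confirmed by the source): the total is a linear weighted
    sum of dimension scores, and no other dimension changed.
    """
    dim_drop = dim_before - dim_after
    total_drop = total_before - total_after
    if dim_drop == 0:
        raise ValueError("dimension score did not change")
    return total_drop / dim_drop


# Figures reported above for Doubao Pro.
weight = implied_weight(100.00, 84.10, 100.00, 92.85)
print(f"main-ranking drop: {100.00 - 92.85:.2f} points")
print(f"implied material-constraint weight: {weight:.2f}")
```

Under these assumptions the implied weight is roughly 0.45, which would mean material constraint contributes a little under half of the main-ranking total.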

Question-Draw Fluctuation or Model Degradation?

The Smoke evaluation uses only 2 questions per dimension per day, so the sample is very small. A single-day swing of 15.9 points in material constraint is consistent with the statistical characteristics of a 2-question test. The gap between yesterday's 100.00 and today's 84.10 may stem solely from which questions were drawn and how difficult they were, rather than from a systemic change in the model's capability.

Determining whether this is true degradation requires scores from several consecutive days on the same dimension. With only a single day's drop on record, random fluctuation cannot be ruled out.
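To make the granularity point concrete, here is a minimal sketch of how much a slip on one question moves the dimension score at different question counts. It assumes each question is scored on a 0-100 scale and that questions are weighted equally within a dimension; the source confirms neither, and the 10- and 20-question cases are hypothetical comparisons, not real Smoke configurations.

```python
def dimension_drop(question_loss, n_questions):
    """Drop in the dimension average when ONE question loses `question_loss` points.

    Assumption: equal question weights, each scored 0-100.
    """
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    return question_loss / n_questions


# If a single question explained today's 15.9-point drop in a 2-question
# test, that question would have lost 15.9 * 2 points.
single_question_loss = 15.9 * 2

# The same slip under larger (hypothetical) question sets.
for n in (2, 10, 20):
    drop = dimension_drop(single_question_loss, n)
    print(f"{n:>2} questions: dimension drops {drop:.2f} points")
```

With only 2 questions, one partly missed question is enough to produce the whole observed drop, whereas the same slip would barely register in a larger question set.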

Does This Warrant Extra Attention?

A single-day drop of 15.9 points is within the normal range for the Smoke rapid-testing framework. Doubao Pro's other core dimensions are unaffected, and its main-ranking score remains high at 92.85. We recommend monitoring the same dimension over the next 3-5 days; if material constraint stays consistently below 90, a deeper retest should be initiated.
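The monitoring rule above can be sketched as a small check. This is one possible reading, not the YZ Index's actual procedure: it interprets "consistently" as every observed day falling below the threshold, and requires at least 3 days of data before deciding. The function name and the sample scores are hypothetical.

```python
def needs_retest(daily_scores, threshold=90.0, min_days=3):
    """Return True when a deeper retest looks warranted.

    Assumed interpretation: at least `min_days` of scores have been
    collected and every one of them is below `threshold`.
    """
    if len(daily_scores) < min_days:
        return False  # too little data to call it a trend
    return all(score < threshold for score in daily_scores)


# Only today's score is known so far: keep monitoring.
print(needs_retest([84.10]))

# Hypothetical follow-up windows, for illustration only.
print(needs_retest([84.10, 88.00, 86.50]))   # stays below 90 every day
print(needs_retest([84.10, 100.00, 95.00]))  # recovers after one day
```

A stricter or looser rule (for example, a majority of days below the threshold) would be equally easy to express; the choice depends on how much false-alarm risk is acceptable for a 2-question test.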

At present, there is no need to downgrade the conclusion on the model's overall capability.

A single Smoke fluctuation more likely reflects the coarse granularity of the test than degradation of the model.

Data source: YZ Index | Run #187