Claude Sonnet 4.6 Material Constraint Plunges 15 Points, Smoke Quick Test Main Ranking Drops 6.8

May 27, 2026 | Winzheng Index

Claude Sonnet 4.6 | Material Constraints | Smoke Quick Test | Model Degradation | Evaluation Fluctuation

Claude Sonnet 4.6 showed a significant anomaly in today's Smoke evaluation: the material constraint dimension fell from 74.50 to 59.50, a single-day decline of 15 points, and the main ranking score dropped 6.8 points to 81.78. A move of this size exceeds the normal random fluctuation range of the daily 10-question quick test.
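To put the reported figures in proportion, the arithmetic can be sketched as below. The four input numbers come from this article. The last calculation rests on an assumption the article does not state: that the main ranking is a weighted sum of dimension scores and that no other dimension moved on the same day. Treat the resulting share as a rough illustration, not the index's actual weighting.

```python
# Figures quoted in the article for Claude Sonnet 4.6.
material_before = 74.50
material_after = 59.50
main_after = 81.78
main_drop = 6.8

material_drop = material_before - material_after  # absolute drop in points
main_before = main_after + main_drop              # main score before the drop

# Relative declines, as fractions of the earlier score.
material_rel = material_drop / material_before
main_rel = main_drop / main_before

# ASSUMPTION (not stated in the article): the main ranking is a weighted sum
# of dimension scores and every other dimension stayed flat. Under that
# assumption, this ratio is the share of the main score carried by the
# material constraint dimension.
implied_share = main_drop / material_drop

print(f"material constraint: -{material_drop:.2f} points ({material_rel:.1%})")
print(f"main ranking: {main_before:.2f} -> {main_after:.2f} ({main_rel:.1%})")
print(f"implied share under the stated assumption: {implied_share:.3f}")
```

If other dimensions also moved that day, the implied share would differ, so the last line is best read as an upper-level sanity check on the two reported drops.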

How to Distinguish Between Fluctuation and Degradation

The Smoke evaluation has only 2 questions per dimension per day, so the sample is small and a single wrong answer can move a dimension score by more than 10 points. However, the material constraint score has now declined for two consecutive days, and the integrity rating has changed from pass to warn. Together these indicate a substantive problem with the model's accuracy and boundary control when citing external materials, rather than bad luck on one day's questions.
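The reasoning above can be sketched as a small heuristic. This is a minimal illustration, not the index's actual method: it assumes a dimension score is the mean of per-question scores on a 0 to 100 scale, and the names `dimension_score`, `DailyResult`, `classify`, and the `noise_band` threshold are hypothetical. All sample values in the usage lines are made up for illustration and are not Winzheng data.

```python
from dataclasses import dataclass


def dimension_score(question_scores):
    """ASSUMPTION: a dimension score is the mean of per-question scores (0-100)."""
    return sum(question_scores) / len(question_scores)


@dataclass
class DailyResult:
    material_constraint: float  # dimension score for that day
    integrity: str              # "pass", "warn", or "fail"


def classify(history, noise_band=10.0):
    """Hypothetical rule of thumb following the article's argument.

    history is ordered oldest to newest. noise_band is an assumed size for
    the swing a single bad question can cause.
    """
    if len(history) < 3:
        return "insufficient data"
    first, middle, last = history[-3:]
    # Two consecutive daily declines need three data points.
    two_day_decline = (
        middle.material_constraint < first.material_constraint
        and last.material_constraint < middle.material_constraint
    )
    integrity_slipped = first.integrity == "pass" and last.integrity != "pass"
    big_single_drop = (middle.material_constraint - last.material_constraint) > noise_band
    if two_day_decline and integrity_slipped:
        return "likely degradation"
    if big_single_drop:
        return "large single-day drop, keep watching"
    return "within normal fluctuation"


# With 2 questions, one answer losing 30 points moves the mean by 15.
print(dimension_score([100, 50]) - dimension_score([100, 20]))

# Made-up three-day history, for illustration only.
sample = [DailyResult(70.0, "pass"), DailyResult(65.0, "pass"), DailyResult(50.0, "warn")]
print(classify(sample))
```

The point of combining two signals is that a single large drop is still compatible with noise at this sample size, while a repeated decline together with an integrity downgrade is harder to explain by one unlucky question.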

Recent Industry Dynamics as Supporting Evidence

Over the past three weeks, Anthropic has made at least two weight updates to the Claude 4 series, focused on optimizing long contexts and tool calling. Some developers have reported that Sonnet 4.6 produces a higher proportion of "overconfident hallucinations" when handling technical questions with citations, which is consistent with the decline in material constraint scores.

Should This Be a Priority Concern?

Yes. Material constraint is one of the two core dimensions of the YZ Index main ranking and directly affects the model's usability in scenarios such as RAG and enterprise knowledge bases. The 15-point drop, which continues a two-day decline, together with a yellow warning on the integrity rating, places the current version of this model in an observation period. Users are advised to postpone large-scale deployment in critical production tasks until the next complete evaluation result is available.

A 15-point plunge is not noise; it is a real alarm about Claude Sonnet 4.6's material constraint capability.

Data source: YZ Index | Run #134
