Claude Sonnet 4.6
Change Analysis · 2026 Week 12
Claude Sonnet 4.6's Code Execution (v5) score rose 38.3 pts in 2026 Week 12.
Score Comparison: 42.0 (previous) vs. 53.0 (current), +11.0 pts
| Dimension | Previous | Current | Change (pts) |
|---|---|---|---|
| Code Execution (v5) | 20.8 | 59.1 | +38.3 |
| Knowledge Synthesis (v5) | 37.4 | 43.1 | +5.7 |
| Grounding (v5) | 66.7 | 76.2 | +9.5 |
| Value | 13.8 | 19.6 | +5.8 |
| Stability | 54.2 | 31.2 | -23.0 |
| Availability | 100.0 | 100.0 | 0.0 |
All matched tasks had no score changes, or no tasks could be matched to the previous evaluation.
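Each value in the Change column is the current score minus the previous score. The sketch below reproduces that column from the table values and picks out the dimension with the largest absolute change. The helper names `changes` and `biggest_mover` are hypothetical and exist only for illustration. The page does not state how the 42.0 and 53.0 headline scores are aggregated, and they are not the unweighted mean of the six dimensions (roughly 48.8 and 54.9), so the sketch does not try to recompute them.

```python
# Per-dimension scores copied from the table: (previous, current).
SCORES = {
    "Code Execution (v5)": (20.8, 59.1),
    "Knowledge Synthesis (v5)": (37.4, 43.1),
    "Grounding (v5)": (66.7, 76.2),
    "Value": (13.8, 19.6),
    "Stability": (54.2, 31.2),
    "Availability": (100.0, 100.0),
}


def changes(scores):
    """Hypothetical helper: return {dimension: current - previous}, rounded to 0.1."""
    # Rounding removes float noise such as 38.300000000000004.
    return {name: round(cur - prev, 1) for name, (prev, cur) in scores.items()}


def biggest_mover(deltas):
    """Hypothetical helper: return the (dimension, change) pair with the largest |change|."""
    return max(deltas.items(), key=lambda item: abs(item[1]))


if __name__ == "__main__":
    deltas = changes(SCORES)
    for name, delta in deltas.items():
        print(f"{name}: {delta:+.1f}")
    print("Biggest mover:", biggest_mover(deltas))
```

Ranking by absolute change puts Code Execution (v5) (+38.3) ahead of Stability (-23.0), which matches the headline above.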