GPT-4o
Change Analysis · 2026 Week12
GPT-4o, 2026 Week 12: the Code Execution (v5) dimension rose 29.2 points
Score Comparison
41.2 → 39.2 (-2)
| Dimension | Previous | Current | Change |
|---|---|---|---|
| Code Execution (v5) | 19.6 | 48.8 | +29.2 |
| Knowledge Synthesis (v5) | 35.4 | 33.4 | -2 |
| Grounding (v5) | 62.3 | 40.4 | -21.9 |
| Value | 18.6 | 19.4 | +0.8 |
| Stability | 52.8 | 32.2 | -20.6 |
| Availability | 100 | 65 | -35 |
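
The Change column appears to be the current score minus the previous score. The sketch below recomputes it from the table values as a consistency check. The dictionary layout and the `change` helper are illustrative assumptions, not part of the evaluation tooling, and rounding to one decimal place is assumed from how the scores are displayed.

```python
# Previous and current scores copied from the table above.
# The data structure is hypothetical; only the numbers come from the report.
scores = {
    "Code Execution (v5)": (19.6, 48.8),
    "Knowledge Synthesis (v5)": (35.4, 33.4),
    "Grounding (v5)": (62.3, 40.4),
    "Value": (18.6, 19.4),
    "Stability": (52.8, 32.2),
    "Availability": (100, 65),
}


def change(previous, current):
    """Return current minus previous, rounded to one decimal place."""
    return round(current - previous, 1)


# Recompute the Change column for every dimension.
changes = {name: change(prev, cur) for name, (prev, cur) in scores.items()}

for name, delta in changes.items():
    # "+g" prints an explicit sign and drops a trailing ".0".
    print(f"{name}: {delta:+g}")
```

Running this should reproduce the Change column row by row; a mismatch would point to a transcription error in the table rather than a change in the underlying scores.
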
All matched tasks had no score changes, or no tasks could be matched to the previous evaluation.