Dimension Drop
Severity 10/10
2026-W12
GPT-4o Availability dropped 35 points
Score Comparison
| Dimension | Previous | Current | Change |
|---|---|---|---|
| Overall (v5) | 41.2 | 39.2 | -2.0 |
| Code Execution (v5) | 19.6 | 48.8 | +29.2 |
| Knowledge Synthesis (v5) | 35.4 | 33.4 | -2.0 |
| Grounding (v5) | 62.3 | 40.4 | -21.9 |
| Value | 18.6 | 19.4 | +0.8 |
| Stability | 52.8 | 32.2 | -20.6 |
| Availability | 100.0 | 65.0 | -35.0 |
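The Change column is the current score minus the previous score. The following is a minimal Python sketch that reproduces that arithmetic from the table values. The names `scores` and `changes` are hypothetical. Picking the largest drop is an assumption made for illustration, since the report does not state how the affected dimension is selected.

```python
# Scores copied from the Score Comparison table as (previous, current).
scores = {
    "Overall (v5)": (41.2, 39.2),
    "Code Execution (v5)": (19.6, 48.8),
    "Knowledge Synthesis (v5)": (35.4, 33.4),
    "Grounding (v5)": (62.3, 40.4),
    "Value": (18.6, 19.4),
    "Stability": (52.8, 32.2),
    "Availability": (100.0, 65.0),
}


def changes(scores):
    """Return current minus previous per dimension, rounded to one decimal."""
    return {name: round(cur - prev, 1) for name, (prev, cur) in scores.items()}


deltas = changes(scores)

# Assumption: treat the dimension with the most negative change as the headline drop.
largest_drop = min(deltas, key=deltas.get)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
print("Largest drop:", largest_drop)
```

Under that assumption the sketch singles out Availability, which matches the dimension listed in the report.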
Affected Dimensions
Availability
Run #37 · Formula v5 · Judge v6 · Benchmark v5.1 · 2026-03-22 14:26 SGT