Dimension Drop · Severity 10/10 · 2026-W12

GPT-o3 Stability dropped 25 points

GPT-o3 Run #37

Score Comparison

Dimension                  Previous  Current  Change
Overall (v5)                   39.0     34.5    -4.5
Code Execution (v5)            20.2     43.4   +23.2
Knowledge Synthesis (v5)       34.4     35.8    +1.4
Grounding (v5)                 62.3     28.8   -33.5
Value                           4.7      4.3    -0.4
Stability                      53.0     28.0   -25.0
Availability                  100.0     69.0   -31.0

Affected Dimensions

Stability
Run #37 · Formula v5 · Judge v6 · Benchmark v5.1 · 2026-03-22 14:26 SGT