Dimension Drop · Severity 10/10 · 2026-W14

GPT-4o Code Execution (v5) dropped 23.7 points

GPT-4o Run #52

Score Comparison

Dimension                  Previous  Current  Change
Overall (v5)                   81.1     49.3   -31.8
Code Execution (v5)            78.0     62.8   -15.2
Knowledge Synthesis (v5)       79.0     47.2   -31.8
Grounding (v5)                 80.1     49.1   -31.0
Value                          79.0     24.9   -54.1
Stability                      80.0     27.8   -52.2
Availability                  100.0     79.0   -21.0
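
The Change column appears to be Current minus Previous. A minimal Python sketch that recomputes it from the scores listed above is given here; the `scores` dictionary and the `change` helper are illustrative names, not part of the dashboard, and rounding to one decimal place is an assumption based on how the values are displayed.

```python
# Scores copied from the Score Comparison table as (previous, current).
scores = {
    "Overall (v5)": (81.1, 49.3),
    "Code Execution (v5)": (78.0, 62.8),
    "Knowledge Synthesis (v5)": (79.0, 47.2),
    "Grounding (v5)": (80.1, 49.1),
    "Value": (79.0, 24.9),
    "Stability": (80.0, 27.8),
    "Availability": (100.0, 79.0),
}


def change(previous: float, current: float) -> float:
    """Return current minus previous, rounded to one decimal place (assumed)."""
    return round(current - previous, 1)


# Recompute the Change column for every dimension.
changes = {name: change(prev, cur) for name, (prev, cur) in scores.items()}

for name, delta in changes.items():
    # Left-align the dimension name and always show the sign of the delta.
    print(f"{name:<26}{delta:+.1f}")
```

Under these assumptions, every recomputed value matches the Change column in the table.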

Affected Dimensions

Code Execution (v5)
Run #52 · Formula v7 · Judge v6 · Benchmark v6 · 2026-03-30 04:16 SGT