GPT-o3

Change Analysis · 2026 Week14

GPT-o3, 2026 Week14: Code Execution (v5) dimension dropped 15.3 pts (84.7 to 69.4)

Score Comparison

Overall score: 55.0 (previous), 50.6 (current), -4.4 (change)
Dimension                   Previous   Current   Change
Code Execution (v5)             84.7      69.4    -15.3
Knowledge Synthesis (v5)        47.2      51.2     +4.0
Grounding (v5)                  56.9      53.2     -3.7
Value                            7.7       6.9     -0.8
Stability                       29.0      31.7     +2.7
Availability                    93.9      83.0    -10.9
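
The Change column appears to be Current minus Previous for each dimension. The sketch below recomputes it from the table values as a sanity check and picks out the largest drop. The names `scores`, `change`, and `largest_drop` are illustrative and not part of the dashboard, and rounding to one decimal place is an assumption based on how the table is displayed.

```python
# Scores copied from the table above as (previous, current) pairs.
scores = {
    "Code Execution (v5)": (84.7, 69.4),
    "Knowledge Synthesis (v5)": (47.2, 51.2),
    "Grounding (v5)": (56.9, 53.2),
    "Value": (7.7, 6.9),
    "Stability": (29.0, 31.7),
    "Availability": (93.9, 83.0),
}


def change(previous: float, current: float) -> float:
    """Week-over-week change, rounded to one decimal (assumed display precision)."""
    return round(current - previous, 1)


# Recompute the Change column for every dimension.
changes = {name: change(prev, cur) for name, (prev, cur) in scores.items()}

# The dimension with the most negative change is the largest drop.
largest_drop = min(changes, key=changes.get)

for name, delta in changes.items():
    print(f"{name:26s} {delta:+.1f}")
print("Largest drop:", largest_drop)
```

Every recomputed value matches the Change column in the table, and the largest drop is Code Execution (v5), consistent with the headline.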

All matched tasks had no score changes, or no tasks could be matched to the previous evaluation.