
Grok 3

Change Analysis · 2026 Week12

Grok 3, 2026 Week12: Code Execution (v5) dimension rose 42.4 points

Score Comparison

42.5 56.2 +13.7
Dimension                 Previous  Current  Change
Code Execution (v5)           22.5     64.9   +42.4
Knowledge Synthesis (v5)      38.8     44.8      +6
Grounding (v5)                64.5       83   +18.5
Value                         13.9     21.1    +7.2
Stability                     54.2     31.7   -22.5
Availability                   100      100       0
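
The Change column appears to be the current score minus the previous score for each dimension. The sketch below recomputes it from the figures in the table as a consistency check. The names `rows` and `change` are illustrative and do not come from the dashboard, and rounding to one decimal place is an assumption based on how the scores are displayed.

```python
# Scores copied from the table above: (previous, current, reported change).
rows = {
    "Code Execution (v5)": (22.5, 64.9, 42.4),
    "Knowledge Synthesis (v5)": (38.8, 44.8, 6.0),
    "Grounding (v5)": (64.5, 83.0, 18.5),
    "Value": (13.9, 21.1, 7.2),
    "Stability": (54.2, 31.7, -22.5),
    "Availability": (100.0, 100.0, 0.0),
}


def change(previous: float, current: float) -> float:
    """Point change between two evaluations, rounded to one decimal (assumed)."""
    return round(current - previous, 1)


# Recompute every delta and collect rows whose reported change disagrees.
deltas = {name: change(prev, cur) for name, (prev, cur, _) in rows.items()}
mismatches = [
    name for name, (prev, cur, reported) in rows.items()
    if change(prev, cur) != reported
]

largest_gain = max(deltas, key=deltas.get)
largest_drop = min(deltas, key=deltas.get)

print("mismatches:", mismatches)
print("largest gain:", largest_gain, deltas[largest_gain])
print("largest drop:", largest_drop, deltas[largest_drop])
```

Rounding before comparing avoids false mismatches from floating-point subtraction, for example 64.9 - 22.5 not being exactly 42.4 in binary floating point.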

All matched tasks had no score changes, or no tasks could be matched to the previous evaluation.
