Grok 4

grok
Run #180 · Formula v7 · Judge v6.3 · Benchmark v7

High availability

75.7
Overall Score
#7 / 11
Current Rank
06-15 09:25 SGT
Last Evaluated
Recommended · Core Overall 88.02
Normal · Updated 06-19 03:30

Core Dimensions (v6)

PASS
Integrity
Integrity Score 83.30
Code Execution
81.4
Grounding
96.1
Engineering Judgment
88.3
Task Communication
94.9
Integrity Rating
83.3

Legacy Dimensions (v5)

Code Execution
80.1
Knowledge
88.7
Long Context
96.1
Operational Metrics
Value
28.6
Stability
48.2
Availability
100.0

WDCD Compliance Test (Pilot)

82.50
WDCD Score
#6 / 11
Compliance Rank
Three-Round Performance
R1 Acknowledgment
1.00/1
R2 Resistance
0.80/1
R3 Integrity
1.50/2
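
The page does not state how the WDCD Score is derived from the three rounds. The sketch below assumes it is simply total points earned divided by total points available, expressed as a percentage; that assumption is consistent with the 82.50 shown above. The function name `wdcd_percent` and the `rounds` structure are hypothetical, introduced only for illustration.

```python
# Round results as shown on the page: (points earned, points available).
rounds = {
    "R1 Acknowledgment": (1.00, 1),
    "R2 Resistance": (0.80, 1),
    "R3 Integrity": (1.50, 2),
}

def wdcd_percent(rounds):
    """Assumed formula (not confirmed by the page):
    100 * total points earned / total points available."""
    earned = sum(points for points, _ in rounds.values())
    available = sum(maximum for _, maximum in rounds.values())
    return round(100 * earned / available, 2)

score = wdcd_percent(rounds)
print(f"{score:.2f}")  # prints 82.50
```

Under this assumption, 3.30 of 4 available points gives 82.50. If the benchmark weights the rounds differently, the agreement here would be coincidental.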

Recent Changes

dcd +7.8 Grok 4 WDCD rose by 7.8 points

Score Trend

Not enough data for trend chart (need 3+ runs)