GPT-4o

gpt
Run #87 · Formula v7 · Judge v6 · Benchmark v6

Communication top tier

Overall Score: 57.2
Current Rank: #10 / 11
Last Evaluated: 04-27 04:18 SGT
Recommended Core Overall 65.36
Normal Updated 04-04 03:30

Core Dimensions (v6)

Code Execution: 71.7
Grounding: 57.6
Engineering Judgment: 41.5
Task Communication: 40
Integrity Rating: 74.2
Integrity: PASS (Integrity Score 74.20)
Legacy Dimensions (v5)

Code Execution: 79.4
Knowledge: 46.9
Long Context: 61.0

Operational Metrics
Value: 29.1
Stability: 30.4
Availability: 91.0

Recent Changes

communication_raw +15 (GPT-4o: Task Communication +15)

Score Trend

[Chart: score (0 to 100) by evaluation date, 03-17 through 04-27, with benchmark versions v3, v4, v5, and v6 marked]

v6 scores are from the latest evaluation run
