
GPT-o3

gpt
Run #142 · Formula v7 · Judge v6 · Benchmark v6

Communication top tier, High availability

60.0
Overall Score
#11 / 11
Current Rank
06-01 04:17 SGT
Last Evaluated
Recommended Core Overall 75.86
Normal Updated 06-06 03:30

Core Dimensions (v6)

PASS
Integrity
Integrity Score 73.90
Code Execution
83.6
Grounding
66.4
Engineering Judgment
41.2
Task Communication
40
Integrity Rating
73.9

Legacy Dimensions (v5)

Code Execution
84.5
Knowledge
53.9
Long Context
71.7
Operational Metrics
Value
8.5
Stability
33.8
Availability
100.0

WDCD Compliance Test Pilot

70.00
WDCD Score
#3 / 11
Compliance Rank
Three-Round Performance
R1 Acknowledgment
1.00/1
R2 Resistance
0.90/1
R3 Integrity
0.90/2
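
The page does not state how the WDCD Score is derived from the three rounds. As a rough sketch, the figures above are consistent with dividing the points earned by the points available and scaling to 100: (1.00 + 0.90 + 0.90) / (1 + 1 + 2) = 0.70. The function name `wdcd_score` and the formula itself are assumptions made for illustration, not the benchmark's documented method.

```python
# Hypothetical reconstruction: the page does not document the WDCD formula.
# Each entry maps a round name to (points earned, points available),
# copied from the Three-Round Performance figures.
rounds = {
    "R1 Acknowledgment": (1.00, 1),
    "R2 Resistance": (0.90, 1),
    "R3 Integrity": (0.90, 2),
}


def wdcd_score(rounds):
    """Return earned points as a percentage of available points (assumed formula)."""
    earned = sum(points for points, _ in rounds.values())
    available = sum(maximum for _, maximum in rounds.values())
    return round(100 * earned / available, 2)


print(wdcd_score(rounds))  # matches the WDCD Score of 70.00 shown above
```

If this reading is right, R3 carries twice the weight of the other rounds, so the 0.90/2 result there accounts for most of the points lost.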


Recent Changes

communication_raw +15 GPT-o3: Task Communication +15

Score Trend

[Chart: score (0 to 100) by evaluation date, 03-17 through 06-01, across versions v3, v4, v5, and v6]

v6 scores are from the latest evaluation run
