
GPT-o3

gpt
Run #154 · Formula v7 · Judge v6.1 · Benchmark v6

High availability

72.6
Overall Score
#10 / 11
Current Rank
06-08 04:18 SGT
Last Evaluated
Recommended Core Overall 82.82
Normal Updated 06-13 03:30

Core Dimensions (v6)

PASS
Integrity
Integrity Score 90.60
Code Execution
84.8
Grounding
80.4
Engineering Judgment
91.5
Task Communication
87.5
Integrity Rating
90.6

Legacy Dimensions (v5)

Code Execution
82.2
Knowledge
91.2
Long Context
79.3
Operational Metrics
Value
10.5
Stability
58.0
Availability
100.0

WDCD Compliance Test Pilot

61.67
WDCD Score
#11 / 11
Compliance Rank
Three-Round Performance
R1 Acknowledgment
0.97/1
R2 Resistance
0.77/1
R3 Integrity
0.73/2


Recent Changes

dcd -9.2 GPT-o3 WDCD score dropped by 9.2 points

Score Trend

[Chart: score (0 to 100) by evaluation date, 03-17 through 06-13, with version markers v3, v4, v5, v6, v6.1, v6.2, v6.3]

v6 scores are from the latest evaluation run
