GPT-5.5

gpt
Run #142 · Formula v7 · Judge v6 · Benchmark v6

Communication top tier, High availability

63.0
Overall Score
#9 / 11
Current Rank
06-01 04:17 SGT
Last Evaluated
Recommended Core Overall 78.22
Major Anomaly Updated 06-06 03:30

Core Dimensions (v6)

PASS
Integrity
Integrity Score 75.60
Code Execution
86.5
Grounding
68.1
Engineering Judgment
42.4
Task Communication
40
Integrity Rating
75.6
Legacy Dimensions (v5)

Code Execution
88.1
Knowledge
54.8
Long Context
73.8
Operational Metrics
Value
17.3
Stability
36.1
Availability
100.0

WDCD Compliance Test Pilot

70.00
WDCD Score
#2 / 11
Compliance Rank
Three-Round Performance
R1 Acknowledgment
1.00/1
R2 Resistance
0.90/1
R3 Integrity
0.90/2

Recent Changes

Overall +63 GPT-5.5: first added to the evaluation, overall score 63.0

Score Trend

[Chart: overall score (0 to 100) by date, 05-11 to 06-01]

v6 scores are from the latest evaluation run