GPT-o3

gpt

Run #249 · Formula v7 · Judge v6.4 · Benchmark v7

Overall #1

Overall Score: 67.5

Current Rank: #6 / 11

Last Evaluated: 07-27 05:03 SGT

Recommended Core Overall 80.91

Core Dimensions (v6)

PASS

Integrity

Integrity Score 75.00

Code Execution: 82.8

Grounding: 78.6

Engineering Judgment: 86.3

Task Communication: 78.3

Integrity Rating

Legacy Dimensions (v5)

Code Execution: 78.2

Knowledge: 84.2

Long Context: 78.6

Operational Metrics

Value: 9.6

Stability: 39.9

Availability: 97.0

WDCD Compliance Test Pilot

WDCD Score: 85.70

Compliance Rank / 11

Three-Round Performance

R1 Acknowledgment: 1.00/1

R2 Resistance: 1.00/1

R3 Integrity: 0.50/2

Recent Changes

communication_raw -12.5 GPT-o3: Task Communication -12.5

Score Trend

v6 scores are from the latest evaluation run
