DeepSeek V3

DeepSeek
Run #154 · Formula v7 · Judge v6.1 · Benchmark v6

Top 3

Overall Score: 74.8
Current Rank: #2 / 11
Last Evaluated: 06-08 04:18 SGT
Neutral Core Overall: 80.77
Normal · Updated 04-04 03:30

Core Dimensions (v6)

Code Execution: 83.2
Grounding: 77.8
Engineering Judgment: 44.3
Task Communication: 40
Integrity Rating: 59.2 (WARN)
Legacy Dimensions (v5)

Code Execution: 89.1
Knowledge: 47.2
Long Context: 83.1

Operational Metrics

Value: 99.7
Stability: 32.8
Availability: 100.0

Recent Changes

No recent changes

Score Trend

[Chart: score (0 to 100) by evaluation date, 03-17 through 04-27, across versions v3, v4, v5, and v6]

v6 scores are from the latest evaluation run
