DeepSeek V3

DeepSeek
Run #87 · Formula v7 · Judge v6 · Benchmark v6

Communication top tier, Best value, High availability

74.8
Overall Score
#2 / 11
Current Rank
04-27 04:18 SGT
Last Evaluated
Neutral Core Overall 80.77
Normal Updated 04-04 03:30

Core Dimensions (v6)

WARN
Integrity
Integrity Score 59.20
Code Execution
83.2
Grounding
77.8
Engineering Judgment
44.3
Task Communication
40
Integrity Rating
59.2

Legacy Dimensions (v5)

Code Execution
89.1
Knowledge
47.2
Long Context
83.1
Operational Metrics
Value
99.7
Stability
32.8
Availability
100.0

Recent Changes

communication_raw +10 (DeepSeek V3: Task Communication +10)

Score Trend

[Chart: overall score (0 to 100) by evaluation date, 03-17 through 04-27, across versions v3 to v6]

v6 scores are from the latest evaluation run
