
DeepSeek R1

DeepSeek
Run #87 · Formula v7 · Judge v6 · Benchmark v6

Communication top tier, High availability

Overall Score: 70.0
Current Rank: #4 / 11
Last Evaluated: 04-27 04:18 SGT
Neutral Core Overall: 75.89
Normal · Updated 04-04 03:30

Core Dimensions (v6)

Integrity: WARN (Integrity Score 54.20)
Code Execution: 78.9
Grounding: 72.2
Engineering Judgment: 38.7
Task Communication: 40
Integrity Rating: 54.2

Legacy Dimensions (v5)

Code Execution: 84.2
Knowledge: 43.6
Long Context: 76.4

Operational Metrics
Value: 90.3
Stability: 30.2
Availability: 100.0

Recent Changes

communication_raw +10 (DeepSeek R1: Task Communication +10)

Score Trend

[Chart: score trend on a 0 to 100 scale, evaluations from 03-17 to 04-27, spanning benchmark versions v3 to v6]

v6 scores are from the latest evaluation run
