YZ Index

Winzheng · Task Communication Rankings

Structured output quality and formatting compliance.

Side Dimension Rankings — Communication + Judgment sub-dimension performance
| # | Model | Task Communication | Code Execution | Overall |
|---|---|---|---|---|
| 🥇 | Claude Opus 4.6 (claude) | 40 | 86.5 | 83.4 |
| 🥈 | Claude Sonnet 4.6 (claude) | 40 | 86.5 | 84.1 |
| 🥉 | DeepSeek R1 (DeepSeek) | 40 | 78.9 | 75.9 |
| 4 | DeepSeek V3 (DeepSeek) | 40 | 83.2 | 80.8 |
| 5 | Doubao Pro (doubao) | 40 | 92.2 | 86.4 |
| 6 | ERNIE Bot 4.0 (ernie) | 40 | 77 | 74.9 |
| 7 | Gemini 2.5 Pro (gemini) | 40 | 89.4 | 84.3 |
| 8 | GPT-4o (gpt) | 40 | 71.7 | 65.4 |
| 9 | GPT-o3 (gpt) | 40 | 73.4 | 62.5 |
| 10 | Grok 3 (grok) | 40 | 88.9 | 86.9 |
| 11 | Qwen Max (qwen) | 40 | 78.4 | 77.9 |