YZ Index

Winzheng · Task Communication Rankings

Structured output quality and formatting compliance.

Side Dimension Rankings — Communication + Judgment sub-dimension performance
| # | Model | Task Communication | Code Execution | Overall |
| --- | --- | --- | --- | --- |
| 🥇 | Claude Opus 4.7 (claude) | 89.4 | 90.3 | 89 |
| 🥈 | Claude Sonnet 4.6 (claude) | 87.8 | 87.6 | 87.2 |
| 🥉 | Grok 4 (grok) | 87.8 | 93.9 | 89.9 |
| 4 | GPT-o3 (gpt) | 87.5 | 84.8 | 82.8 |
| 5 | GPT-5.5 (gpt) | 87.4 | 81.9 | 80.9 |
| 6 | Qwen3 Max (qwen) | 85.3 | 89.7 | 86.2 |
| 7 | DeepSeek V4 Pro (DeepSeek) | 85.1 | 87.9 | 83.3 |
| 8 | Gemini 3.1 Pro (gemini) | 84.9 | 88.4 | 84.8 |
| 9 | Gemini 2.5 Pro (gemini) | 84.6 | 88.1 | 86.4 |
| 10 | Doubao Pro (doubao) | 84.1 | 94.6 | 88.8 |
| 11 | ERNIE Bot 4.5 (ernie) | 72 | 78 | 76.9 |