YZ Index

Winzheng · Grounding Rankings

Long-document citation verification and grounding accuracy. Models are ranked by Grounding score.

Rank  Model              Grounding  Code Execution  Overall
1     Grok 4             95.7       76.3            85.0
2     Claude Opus 4.7    95.5       84.3            89.3
3     Gemini 2.5 Pro     95.1       63.8            77.9
4     DeepSeek V4 Pro    95.0       83.7            88.8
5     GPT-o3             94.9       74.0            83.4
6     Gemini 3.1 Pro     94.2       72.9            82.5
7     GPT-5.5            93.8       55.8            72.9
8     ERNIE Bot 4.5      93.7       56.4            73.2
9     Qwen3 Max          92.9       71.2            81.0
10    Claude Sonnet 4.6  92.5       75.2            83.0
11    Doubao Pro         91.9       73.2            81.6
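The page does not say how the Overall column is derived. The Python sketch below transcribes the table and tests one hypothesis: every published Overall value falls within rounding distance (under 0.05) of 0.45 × Grounding + 0.55 × Code Execution. Those weights are inferred from the numbers themselves and are not documented by the leaderboard. The names `ROWS`, `implied_overall`, `max_residual`, and `rank_by` are hypothetical and exist only for illustration.

```python
# Scores transcribed from the grounding table above:
# (model, grounding, code_execution, overall)
ROWS = [
    ("Grok 4", 95.7, 76.3, 85.0),
    ("Claude Opus 4.7", 95.5, 84.3, 89.3),
    ("Gemini 2.5 Pro", 95.1, 63.8, 77.9),
    ("DeepSeek V4 Pro", 95.0, 83.7, 88.8),
    ("GPT-o3", 94.9, 74.0, 83.4),
    ("Gemini 3.1 Pro", 94.2, 72.9, 82.5),
    ("GPT-5.5", 93.8, 55.8, 72.9),
    ("ERNIE Bot 4.5", 93.7, 56.4, 73.2),
    ("Qwen3 Max", 92.9, 71.2, 81.0),
    ("Claude Sonnet 4.6", 92.5, 75.2, 83.0),
    ("Doubao Pro", 91.9, 73.2, 81.6),
]

# ASSUMPTION: weights inferred by fitting the published scores; the page
# itself does not state how "Overall" is computed.
W_GROUNDING = 0.45
W_CODE_EXEC = 0.55

COLUMNS = {"grounding": 1, "code_execution": 2, "overall": 3}


def implied_overall(grounding: float, code_exec: float) -> float:
    """Weighted blend of the two sub-scores under the assumed weights."""
    return W_GROUNDING * grounding + W_CODE_EXEC * code_exec


def max_residual(rows) -> float:
    """Largest gap between the implied and the published Overall score."""
    return max(abs(implied_overall(g, c) - o) for _, g, c, o in rows)


def rank_by(rows, column: str) -> list[str]:
    """Model names sorted from highest to lowest on the chosen column."""
    idx = COLUMNS[column]
    return [row[0] for row in sorted(rows, key=lambda r: r[idx], reverse=True)]


if __name__ == "__main__":
    # A residual below 0.05 means the blend reproduces every Overall value
    # once rounded to one decimal place.
    print(f"max |implied - published| = {max_residual(ROWS):.3f}")
    for pos, name in enumerate(rank_by(ROWS, "overall"), start=1):
        print(pos, name)
```

Sorting by Overall instead of Grounding reorders the field: Claude Opus 4.7 moves to first and DeepSeek V4 Pro to second, while Grok 4, first on Grounding, drops to third.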