YZ Index

Winzheng · Grounding Rankings

Long-document citation verification and grounding accuracy. Models are listed in order of Grounding score.

Rank  Model                           Family    Grounding  Code Execution  Overall
1     Claude Sonnet 4.6               claude    78.4       86.8            83
2     Claude Opus 4.7                 claude    75.2       83.9            80
3     Grok 4                          grok      73.9       86.8            81
4     Gemini 2.5 Pro                  gemini    71.5       85.2            79
5     Gemini 3.1 Pro                  gemini    71.1       83              77.7
6     Qwen3 Max                       qwen      71         85.5            79
7     豆包 Pro (Doubao Pro)            doubao    70.8       89.8            81.3
8     GPT-o3                          gpt       70.3       84.8            78.3
9     GPT-5.5                         gpt       67.8       84.6            77
10    文心一言 4.5 (ERNIE Bot 4.5)     ernie     65.8       68.2            67.1
11    DeepSeek V4 Pro                 DeepSeek  63.7       86.7            76.4
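The table is ordered by Grounding. Ranking by Overall or Code Execution instead gives a different order. The following is a minimal Python sketch that re-ranks the rows by any one score column. It assumes only the values transcribed from the table. The names `ROWS`, `COLUMNS`, and `rank_by` are hypothetical and exist only for this illustration. Tied scores share a rank, using standard competition ranking (1, 2, 2, 4).

```python
# Hypothetical helper: scores transcribed from the Grounding Rankings table.
# Each row is (model, family tag, grounding, code_execution, overall).
ROWS = [
    ("Claude Sonnet 4.6", "claude", 78.4, 86.8, 83.0),
    ("Claude Opus 4.7", "claude", 75.2, 83.9, 80.0),
    ("Grok 4", "grok", 73.9, 86.8, 81.0),
    ("Gemini 2.5 Pro", "gemini", 71.5, 85.2, 79.0),
    ("Gemini 3.1 Pro", "gemini", 71.1, 83.0, 77.7),
    ("Qwen3 Max", "qwen", 71.0, 85.5, 79.0),
    ("豆包 Pro", "doubao", 70.8, 89.8, 81.3),
    ("GPT-o3", "gpt", 70.3, 84.8, 78.3),
    ("GPT-5.5", "gpt", 67.8, 84.6, 77.0),
    ("文心一言 4.5", "ernie", 65.8, 68.2, 67.1),
    ("DeepSeek V4 Pro", "DeepSeek", 63.7, 86.7, 76.4),
]

# Map each score column name to its index within a row.
COLUMNS = {"grounding": 2, "code_execution": 3, "overall": 4}


def rank_by(column: str) -> list[tuple[int, str, float]]:
    """Return (rank, model, score) tuples, highest score first.

    Tied scores share a rank (competition ranking: 1, 2, 2, 4).
    sorted() is stable, so tied rows keep their original table order.
    """
    idx = COLUMNS[column]
    ordered = sorted(ROWS, key=lambda row: row[idx], reverse=True)
    ranked: list[tuple[int, str, float]] = []
    for pos, row in enumerate(ordered, start=1):
        # Reuse the previous rank when this score equals the previous score.
        if ranked and row[idx] == ranked[-1][2]:
            rank = ranked[-1][0]
        else:
            rank = pos
        ranked.append((rank, row[0], row[idx]))
    return ranked


if __name__ == "__main__":
    for rank, model, score in rank_by("overall"):
        print(f"{rank:>2}  {model:<20} {score}")
```

With these values, `rank_by("overall")` places Claude Sonnet 4.6 first and 豆包 Pro second. Gemini 2.5 Pro and Qwen3 Max tie for fifth at 79. The sketch reads Overall directly rather than recomputing it, because Overall is not the plain mean of the two columns shown. For example, Grok 4 averages 80.35 across Grounding and Code Execution but is listed at 81.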