YZ Index
Winzheng · Grounding Rankings
Long-document citation verification and grounding accuracy.
| # | Model | Grounding | Code Execution | Overall |
|---|---|---|---|---|
| 🥇 | Grok 4 | 76.3 | 85.0 | |
| 🥈 | Claude Opus 4.7 | 84.3 | 89.3 | |
| 🥉 | Gemini 2.5 Pro | 63.8 | 77.9 | |
| 4 | DeepSeek V4 Pro | 83.7 | 88.8 | |
| 5 | GPT-o3 | 74.0 | 83.4 | |
| 6 | Gemini 3.1 Pro | 72.9 | 82.5 | |
| 7 | GPT-5.5 | 55.8 | 72.9 | |
| 8 | ERNIE Bot 4.5 | 56.4 | 73.2 | |
| 9 | Qwen3 Max | 71.2 | 81.0 | |
| 10 | Claude Sonnet 4.6 | 75.2 | 83.0 | |
| 11 | Doubao Pro | 73.2 | 81.6 | |
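The page does not say how the Overall column (left blank here) or the listed rank is computed. Based on the listed values alone, the rank order does not follow either score column. The following is a minimal sketch that re-sorts the rows by a chosen score so the two orderings can be compared. The values are transcribed from the table. `ROWS`, `rank_by`, `GROUNDING`, and `CODE_EXECUTION` are hypothetical names introduced for illustration; they are not part of any published tooling for this index.

```python
# Hypothetical transcription of the table above:
# (listed rank, model, grounding score, code execution score).
ROWS = [
    (1, "Grok 4", 76.3, 85.0),
    (2, "Claude Opus 4.7", 84.3, 89.3),
    (3, "Gemini 2.5 Pro", 63.8, 77.9),
    (4, "DeepSeek V4 Pro", 83.7, 88.8),
    (5, "GPT-o3", 74.0, 83.4),
    (6, "Gemini 3.1 Pro", 72.9, 82.5),
    (7, "GPT-5.5", 55.8, 72.9),
    (8, "ERNIE Bot 4.5", 56.4, 73.2),
    (9, "Qwen3 Max", 71.2, 81.0),
    (10, "Claude Sonnet 4.6", 75.2, 83.0),
    (11, "Doubao Pro", 73.2, 81.6),
]

# Tuple indices of the two score columns (hypothetical helper constants).
GROUNDING, CODE_EXECUTION = 2, 3


def rank_by(column: int) -> list[str]:
    """Return model names sorted by the given score column, highest first."""
    ordered = sorted(ROWS, key=lambda row: row[column], reverse=True)
    return [row[1] for row in ordered]


if __name__ == "__main__":
    # Print the order implied by the Grounding scores alone.
    for position, name in enumerate(rank_by(GROUNDING), start=1):
        print(position, name)
```

Sorting by Grounding this way places Claude Opus 4.7 (84.3) and DeepSeek V4 Pro (83.7) ahead of Grok 4 (76.3). Sorting by Code Execution gives a similar but not identical order. For example, GPT-o3 (83.4) moves ahead of Claude Sonnet 4.6 (83.0).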