YZ Index
Winzheng · Code Execution Rankings
Code Execution is the pass rate of model-generated code run in a sandbox. Real code, real results.
| # | Model | Code Execution (pass rate, %) | Grounding | Overall |
|---|---|---|---|---|
| 🥇 | Claude Opus 4.7 | 95.5 | 89.3 | |
| 🥈 | DeepSeek V4 Pro | 95 | 88.8 | |
| 🥉 | Grok 4 | 95.7 | 85 | |
| 4 | Claude Sonnet 4.6 | 92.5 | 83 | |
| 5 | GPT-o3 | 94.9 | 83.4 | |
| 6 | Doubao Pro | 91.9 | 81.6 | |
| 7 | Gemini 3.1 Pro | 94.2 | 82.5 | |
| 8 | Qwen3 Max | 92.9 | 81 | |
| 9 | Gemini 2.5 Pro | 95.1 | 77.9 | |
| 10 | ERNIE Bot 4.5 | 93.7 | 73.2 | |
| 11 | GPT-5.5 | 93.8 | 72.9 | |
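The Code Execution column is described only as a sandbox pass rate. The task set, the number of attempts per task, and the pass criteria are not given here. As a minimal sketch, assuming the score is simply the percentage of tasks whose generated program passed its checks, the calculation could look like this. The `TaskResult` type and `pass_rate` function are hypothetical and are not part of the benchmark.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one model-generated program run in the sandbox (hypothetical schema)."""
    task_id: str
    passed: bool  # True if the program ran and satisfied the task's checks


def pass_rate(results: list[TaskResult]) -> float:
    """Return the percentage of passed tasks, rounded to one decimal place."""
    if not results:
        raise ValueError("no results to score")
    passed = sum(1 for r in results if r.passed)
    return round(100 * passed / len(results), 1)


# Illustrative data only: 200 tasks, every 20th one fails (10 failures).
results = [TaskResult(f"t{i}", i % 20 != 0) for i in range(200)]
print(pass_rate(results))  # prints 95.0
```

With 190 of 200 tasks passing, the sketch yields 95.0, on the same scale as the table's Code Execution column. A real harness may also weight tasks or average over repeated samples, which this sketch does not attempt.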