YZ Index
Winzheng · Code Execution Rankings
Pass rate for code executed in a sandbox. Real code, real results.
| # | Model | Code Execution | Grounding | Overall |
|---|---|---|---|---|
| 🥇 | 豆包 Pro doubao | 70.8 | 81.3 | |
| 🥈 | Claude Sonnet 4.6 claude | 78.4 | 83 | |
| 🥉 | Grok 4 grok | 73.9 | 81 | |
| 4 | DeepSeek V4 Pro DeepSeek | 63.7 | 76.4 | |
| 5 | Qwen3 Max qwen | 71 | 79 | |
| 6 | Gemini 2.5 Pro gemini | 71.5 | 79 | |
| 7 | GPT-o3 gpt | 70.3 | 78.3 | |
| 8 | GPT-5.5 gpt | 67.8 | 77 | |
| 9 | Claude Opus 4.7 claude | 75.2 | 80 | |
| 10 | Gemini 3.1 Pro gemini | 71.1 | 77.7 | |
| 11 | 文心一言 4.5 ernie | 65.8 | 67.1 | |