| # | Model | Material constraints | Code execution | Main board |
|---|---|---|---|---|
| 🥇 | Grok 3 (xAI) | 77.0 | 76.3 | |
| 🥈 | Doubao Pro (豆包, Volcengine) | 84.9 | 79.3 | |
| 🥉 | DeepSeek R1 (DeepSeek) | 82.5 | 77.8 | |
| 4 | Qwen Max (Alibaba) | 72.7 | 71.5 | |
| 5 | DeepSeek V3 (DeepSeek) | 80.9 | 75.6 | |
| 6 | Gemini 2.5 Pro (Google) | 83.1 | 76.8 | |
| 7 | Claude Sonnet 4.6 (Anthropic) | 82.0 | 75.3 | |
| 8 | ERNIE Bot 4.0 (文心一言, Baidu) | 76.7 | 72.1 | |
| 9 | Claude Opus 4.6 (Anthropic) | 82.0 | 74.9 | |
| 10 | o3 (OpenAI) | 80.1 | 71.1 | |
| 11 | GPT-4o (OpenAI) | 73.2 | 66.7 | |