| # | Model | Coding | Knowledge | Long context | Cost-effectiveness | Stability | Overall |
|---|---|---|---|---|---|---|---|
| 🥇 | DeepSeek R1 (DeepSeek) | 87.8 | 93.3 | 75.6 | 99.6 | 77.8 | |
| 🥈 | Qwen Max (Alibaba) | 93.3 | 93.3 | 78.3 | 80.2 | 78.9 | |
| 🥉 | GPT-4o (OpenAI) | 87.8 | 93.3 | 86.7 | 61.3 | 80.7 | |
| 4 | DeepSeek V3 (DeepSeek) | 75.6 | 80.0 | 78.3 | 100.0 | 91.4 | |
| 5 | Claude Sonnet 4.6 (Anthropic) | 86.7 | 91.7 | 93.3 | 46.5 | 78.7 | |
| 6 | Claude Opus 4.6 (Anthropic) | 93.3 | 100.0 | 93.3 | 10.6 | 83.4 | |
| 7 | GPT-o3 (OpenAI) | 86.7 | 86.7 | 85.0 | 17.1 | 80.1 | |
| 8 | Gemini 2.5 Pro (Google) | 100.0 | 63.3 | 85.0 | 62.7 | 44.8 | |