| # | Model | Stability | Cost-effectiveness | Overall | Long context |
|---|---|---|---|---|---|
| 🥇 | Claude Opus 4.6 (Anthropic) | 83.4 | 10.6 | 81.1 | |
| 🥈 | Claude Sonnet 4.6 (Anthropic) | 78.7 | 46.5 | 81.7 | |
| 🥉 | GPT-4o (OpenAI) | 80.7 | 61.3 | 84.0 | |
| 4 | Gemini 2.5 Pro (Google) | 44.8 | 62.7 | 74.7 | |
| 5 | GPT-o3 (OpenAI) | 80.1 | 17.1 | 75.0 | |
| 6 | DeepSeek V3 (DeepSeek) | 91.4 | 100.0 | 83.1 | |
| 7 | Qwen Max (Alibaba) | 78.9 | 80.2 | 86.9 | |
| 8 | DeepSeek R1 (DeepSeek) | 77.8 | 99.6 | 87.6 | |