YZ Index
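Evaluation Data
Currently showing: Run #69 | 2026-04-13 | 212-question bank | Formula v7 | Scoring v6
Data disclosure: To prevent question-bank contamination and overfitting, the original question text and expected answers are not published. This page shows transparent data such as model responses, scores, and grading methods. For the full methodology, see the Methodology page.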
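| Model | Code Execution | Material Constraints | Engineering Judgment | Task Expression | Integrity | Main Score | Cost-Effectiveness | Stability | Availability | Per-Question |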
|---|---|---|---|---|---|---|---|---|---|---|
| 豆包 Pro volcengine | 84.90 | 72.40 | 48.00 | 40.00 | 76.70 pass | 79.28 | 92.6 | 38.9 | 100.0 | |
| DeepSeek R1 deepseek | 82.50 | 72.10 | 43.60 | 40.00 | 68.30 pass | 77.82 | 92.2 | 34.3 | 100.0 | |
| Gemini 2.5 Pro google | 83.10 | 69.00 | 42.30 | 40.00 | 83.30 pass | 76.76 | 37.0 | 36.7 | 100.0 | |
| Grok 3 xai | 77.00 | 75.40 | 45.20 | 40.00 | 70.00 pass | 76.28 | 23.6 | 35.0 | 98.0 | |
| DeepSeek V3 deepseek | 80.90 | 69.10 | 42.30 | 40.00 | 68.30 pass | 75.59 | 99.6 | 32.4 | 100.0 | |
| Claude Sonnet 4.6 anthropic | 82.00 | 67.00 | 42.30 | 40.00 | 73.30 pass | 75.25 | 23.2 | 36.6 | 100.0 | |
| Claude Opus 4.6 anthropic | 82.00 | 66.30 | 45.20 | 40.00 | 76.70 pass | 74.94 | 4.7 | 36.8 | 100.0 | |
| 文心一言 4.0 baidu | 76.70 | 66.50 | 40.00 | 35.00 | 56.70 warn | 72.11 | 98.4 | 30.4 | 100.0 | |
| Qwen Max alibaba | 72.70 | 70.00 | 38.30 | 35.00 | 76.70 pass | 71.49 | 46.9 | 29.6 | 100.0 | |
| GPT-o3 openai | 80.10 | 60.10 | 42.30 | 35.00 | 73.30 pass | 71.10 | 8.0 | 32.3 | 97.0 | |
| GPT-4o openai | 73.20 | 58.80 | 42.30 | 35.00 | 76.70 pass | 66.72 | 29.9 | 30.9 | 95.0 | |
API access: For programmatic access to the evaluation data, please use our API.
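
For readers working from a copy of the table on this page rather than the API, the sketch below shows one way to turn a single table row into typed fields. It is only an illustration and not part of the YZ Index API: the names `LeaderboardRow` and `parse_row` are hypothetical, the column order is taken from the table header above, and it assumes that the last word of the first cell is the provider and that the integrity cell holds a score followed by a status word such as `pass` or `warn`.

```python
from dataclasses import dataclass


@dataclass
class LeaderboardRow:
    """One leaderboard row (hypothetical structure, mirrors the table header)."""
    model: str
    provider: str
    code_execution: float
    material_constraints: float
    engineering_judgment: float
    task_expression: float
    integrity: float
    integrity_status: str  # status word from the integrity cell, e.g. "pass" or "warn"
    main_score: float
    cost_effectiveness: float
    stability: float
    availability: float


def parse_row(line: str) -> LeaderboardRow:
    """Parse one pipe-delimited table row into a LeaderboardRow."""
    # Drop the outer pipes and split into cells. The trailing per-question
    # cell is empty in the table, so only the first ten cells are used.
    cells = [c.strip() for c in line.strip().strip("|").split("|")][:10]
    if len(cells) < 10:
        raise ValueError(f"expected at least 10 cells, got {len(cells)}")

    # Assumption: the first cell is "<model name> <provider>".
    model, provider = cells[0].rsplit(None, 1)
    # Assumption: the integrity cell is "<score> <status>".
    integrity_text, status = cells[5].split()

    return LeaderboardRow(
        model=model,
        provider=provider,
        code_execution=float(cells[1]),
        material_constraints=float(cells[2]),
        engineering_judgment=float(cells[3]),
        task_expression=float(cells[4]),
        integrity=float(integrity_text),
        integrity_status=status,
        main_score=float(cells[6]),
        cost_effectiveness=float(cells[7]),
        stability=float(cells[8]),
        availability=float(cells[9]),
    )


row = parse_row(
    "| 豆包 Pro volcengine | 84.90 | 72.40 | 48.00 | 40.00 | 76.70 pass "
    "| 79.28 | 92.6 | 38.9 | 100.0 | |"
)
print(row.model, row.provider, row.main_score, row.integrity_status)
```

Keeping the status word separate from the numeric integrity score makes it easy to filter, for example, for rows flagged `warn` without re-parsing the cell.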