Strict question zeroed out
Severity 8/10
2026-W24
Claude Sonnet 4.6 · Overall · Strict question zeroed out
Score comparison
| Dimension | Previous | Current | Change |
|---|---|---|---|
| Main leaderboard (v5) | 64.9 | 78.3 | +13.4 |
| Code execution (v5) | 85.5 | 85.8 | +0.3 |
| Knowledge synthesis (v5) | 57.8 | 92.9 | +35.1 |
| Material constraints (v5) | 78.6 | 86.2 | +7.6 |
| Cost-effectiveness | 24.4 | 29.7 | +5.3 |
| Stability | 36.5 | 62.7 | +26.2 |
| Availability | 100.0 | 100.0 | +0.0 |
Affected dimensions
execution
Top 1 question by points lost
#1 SQL: Suspected duplicate payment detection (score: 100 → 0)
Model's raw response (excerpt)
```sql
SELECT
p1.id AS first_id,
p2.id AS second_id,
p1.user_id,
p1.merchant_id,
p1.amount
FROM payments p1
JOIN payments p2
ON p1.user_id = p2.user_id
AND p1.merchant_id = p2.merchant_id
AND p1.amount = p2.amount
AND p1.status = 'paid'
AND p2.status = 'paid'
AND p1.id < p2.id
AND ABS(EXTRACT(EPOCH FROM (p2.created_at - p1.created_at))) <= 120
ORDER BY first_id ASC, second_id ASC;
```
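To make the pairing rule in the query above concrete, here is a minimal Python sketch that applies the same conditions to a few made-up rows. The sample data, the function name `suspected_duplicates`, and the integer amounts are assumptions for illustration only. The benchmark's actual fixture and grading criteria are not shown in this report, so the sketch does not explain why the response scored 0.

```python
from datetime import datetime, timedelta

T0 = datetime(2026, 6, 1, 12, 0, 0)

# Hypothetical sample rows; the real benchmark data is not shown in this report.
payments = [
    {"id": 1, "user_id": 10, "merchant_id": 7, "amount": 500, "status": "paid",
     "created_at": T0},
    {"id": 2, "user_id": 10, "merchant_id": 7, "amount": 500, "status": "paid",
     "created_at": T0 + timedelta(seconds=90)},    # 90 s after id 1
    {"id": 3, "user_id": 10, "merchant_id": 7, "amount": 500, "status": "paid",
     "created_at": T0 + timedelta(seconds=300)},   # outside the window for ids 1 and 2
    {"id": 4, "user_id": 10, "merchant_id": 7, "amount": 500, "status": "failed",
     "created_at": T0 + timedelta(seconds=30)},    # excluded: status is not 'paid'
    {"id": 5, "user_id": 11, "merchant_id": 7, "amount": 500, "status": "paid",
     "created_at": T0 + timedelta(seconds=10)},    # excluded: different user
]


def suspected_duplicates(rows, window_seconds=120):
    """Mirror the self-join: same user, merchant and amount, both rows paid,
    p1.id < p2.id, and creation times at most window_seconds apart."""
    pairs = []
    for p1 in rows:
        for p2 in rows:
            gap = abs((p2["created_at"] - p1["created_at"]).total_seconds())
            if (
                p1["id"] < p2["id"]
                and p1["user_id"] == p2["user_id"]
                and p1["merchant_id"] == p2["merchant_id"]
                and p1["amount"] == p2["amount"]
                and p1["status"] == "paid"
                and p2["status"] == "paid"
                and gap <= window_seconds
            ):
                pairs.append((p1["id"], p2["id"], p1["user_id"],
                              p1["merchant_id"], p1["amount"]))
    # Tuples sort by first_id, then second_id, matching the ORDER BY clause.
    return sorted(pairs)


print(suspected_duplicates(payments))
```

Because the join compares every qualifying pair, three payments that all fall within the window of each other would yield three rows, not one group.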
Run #154 · Formula v7 · Scoring v6.1 · Question bank v6 · 2026-06-08 04:18 SGT