Strict Task Zeroed
Severity 8/10
2026-W18
Claude Opus 4.6: comprehensive strict task score dropped to zero
Score Comparison
| Dimension | Previous | Current | Change |
|---|---|---|---|
| Overall (v5) | 62.9 | 62.8 | -0.1 |
| Code Execution (v5) | 92.6 | 92.8 | +0.2 |
| Knowledge Synthesis (v5) | 54.3 | 50.1 | -4.2 |
| Grounding (v5) | 76.7 | 85.4 | +8.7 |
| Value | 5.1 | 5.1 | +0 |
| Stability | 39.5 | 35.2 | -4.3 |
| Availability | 100.0 | 100.0 | +0 |
Affected Dimensions
execution
Top Lost Tasks (1)
#1 SQL: Suspected Duplicate Payment Identification
Score: 100 → 0
Model Raw Response (excerpt)
```sql
SELECT
    p1.id AS first_id,
    p2.id AS second_id,
    p1.user_id,
    p1.merchant_id,
    p1.amount
FROM payments p1
JOIN payments p2
    ON p1.user_id = p2.user_id
    AND p1.merchant_id = p2.merchant_id
    AND p1.amount = p2.amount
    AND p1.id < p2.id
WHERE p1.status = 'paid'
    AND p2.status = 'paid'
    AND ABS(EXTRACT(EPOCH FROM (p2.created_at - p1.created_at))) <= 120
ORDER BY p1.id, p2.id;
```
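To see what the excerpted query returns, the following minimal sketch runs an SQLite adaptation of it on a handful of hypothetical rows. The schema, the sample rows, the integer `amount` in minor units, and the `strftime('%s', ...)` replacement for `EXTRACT(EPOCH FROM ...)` are all assumptions made for illustration. The report does not show the task's actual schema, data, or grading criteria, so this sketch does not explain why the response scored 0.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration;
# the benchmark's real table definition is not shown in the report.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        merchant_id INTEGER,
        amount INTEGER,      -- assumed to be stored in minor units
        status TEXT,
        created_at TEXT      -- assumed 'YYYY-MM-DD HH:MM:SS' timestamps
    )
    """
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 500, 2500, "paid", "2026-04-01 12:00:00"),
        (2, 10, 500, 2500, "paid", "2026-04-01 12:01:30"),    # 90 s after id 1
        (3, 10, 500, 2500, "paid", "2026-04-01 12:10:00"),    # outside the 120 s window
        (4, 10, 500, 2500, "failed", "2026-04-01 12:00:40"),  # status is not 'paid'
        (5, 11, 500, 2500, "paid", "2026-04-01 12:00:10"),    # different user
    ],
)

# SQLite has no EXTRACT(EPOCH FROM ...), so strftime('%s', ...) is swapped in
# to compute the gap in seconds. The rest mirrors the excerpted query.
QUERY = """
SELECT p1.id AS first_id, p2.id AS second_id,
       p1.user_id, p1.merchant_id, p1.amount
FROM payments p1
JOIN payments p2
    ON p1.user_id = p2.user_id
    AND p1.merchant_id = p2.merchant_id
    AND p1.amount = p2.amount
    AND p1.id < p2.id
WHERE p1.status = 'paid'
    AND p2.status = 'paid'
    AND ABS(CAST(strftime('%s', p2.created_at) AS INTEGER)
          - CAST(strftime('%s', p1.created_at) AS INTEGER)) <= 120
ORDER BY p1.id, p2.id
"""

pairs = conn.execute(QUERY).fetchall()
for first_id, second_id, user_id, merchant_id, amount in pairs:
    print(first_id, second_id, user_id, merchant_id, amount)
```

On these sample rows only payments 1 and 2 are paired. The `p1.id < p2.id` condition keeps each pair from being reported twice or matched with itself.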
Run #87 · Formula v7 · Judge v6 · Benchmark v6 · 2026-04-27 04:18 SGT