Strict Task Zeroed Severity 8/10 2026-W24

Claude Sonnet 4.6 Overall Strict Question Zeroed

Claude Sonnet 4.6 Run #154

Score Comparison

| Dimension | Previous | Current | Change |
| --- | --- | --- | --- |
| Overall (v5) | 64.9 | 78.3 | +13.4 |
| Code Execution (v5) | 85.5 | 85.8 | +0.3 |
| Knowledge Synthesis (v5) | 57.8 | 92.9 | +35.1 |
| Grounding (v5) | 78.6 | 86.2 | +7.6 |
| Value | 24.4 | 29.7 | +5.3 |
| Stability | 36.5 | 62.7 | +26.2 |
| Availability | 100.0 | 100.0 | +0 |

Affected Dimensions

execution

Top Lost Tasks (1)

#1 SQL: Suspected Duplicate Payment Identification (task score: 100 previously, 0 in this run)
Model Raw Response (excerpt)
```sql
SELECT
    p1.id AS first_id,
    p2.id AS second_id,
    p1.user_id,
    p1.merchant_id,
    p1.amount
FROM payments p1
JOIN payments p2
    ON p1.user_id = p2.user_id
    AND p1.merchant_id = p2.merchant_id
    AND p1.amount = p2.amount
    AND p1.status = 'paid'
    AND p2.status = 'paid'
    AND p1.id < p2.id
    AND ABS(EXTRACT(EPOCH FROM (p2.created_at - p1.created_at))) <= 120
ORDER BY first_id ASC, second_id ASC;
```
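To make the excerpt's matching rule easier to inspect, here is a minimal Python sketch of the same pairing logic: paid payments with the same user, merchant, and amount, created within 120 seconds of each other, reported once per pair with the lower `id` first. The sample rows, the `suspected_duplicates` helper, and the `window_seconds` parameter are hypothetical illustrations; the benchmark's actual data and expected answer are not shown on this page, so the sketch does not explain why the task scored 0.

```python
from datetime import datetime
from itertools import combinations

# Hypothetical sample rows; the real benchmark table is not shown in the report.
payments = [
    {"id": 1, "user_id": 10, "merchant_id": 7, "amount": 50.0,
     "status": "paid", "created_at": datetime(2026, 6, 1, 12, 0, 0)},
    {"id": 2, "user_id": 10, "merchant_id": 7, "amount": 50.0,
     "status": "paid", "created_at": datetime(2026, 6, 1, 12, 1, 30)},
    {"id": 3, "user_id": 10, "merchant_id": 7, "amount": 50.0,
     "status": "paid", "created_at": datetime(2026, 6, 1, 12, 5, 0)},
    {"id": 4, "user_id": 10, "merchant_id": 7, "amount": 50.0,
     "status": "refunded", "created_at": datetime(2026, 6, 1, 12, 0, 30)},
    {"id": 5, "user_id": 11, "merchant_id": 7, "amount": 50.0,
     "status": "paid", "created_at": datetime(2026, 6, 1, 12, 0, 10)},
]


def suspected_duplicates(rows, window_seconds=120):
    """Mirror the excerpt's self-join: one tuple per matching pair of paid rows."""
    # Sorting by id means every combination (a, b) satisfies a["id"] < b["id"],
    # assuming ids are unique.
    paid = sorted((r for r in rows if r["status"] == "paid"), key=lambda r: r["id"])
    pairs = []
    for a, b in combinations(paid, 2):
        same_key = (
            (a["user_id"], a["merchant_id"], a["amount"])
            == (b["user_id"], b["merchant_id"], b["amount"])
        )
        gap = abs((b["created_at"] - a["created_at"]).total_seconds())
        if same_key and gap <= window_seconds:
            pairs.append(
                (a["id"], b["id"], a["user_id"], a["merchant_id"], a["amount"])
            )
    # Same ordering as ORDER BY first_id ASC, second_id ASC.
    return sorted(pairs)


if __name__ == "__main__":
    for row in suspected_duplicates(payments):
        print(row)
```

Because the rule is pairwise, three matching payments that all fall inside the window would produce three rows, one for each pair.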
Run #154 · Formula v7 · Judge v6.1 · Benchmark v6 · 2026-06-08 04:18 SGT