Strict Task Zeroed
Severity 8/10
2026-W20
GPT-o3 Overall: Strict Question Zeroed
Score Comparison
| Dimension | Previous | Current | Change |
|---|---|---|---|
| Overall (v5) | 60.4 | 60.1 | -0.3 |
| Code Execution (v5) | 82.9 | 80.5 | -2.4 |
| Knowledge Synthesis (v5) | 55.2 | 55.2 | +0 |
| Grounding (v5) | 72.1 | 75.2 | +3.1 |
| Value | 8.5 | 8.4 | -0.1 |
| Stability | 37.4 | 35.9 | -1.5 |
| Availability | 100.0 | 100.0 | +0 |
Affected Dimensions
execution
Top Lost Tasks (1)
| # | Task | Previous | Current |
|---|---|---|---|
| 1 | SQL: Consecutive Login Days | 100 | 0 |
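
For orientation, tasks with this name are usually solved with the "gaps and islands" pattern: within one user, consecutive dates share the same value of (date minus row number), so grouping on that difference isolates each streak. The following is a minimal sketch of that pattern, not the benchmark's reference answer; this report does not show the task statement or say why the response scored 0. The table and column names (`user_logins`, `user_id`, `login_date`) are taken from the model's response excerpt in this report. The sample rows, the helper `max_streaks`, and the SQLite dialect (`julianday` instead of interval arithmetic, and an SQLite build with window-function support, 3.25 or later) are assumptions made only for illustration.

```python
import sqlite3

# Hypothetical sample data; the benchmark's real fixture is not shown in this report.
ROWS = [
    (1, "2026-05-01"),
    (1, "2026-05-02"),
    (1, "2026-05-02"),  # duplicate login on the same day
    (1, "2026-05-03"),
    (1, "2026-05-05"),
    (2, "2026-05-01"),
    (2, "2026-05-03"),
    (2, "2026-05-04"),
]

QUERY = """
WITH daily AS (
    SELECT DISTINCT user_id, login_date
    FROM user_logins
),
numbered AS (
    SELECT
        user_id,
        login_date,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY login_date
        ) AS rn
    FROM daily
),
streaks AS (
    SELECT
        user_id,
        COUNT(*) AS streak_len
    FROM numbered
    GROUP BY
        user_id,
        -- consecutive dates share the same (day number - row number) value
        CAST(julianday(login_date) AS INTEGER) - rn
)
SELECT user_id, MAX(streak_len) AS max_streak
FROM streaks
GROUP BY user_id
ORDER BY user_id
"""


def max_streaks(rows):
    """Return [(user_id, longest_consecutive_day_streak), ...] for the given logins."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE user_logins (user_id INTEGER, login_date TEXT)")
        conn.executemany("INSERT INTO user_logins VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(max_streaks(ROWS))  # [(1, 3), (2, 2)]
```

With the sample rows, user 1 logs in on May 1 to 3 and again on May 5, so the longest streak is 3; user 2 has a single day followed by a two-day run, so the longest streak is 2. The `DISTINCT` step matters because a duplicate login on the same day would otherwise shift the row numbers and split a streak.
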
Model Raw Response (excerpt)
```sql
WITH daily AS (
    SELECT DISTINCT user_id, login_date
    FROM user_logins
),
numbered AS (
    SELECT
        user_id,
        login_date,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY login_date
        ) AS rn
    FROM daily
),
streaks AS (
    SELECT
        user_id,
        COUNT(*) AS streak_len
    FROM numbered
    GROUP BY
        user_id,
        login_date - rn * INTERVAL '1 day'
)
SELECT
    user_id,
    MAX(streak_len) AS max_streak
FROM
```
Run #112 · Formula v7 · Judge v6 · Benchmark v6 · 2026-05-11 04:21 SGT