Strict Task Zeroed Severity 8/10 2026-W20

GPT-o3 Overall Strict Question Zeroed

GPT-o3 Run #112

Score Comparison

| Dimension | Previous | Current | Change |
| --- | --- | --- | --- |
| Overall (v5) | 60.4 | 60.1 | -0.3 |
| Code Execution (v5) | 82.9 | 80.5 | -2.4 |
| Knowledge Synthesis (v5) | 55.2 | 55.2 | +0 |
| Grounding (v5) | 72.1 | 75.2 | +3.1 |
| Value | 8.5 | 8.4 | -0.1 |
| Stability | 37.4 | 35.9 | -1.5 |
| Availability | 100.0 | 100.0 | +0 |

Affected Dimensions

execution

Top Lost Tasks (1)

#1 SQL: Consecutive Login Days (score: 100 → 0)
Model Raw Response (excerpt)
```sql
WITH daily AS (
    SELECT DISTINCT user_id, login_date
    FROM user_logins
),
numbered AS (
    SELECT
        user_id,
        login_date,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY login_date
        ) AS rn
    FROM daily
),
streaks AS (
    SELECT
        user_id,
        COUNT(*) AS streak_len
    FROM numbered
    GROUP BY
        user_id,
        login_date - rn * INTERVAL '1 day'
)
SELECT
    user_id,
    MAX(streak_len) AS max_streak
FROM
```
Run #112 · Formula v7 · Judge v6 · Benchmark v6 · 2026-05-11 04:21 SGT