Strict Task Zeroed · Severity 8/10 · 2026-W24

GPT-o3 Overall: Strict Question Zeroed

GPT-o3 Run #154

Score Comparison

Dimension                 Previous  Current  Change
Overall (v5)                  60.0     72.6   +12.6
Code Execution (v5)           84.5     82.2    -2.3
Knowledge Synthesis (v5)      53.9     91.2   +37.3
Grounding (v5)                71.7     79.3    +7.6
Value                          8.5     10.5    +2.0
Stability                     33.8     58.0   +24.2
Availability                 100.0    100.0    +0.0
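Every row's Change value matches Current minus Previous. The short Python sketch below recomputes the column from values transcribed by hand from the table. The `change` helper and the one-decimal rounding are assumptions about how the dashboard derives its deltas, not its documented formula.

```python
# Scores transcribed from the Score Comparison table: (previous, current, reported change).
scores = {
    "Overall (v5)": (60.0, 72.6, 12.6),
    "Code Execution (v5)": (84.5, 82.2, -2.3),
    "Knowledge Synthesis (v5)": (53.9, 91.2, 37.3),
    "Grounding (v5)": (71.7, 79.3, 7.6),
    "Value": (8.5, 10.5, 2.0),
    "Stability": (33.8, 58.0, 24.2),
    "Availability": (100.0, 100.0, 0.0),
}


def change(previous: float, current: float) -> float:
    """Hypothetical helper: Current - Previous, rounded to one decimal place."""
    return round(current - previous, 1)


# Compare each recomputed delta with the reported one (tolerance absorbs float noise).
for name, (prev, curr, reported) in scores.items():
    computed = change(prev, curr)
    status = "ok" if abs(computed - reported) < 1e-9 else "MISMATCH"
    print(f"{name:<26} {computed:+6.1f}  {status}")
```

Rounding to one decimal keeps results like 82.2 - 84.5 (which is not exactly -2.3 in binary floating point) comparable with the one-decimal values the dashboard displays.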

Affected Dimensions

execution

Top Lost Tasks (1)

#1 Reservoir Sampling: 100 → 0
Model Raw Response (excerpt)
def reservoir_sample(stream, k, seed=None):
    rng = random.Random(seed)
    reservoir = []

    if k <= 0:
        return reservoir

    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item

    return reservoir
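The excerpt follows the classic single-pass reservoir sampling scheme (Algorithm R). It keeps the first k items, and for each later item i it draws j uniformly from 0..i and overwrites slot j when j < k, so item i enters with probability k/(i + 1). The excerpt uses `random` without the import being visible. Below is a self-contained sketch of the same technique, paired with a rough empirical uniformity check. The `inclusion_frequencies` helper, the trial count, and the per-trial seeding are illustrative assumptions. They are not the benchmark's grader and do not indicate why the task was scored 0.

```python
import random
from collections import Counter
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream of unknown length."""
    rng = random.Random(seed)  # dedicated, optionally seeded generator
    reservoir: List[T] = []
    if k <= 0:
        return reservoir
    for i, item in enumerate(stream):
        if i < k:
            # Fill phase: the first k items go straight into the reservoir.
            reservoir.append(item)
        else:
            # Replace phase: item i is kept with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


def inclusion_frequencies(n: int, k: int, trials: int, seed: int = 0) -> List[float]:
    """Hypothetical helper: fraction of trials in which each value of range(n) was sampled."""
    counts = Counter()
    for t in range(trials):
        counts.update(reservoir_sample(range(n), k, seed=seed + t))
    return [counts[x] / trials for x in range(n)]


if __name__ == "__main__":
    # For a uniform sampler each item should appear in roughly k/n = 0.3 of the trials.
    freqs = inclusion_frequencies(n=10, k=3, trials=20_000)
    print([round(f, 2) for f in freqs])
```

When the stream has fewer than k items, the function returns all of them unchanged. Passing the same seed reproduces the same sample, which is convenient for deterministic test harnesses.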
Run #154 · Formula v7 · Judge v6.1 · Benchmark v6 · 2026-06-08 04:18 SGT