Strict Task Zeroed
Severity 8/10
2026-W24
GPT-o3 · Overall · Strict Question Zeroed
Score Comparison
| Dimension | Previous | Current | Change |
|---|---|---|---|
| Overall (v5) | 60.0 | 72.6 | +12.6 |
| Code Execution (v5) | 84.5 | 82.2 | -2.3 |
| Knowledge Synthesis (v5) | 53.9 | 91.2 | +37.3 |
| Grounding (v5) | 71.7 | 79.3 | +7.6 |
| Value | 8.5 | 10.5 | +2.0 |
| Stability | 33.8 | 58.0 | +24.2 |
| Availability | 100.0 | 100.0 | +0.0 |
Affected Dimensions
execution
Top Lost Tasks (1)
| # | Task | Previous | Current |
|---|---|---|---|
| 1 | Reservoir Sampling | 100 | 0 |
Model Raw Response (excerpt)
```python
def reservoir_sample(stream, k, seed=None):
    rng = random.Random(seed)
    reservoir = []
    if k <= 0:
        return reservoir
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```
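To try the excerpt locally, the sketch below wraps the same Algorithm R logic in a self-contained script. It assumes the part of the response omitted from the excerpt imports `random`. It then checks two properties: the output is deterministic under a fixed seed, and selection is roughly uniform. The test inputs, the seed values, and the trial count are illustrative choices, not the benchmark's actual test cases, and the sketch does not explain why the task scored 0.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, seed=None):
    """Algorithm R: uniform sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    if k <= 0:
        return reservoir
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick j uniformly from [0, i]; item i is kept with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


# A fixed seed gives the same sample on every run.
assert reservoir_sample(range(100), 5, seed=42) == reservoir_sample(range(100), 5, seed=42)

# A stream shorter than k is returned whole.
print(reservoir_sample(range(3), 5))  # -> [0, 1, 2]

# Rough uniformity check: with 10 items and k = 3, each item should
# appear in about 3 / 10 = 30% of samples.
counts = Counter()
trials = 20000
for t in range(trials):
    counts.update(reservoir_sample(range(10), 3, seed=t))
for item in range(10):
    print(item, round(counts[item] / trials, 3))
```

Each printed frequency should land near 0.3. A large, systematic deviation for early or late items would point to an off-by-one in the `randrange(i + 1)` bound.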
Run #154 · Formula v7 · Judge v6.1 · Benchmark v6 · 2026-06-08 04:18 SGT