Overall Score Drop · Severity 10/10 · 2026-W14

GPT-4o Code Execution (v5) dropped 10.5 points

GPT-4o Run #52

Score Comparison

| Dimension | Previous | Current | Change |
|---|---|---|---|
| Overall (v5) | 81.1 | 49.3 | -31.8 |
| Code Execution (v5) | 78.0 | 62.8 | -15.2 |
| Knowledge Synthesis (v5) | 79.0 | 47.2 | -31.8 |
| Grounding (v5) | 80.1 | 49.1 | -31.0 |
| Value | 79.0 | 24.9 | -54.1 |
| Stability | 80.0 | 27.8 | -52.2 |
| Availability | 100.0 | 79.0 | -21.0 |

Affected Dimensions

Code Execution (v5) -23.7
Availability -15.9
Grounding (v5) -14.6
Value -6.1
Stability -2.8

Top Lost Tasks (5)

#1 SQL Window Function · execution · 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
```sql
SELECT department, name, salary
FROM (
    SELECT department, name, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, joined_at ASC) as rank
    FROM employees
) ranked_employees
WHERE rank = 1;
```
#2 Deep Configuration Merge · execution · 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
[API ERROR] Rate limit reached for gpt-4o in organization org-5kL87cAHHWwzzzRXfZoA5jZm on tokens per min (TPM): Limit 30000, Used 30000, Requested 153. Please try again in 306ms. Visit https://platform.openai.com/account/rate-limits to learn more.
#3 Semantic Version Comparison · execution · 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
[API ERROR] Rate limit reached for gpt-4o in organization org-5kL87cAHHWwzzzRXfZoA5jZm on tokens per min (TPM): Limit 30000, Used 30000, Requested 135. Please try again in 270ms. Visit https://platform.openai.com/account/rate-limits to learn more.
#4 Debug: Webhook Idempotent Handling · execution · 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
[API ERROR] Rate limit reached for gpt-4o in organization org-5kL87cAHHWwzzzRXfZoA5jZm on tokens per min (TPM): Limit 30000, Used 30000, Requested 206. Please try again in 412ms. Visit https://platform.openai.com/account/rate-limits to learn more.
#5 SQL: JOIN Causing Amount Duplication · execution · 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
[API ERROR] Rate limit reached for gpt-4o in organization org-5kL87cAHHWwzzzRXfZoA5jZm on tokens per min (TPM): Limit 30000, Used 29894, Requested 127. Please try again in 42ms. Visit https://platform.openai.com/account/rate-limits to learn more.
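Four of the five lost tasks above recorded a rate-limit error rather than a model answer, and each error states how long to wait ("Please try again in 306ms"). As a minimal sketch of how a harness might honor that hint before scoring, the snippet below parses the suggested delay and retries. The names `retry_delay_seconds` and `call_with_retry` are hypothetical, as is the assumption that the harness sees the error as a string starting with `[API ERROR] Rate limit` and that the delay is given in `ms` or `s`; this is not the benchmark's actual implementation.

```python
import re
import time

# Matches the retry hint in messages like "Please try again in 306ms."
# Assumption: the delay is expressed in milliseconds ("ms") or seconds ("s").
_RETRY_HINT = re.compile(r"try again in (\d+(?:\.\d+)?)(ms|s)\b")


def retry_delay_seconds(message, default=1.0):
    """Return the wait suggested by a rate-limit message, in seconds.

    Falls back to `default` when the message carries no recognizable hint.
    """
    match = _RETRY_HINT.search(message)
    if match is None:
        return default
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "ms" else value


def call_with_retry(call, max_attempts=5, sleep=time.sleep):
    """Invoke `call()` again while it returns a rate-limit error string.

    `call` is a hypothetical zero-argument function returning the raw
    response text. After `max_attempts` calls the last response is
    returned as-is, even if it is still an error.
    """
    response = call()
    for _ in range(max_attempts - 1):
        if not response.startswith("[API ERROR] Rate limit"):
            break
        sleep(retry_delay_seconds(response))
        response = call()
    return response
```

With the delays shown in the excerpts (42ms to 412ms), a retry of this kind adds well under a second per task in the cases listed here.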
Run #52 · Formula v7 · Judge v6 · Benchmark v6 · 2026-03-30 04:16 SGT