Overall Score Drop
Severity 10/10
2026-W14
GPT-4o Code Execution (v5) dropped 10.5 points
Score Comparison
| Dimension | Previous | Current | Change |
|---|---|---|---|
| Overall (v5) | 81.1 | 49.3 | -31.8 |
| Code Execution (v5) | 78.0 | 62.8 | -15.2 |
| Knowledge Synthesis (v5) | 79.0 | 47.2 | -31.8 |
| Grounding (v5) | 80.1 | 49.1 | -31.0 |
| Value | 79.0 | 24.9 | -54.1 |
| Stability | 80.0 | 27.8 | -52.2 |
| Availability | 100.0 | 79.0 | -21.0 |
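A quick way to cross-check the Change column and rank dimensions by the size of their drop. This sketch assumes Change is simply Current minus Previous, which every row in the table is consistent with. The scores are transcribed from the table. `changes` is a hypothetical helper name.

```python
# Scores transcribed from the Score Comparison table (Run #52): (previous, current).
scores = {
    "Overall (v5)": (81.1, 49.3),
    "Code Execution (v5)": (78.0, 62.8),
    "Knowledge Synthesis (v5)": (79.0, 47.2),
    "Grounding (v5)": (80.1, 49.1),
    "Value": (79.0, 24.9),
    "Stability": (80.0, 27.8),
    "Availability": (100.0, 79.0),
}

def changes(table):
    """Return (dimension, change) pairs, largest drop first.

    Assumes change = current - previous, rounded to one decimal place.
    """
    deltas = [(name, round(cur - prev, 1)) for name, (prev, cur) in table.items()]
    # sorted() is stable, so equal drops keep the table's row order.
    return sorted(deltas, key=lambda item: item[1])

for name, delta in changes(scores):
    print(f"{name:<26} {delta:+.1f}")
```

Under that assumption, Value and Stability account for the largest drops, well ahead of the headline Code Execution change.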
Affected Dimensions
Code Execution (v5) -23.7
Availability -15.9
Grounding (v5) -14.6
Value -6.1
Stability -2.8
Top 5 Lost Tasks
#1 SQL Window Function
Category: execution · Score: 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
```sql
SELECT department, name, salary
FROM (
    SELECT department, name, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, joined_at ASC) as rank
    FROM employees
) ranked_employees
WHERE rank = 1;
```
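The excerpt uses the standard "top row per group" pattern: `ROW_NUMBER()` numbers rows within each department by salary (ties broken by earliest `joined_at`), and the outer query keeps row 1. The sketch below runs the same query against an in-memory SQLite database (window functions need SQLite 3.25 or later). The sample rows and the helper name `top_earner_per_department` are hypothetical. The benchmark's real fixture and grading criteria are not shown in this report. The alias is changed to `rn` because `RANK` is a reserved word in some dialects, such as MySQL 8.0.

```python
import sqlite3

def top_earner_per_department(conn):
    """Return one (department, name, salary) row per department.

    Highest salary wins; on equal salary the earliest joined_at wins,
    mirroring the ORDER BY in the excerpt. Requires SQLite >= 3.25.
    """
    query = """
        SELECT department, name, salary
        FROM (
            SELECT department, name, salary,
                   ROW_NUMBER() OVER (
                       PARTITION BY department
                       ORDER BY salary DESC, joined_at ASC
                   ) AS rn
            FROM employees
        ) AS ranked_employees
        WHERE rn = 1
        ORDER BY department;
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data, not the benchmark's fixture.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (department TEXT, name TEXT, salary INTEGER, joined_at TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("eng", "Ana", 150, "2021-03-01"),
        ("eng", "Bo", 150, "2019-07-15"),  # ties Ana on salary, joined earlier: wins
        ("eng", "Cy", 120, "2018-01-10"),
        ("ops", "Di", 90, "2020-05-05"),
        ("ops", "Ed", 110, "2022-02-02"),
    ],
)
print(top_earner_per_department(conn))
# [('eng', 'Bo', 150), ('ops', 'Ed', 110)]
```

`ROW_NUMBER()` guarantees exactly one row per department even when the tie-break columns are also equal. `RANK()` would instead return every tied row.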
#2 Deep Configuration Merge
Category: execution · Score: 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
[API ERROR] Rate limit reached for gpt-4o in organization org-5kL87cAHHWwzzzRXfZoA5jZm on tokens per min (TPM): Limit 30000, Used 30000, Requested 153. Please try again in 306ms. Visit https://platform.openai.com/account/rate-limits to learn more.
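The rate-limit error above includes a retry hint ("Please try again in 306ms"). Here is a minimal sketch of a harness-side retry that honors that hint. It assumes the client surfaces the error text in an exception. `RateLimited`, `retry_delay` and `call_with_retry` are hypothetical names, not part of any OpenAI SDK. Only the `ms` unit appears in this report, so parsing a plain `s` unit is an assumption. The fallback exponential backoff and the jitter range are arbitrary choices.

```python
import random
import re
import time

class RateLimited(Exception):
    """Hypothetical stand-in for whatever the HTTP client raises on a 429."""

# Matches hints such as "Please try again in 306ms." (the "s" unit is an assumption).
_HINT = re.compile(r"try again in (\d+(?:\.\d+)?)(ms|s)\b")

def retry_delay(message: str, attempt: int, base: float = 0.5) -> float:
    """Seconds to wait: the server's hint if present, else exponential backoff."""
    match = _HINT.search(message)
    if match:
        value, unit = float(match.group(1)), match.group(2)
        delay = value / 1000 if unit == "ms" else value
    else:
        delay = base * (2 ** attempt)
    # Small random jitter so parallel workers do not retry in lockstep.
    return delay + random.uniform(0, 0.1)

def call_with_retry(call, max_attempts=5, sleep=time.sleep):
    """Invoke call(), retrying on RateLimited; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise
            sleep(retry_delay(str(err), attempt))
```

Injecting `sleep` keeps the helper testable without real waiting.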
#3 Semantic Version Comparison
Category: execution · Score: 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
[API ERROR] Rate limit reached for gpt-4o in organization org-5kL87cAHHWwzzzRXfZoA5jZm on tokens per min (TPM): Limit 30000, Used 30000, Requested 135. Please try again in 270ms. Visit https://platform.openai.com/account/rate-limits to learn more.
#4 Debug: Webhook Idempotent Handling
Category: execution · Score: 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
[API ERROR] Rate limit reached for gpt-4o in organization org-5kL87cAHHWwzzzRXfZoA5jZm on tokens per min (TPM): Limit 30000, Used 30000, Requested 206. Please try again in 412ms. Visit https://platform.openai.com/account/rate-limits to learn more.
#5 SQL: JOIN Causing Amount Duplication
Category: execution · Score: 100 → 0 (-100) · Strict
Model Raw Response (excerpt)
[API ERROR] Rate limit reached for gpt-4o in organization org-5kL87cAHHWwzzzRXfZoA5jZm on tokens per min (TPM): Limit 30000, Used 29894, Requested 127. Please try again in 42ms. Visit https://platform.openai.com/account/rate-limits to learn more.
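The four wait hints are consistent with a token budget that refills continuously at Limit / 60 = 500 tokens per second. Under that model, each hint equals the shortfall (Used + Requested − Limit) divided by the refill rate. For task #5, that is (29894 + 127 − 30000) / 500 tokens/s = 42 ms. This is an inference from the numbers in this report, not documented provider behavior. The sketch reproduces all four hints. `predicted_wait_ms` is a hypothetical helper.

```python
# (limit, used, requested, hinted_ms) copied from the four rate-limit errors above.
observed = [
    (30000, 30000, 153, 306),  # task #2
    (30000, 30000, 135, 270),  # task #3
    (30000, 30000, 206, 412),  # task #4
    (30000, 29894, 127, 42),   # task #5
]

def predicted_wait_ms(limit: int, used: int, requested: int) -> float:
    """Wait until the budget covers the request.

    Assumes (inferred, not documented) a continuous refill of limit/60 tokens per second.
    """
    refill_per_ms = limit / 60_000         # 30000 TPM -> 0.5 tokens per ms
    shortfall = used + requested - limit  # tokens missing right now
    return max(shortfall, 0) / refill_per_ms

for limit, used, requested, hinted in observed:
    print(f"requested={requested:>3}  hinted={hinted}ms  "
          f"predicted={predicted_wait_ms(limit, used, requested):.0f}ms")
```

If this model holds, the harness was exhausting its 30,000 TPM quota. Pacing requests at or below 500 tokens per second would avoid these errors entirely, rather than relying on retries.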
Run #52 · Formula v7 · Judge v6 · Benchmark v6 · 2026-03-30 04:16 SGT