Overall Score Drop
Severity 10/10
2026-W22
Gemini 2.5 Pro Code Execution (v5) Dropped 19.5 pts
Score Comparison
| Dimension | Previous | Current | Change |
|---|---|---|---|
| Overall (v5) | 67.0 | 47.7 | -19.3 |
| Code Execution (v5) | 88.2 | 56.3 | -31.9 |
| Knowledge Synthesis (v5) | 55.8 | 42.3 | -13.5 |
| Grounding (v5) | 79.3 | 53.0 | -26.3 |
| Value | 38.1 | 26.3 | -11.8 |
| Stability | 34.3 | 35.3 | +1.0 |
| Availability | 100.0 | 76.0 | -24.0 |
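
The Change column in the table above is Current minus Previous. As a minimal sketch, the figures can be rechecked in Python; the `rows` list and the `change` helper are illustrative names chosen here, not part of the benchmark's own tooling.

```python
# Scores copied from the Score Comparison table: (dimension, previous, current).
rows = [
    ("Overall (v5)", 67.0, 47.7),
    ("Code Execution (v5)", 88.2, 56.3),
    ("Knowledge Synthesis (v5)", 55.8, 42.3),
    ("Grounding (v5)", 79.3, 53.0),
    ("Value", 38.1, 26.3),
    ("Stability", 34.3, 35.3),
    ("Availability", 100.0, 76.0),
]


def change(previous: float, current: float) -> float:
    """Return current minus previous, rounded to one decimal place."""
    return round(current - previous, 1)


# Map each dimension to its recomputed change.
changes = {name: change(prev, cur) for name, prev, cur in rows}

for name, delta in changes.items():
    print(f"{name}: {delta:+.1f}")
```

Rounding to one decimal place matches the precision of the table and avoids floating-point noise in the differences.
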
Affected Dimensions
Code Execution (v5) -33.4
Grounding (v5) -29
Availability -24
Value -12.1
Knowledge Synthesis (v5) -9.4
Stability -2.4
Top Lost Tasks (5)
| # | Task | Dimension | Previous | Current | Change | Mode |
|---|---|---|---|---|---|---|
| 1 | CSV Single Line Parsing | execution | 100 | 0 | -100 | Strict |
| 2 | Debug: Webhook Idempotent Handling | execution | 100 | 0 | -100 | Strict |
| 3 | Stable Deduplication: Dictionary List | execution | 100 | 0 | -100 | Strict |
| 4 | Phone Number Normalization | execution | 100 | 0 | -100 | Strict |
| 5 | Two-Year TCO Calculation | grounding | 88 | 0 | -88 | Strict |
Model Raw Response (excerpt), identical for all five tasks:
[API ERROR] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps.
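
Every raw response excerpt in this section begins with `[API ERROR]` and reports an exceeded project spending cap, so the zero scores appear to come from failed API calls rather than from model answers. The sketch below shows one way such responses could be separated from scorable ones before grading. It assumes the `[API ERROR]` prefix reliably marks these failures; `is_api_error`, `partition_responses`, and the "Hypothetical Healthy Task" entry are invented for illustration and do not describe the benchmark's actual pipeline.

```python
# Prefix seen in the raw response excerpts; assumed here to be a stable marker.
API_ERROR_PREFIX = "[API ERROR]"


def is_api_error(raw_response: str) -> bool:
    """Return True when the raw response is an API-level error, not model text."""
    return raw_response.lstrip().startswith(API_ERROR_PREFIX)


def partition_responses(
    responses: dict[str, str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Split a task -> raw response mapping into (scorable, api_errors)."""
    scorable: dict[str, str] = {}
    api_errors: dict[str, str] = {}
    for task, raw in responses.items():
        target = api_errors if is_api_error(raw) else scorable
        target[task] = raw
    return scorable, api_errors


responses = {
    "CSV Single Line Parsing": (
        "[API ERROR] Your project has exceeded its monthly spending cap."
    ),
    # Invented entry, included only to show the non-error path.
    "Hypothetical Healthy Task": "def parse(line): ...",
}

scorable, api_errors = partition_responses(responses)
print("API errors:", sorted(api_errors))
print("Scorable:", sorted(scorable))
```

Reporting the two groups separately would make it easier to tell an availability incident apart from a genuine quality regression.
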
Run #131 · Formula v7 · Judge v6 · Benchmark v6 · 2026-05-25 04:16 SGT