Overall Score Drop Severity 10/10 2026-W22

Gemini 2.5 Pro Code Execution (v5) Dropped 19.5 pts

Gemini 2.5 Pro Run #131

Score Comparison

Dimension                 Previous  Current  Change
Overall (v5)                  67.0     47.7   -19.3
Code Execution (v5)           88.2     56.3   -31.9
Knowledge Synthesis (v5)      55.8     42.3   -13.5
Grounding (v5)                79.3     53.0   -26.3
Value                         38.1     26.3   -11.8
Stability                     34.3     35.3    +1.0
Availability                 100.0     76.0   -24.0
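
The Change column is the Current score minus the Previous score. A minimal sketch that recomputes it from the figures in the table above; the names `ROWS` and `score_change` are illustrative and not part of the benchmark's tooling:

```python
# Rows copied from the Score Comparison table: (dimension, previous, current).
ROWS = [
    ("Overall (v5)", 67.0, 47.7),
    ("Code Execution (v5)", 88.2, 56.3),
    ("Knowledge Synthesis (v5)", 55.8, 42.3),
    ("Grounding (v5)", 79.3, 53.0),
    ("Value", 38.1, 26.3),
    ("Stability", 34.3, 35.3),
    ("Availability", 100.0, 76.0),
]


def score_change(previous: float, current: float) -> float:
    """Return current minus previous, rounded to one decimal like the table."""
    return round(current - previous, 1)


if __name__ == "__main__":
    for name, previous, current in ROWS:
        change = score_change(previous, current)
        # The explicit sign makes gains and losses easy to tell apart.
        print(f"{name:<26}{previous:>8.1f}{current:>9.1f}{change:>+8.1f}")
```

Rounding to one decimal avoids floating-point noise such as -19.299999999999997 appearing in place of -19.3.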

Affected Dimensions

Code Execution (v5) -33.4
Grounding (v5) -29
Availability -24
Value -12.1
Knowledge Synthesis (v5) -9.4
Stability -2.4

Top Lost Tasks 5

#1 CSV Single Line Parsing execution 100 0 -100 Strict
Model Raw Response (excerpt)
[API ERROR] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps. 
#2 Debug: Webhook Idempotent Handling execution 100 0 -100 Strict
Model Raw Response (excerpt)
[API ERROR] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps. 
#3 Stable Deduplication: Dictionary List execution 100 0 -100 Strict
Model Raw Response (excerpt)
[API ERROR] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps. 
#4 Phone Number Normalization execution 100 0 -100 Strict
Model Raw Response (excerpt)
[API ERROR] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps. 
#5 Two-Year TCO Calculation grounding 88 0 -88 Strict
Model Raw Response (excerpt)
[API ERROR] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps. 
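
All five excerpts above begin with the same `[API ERROR]` prefix and report an exceeded spending cap rather than a model answer. A minimal sketch, assuming raw responses are available as plain strings, of how such rows could be separated from genuine model answers; the helper names are hypothetical and do not describe how the benchmark itself scores tasks:

```python
# Prefix copied from the raw response excerpts in this report.
API_ERROR_PREFIX = "[API ERROR]"


def is_api_error(raw_response: str) -> bool:
    """True when the raw response is an API error message, not a model answer."""
    return raw_response.lstrip().startswith(API_ERROR_PREFIX)


def split_by_failure_kind(tasks):
    """Split (task_name, raw_response) pairs into API failures and model answers."""
    api_failures = [name for name, raw in tasks if is_api_error(raw)]
    model_answers = [name for name, raw in tasks if not is_api_error(raw)]
    return api_failures, model_answers


if __name__ == "__main__":
    # The first entry uses a task name and error text from this report;
    # the second entry is a made-up placeholder for a normal answer.
    sample = [
        ("CSV Single Line Parsing",
         "[API ERROR] Your project has exceeded its monthly spending cap."),
        ("Hypothetical task", "def parse(line): return line.split(',')"),
    ]
    failures, answers = split_by_failure_kind(sample)
    print("API failures:", failures)
    print("Model answers:", answers)
```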
Run #131 · Formula v7 · Judge v6 · Benchmark v6 · 2026-05-25 04:16 SGT