Gemini 3.1 Pro Drops 8.5 Points on the Main Leaderboard, Code Execution Plummets 9.5 Points: Lottery or Degradation?
In today's Smoke evaluation, Gemini 3.1 Pro fell a sharp 8.5 points on the main leaderboard. Code execution dropped 9.5 points, from 66.70 to 57.20, and material constraints fell 7.3 points, from 86.30 to 79.00. The decline is attributed to a combination of question-sampling volatility and declining model consistency, so the current status is an "observation period" rather than an "alert period."
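One rough way to frame the "lottery or degradation" question is a two-proportion z-test on each sub-score. The sketch below is an illustration under stated assumptions, not the method behind the Smoke evaluation. It assumes each score is a simple pass rate over a fixed set of independent questions, and the question count `N_QUESTIONS` (along with the helper names `drop` and `sampling_z`) is hypothetical, since the article does not give it.

```python
import math

# Scores reported in the article (before, after) on a 0-100 scale.
SCORES = {
    "code execution": (66.70, 57.20),
    "material constraints": (86.30, 79.00),
}


def drop(before: float, after: float) -> float:
    """Point drop between two runs, rounded to two decimals."""
    return round(before - after, 2)


def sampling_z(before: float, after: float, n_questions: int) -> float:
    """Two-proportion z statistic for the drop.

    Assumption (hypothetical): each score is the pass rate over
    n_questions independent questions, with the same n in both runs.
    """
    p1, p2 = before / 100, after / 100
    pooled = (p1 + p2) / 2  # pooled rate; valid because both runs share n
    se = math.sqrt(2 * pooled * (1 - pooled) / n_questions)
    return (p1 - p2) / se


if __name__ == "__main__":
    N_QUESTIONS = 100  # hypothetical; the article does not state the question count
    for name, (before, after) in SCORES.items():
        z = sampling_z(before, after, N_QUESTIONS)
        # |z| below roughly 1.96 is consistent with sampling noise at the 5% level
        print(f"{name}: -{drop(before, after)} pts, z = {z:.2f}")
```

Because the standard error shrinks with the square root of the question count, the same 9.5-point drop can look like sampling noise on a small question set and like a genuine change on a large one. Knowing the real question count would be needed before reading either way.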