Claude Opus 4.7 Smoke Evaluation Main Chart Plunges 9.6 Points: Degradation Signal or Luck of the Draw?
In today's Smoke Evaluation, Claude Opus 4.7's main chart score fell from 89.43 to 79.86, a drop of 9.57 points (roughly 9.6), while its code execution score collapsed from a perfect 100 to 75. The sharp drop raises the question of whether this signals genuine model degradation or is merely random sampling fluctuation.
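To make the "fluctuation" reading concrete, here is a minimal Python sketch. It first recomputes the two drops from the reported scores, then asks how often a small test set would lose at least one task by chance alone. The task count (4) and the true pass rate (0.9) are purely hypothetical assumptions chosen for illustration; the report does not state how many tasks the code execution category contains.

```python
from math import comb

def score_drop(before: float, after: float) -> float:
    """Difference between two scores, rounded to two decimals."""
    return round(before - after, 2)

def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial probability of at most k passes in n independent tasks,
    each passing with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Scores as reported in the article.
main_drop = score_drop(89.43, 79.86)
code_drop = score_drop(100.0, 75.0)

# HYPOTHETICAL: 4 tasks and a true pass rate of 0.9 (neither is from the report).
# A score of 75 would then mean 3 of 4 tasks passed.
p_75_or_worse = prob_at_most(3, 4, 0.9)

print(f"main chart drop: {main_drop}")
print(f"code execution drop: {code_drop}")
print(f"P(score <= 75 | n=4, p=0.9) = {p_75_or_worse:.4f}")
```

Under these assumed numbers, a score of 75 or lower would turn up in roughly a third of runs, so a single run on a set that small could not by itself distinguish degradation from sampling noise. A larger task set or repeated runs would be needed to tell the two apart.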