Routing Fluctuations (1 article)

GPT-5.5's Main Ranking Plunges 28 Points: Is It Real Degradation?

GPT-5.5's code execution score dropped from 100 to 50, pulling its main ranking down by 28 points. But is this real degradation or just sampling noise?
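One way to frame the noise question is to ask how often a model whose true pass rate has not changed would still score 50 or lower in a single run. The sketch below is only an illustration under stated assumptions: the source does not say how many tasks make up the code execution score, so the task counts `n` and the true pass rate `p` are hypothetical, and tasks are assumed to pass or fail independently.

```python
from math import comb


def prob_score_at_most(k_max: int, n: int, p: float) -> float:
    """Return P(X <= k_max) for X ~ Binomial(n, p).

    n is the number of tasks in one benchmark run (hypothetical),
    p is the model's assumed true per-task pass rate (hypothetical).
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))


if __name__ == "__main__":
    true_pass_rate = 0.9  # assumption, not taken from the article
    for n_tasks in (2, 10, 50):  # assumption: the real task count is unknown
        # Passing at most half the tasks corresponds to a score of 50 or lower.
        tail = prob_score_at_most(n_tasks // 2, n_tasks, true_pass_rate)
        print(f"n={n_tasks:>2}: P(score <= 50) = {tail:.4f}")
```

With very few tasks per run, a single failed task is enough to halve the score, so a 100-to-50 swing is weak evidence of degradation on its own. With many tasks, the same swing becomes very unlikely under an unchanged pass rate.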