GPT-5.5 Plunges 23 Points, Two Claude Models Surge 34 Points: 7-Day Smoke Data Reveals Real Trends
This week's 7-day Smoke test shows GPT-5.5's execution score plummeting while two Claude models stage a dramatic reversal, though stability remains a concern. The data also shows volatility and fluctuating integrity ratings across multiple models.