GPT-o3 Drops from 100 to 0 on One Problem, Yet the Main Board Rises
GPT-o3 scored 0 on a basic debugging problem after a perfect 100 in the previous run, even as its main board score rose by 2.1.