GPT-4o Code Execution Plummets 23.7 Points: Version Update Triggers Performance Avalanche
YZ Index testing shows that the code execution (v5) version of GPT-4o has suffered a major performance regression, with its score dropping from 78.0 to 62.8 out of 100. Six of the seven evaluation dimensions declined sharply, raising serious concerns about the model's stability and reliability.