Engineering Practice (1 article)

GPT-4o Crashes: The Judgment of Engineers' Most Trusted AI Drops to 0

GPT-4o's bug-detection capability failed catastrophically in the latest evaluation. It scored 0 on a basic code-review test even as its overall programming score improved, a paradox that points to systemic problems in how AI development is prioritized.