Programming AI (2 articles)

Claude 3.5 Sonnet Leads GPT-4o in Programming Benchmark: 49% Accuracy Ignites Developer Community

Anthropic's Claude 3.5 Sonnet has scored 49% on the SWE-bench software engineering benchmark, surpassing OpenAI's GPT-4o (33.2%) on real-world programming tasks for the first time. The result has drawn tens of thousands of reposts on X and heated discussion among programmers, with developers sharing real-world cases that they say show human-like debugging ability.