AI Programming (2 articles)

Claude 3.5 Sonnet Tops SWE-bench Coding Benchmark: 72.7% Score Leads AI Programming Track

Anthropic's Claude 3.5 Sonnet scored 72.7% on the SWE-bench software engineering benchmark, becoming the first AI model to exceed 70% and surpassing competitors such as GPT-4o and Gemini 1.5 Pro.

Claude 3.5 Sonnet Leads GPT-4o in Programming Benchmark: 49% Accuracy Ignites Developer Community

Anthropic's Claude 3.5 Sonnet achieved 49% accuracy on the SWE-bench software engineering benchmark, surpassing OpenAI's GPT-4o (33.2%) for the first time on real programming tasks. The result has drawn tens of thousands of reposts on X and heated discussion in the programmer community, with developers sharing real-world cases that demonstrate its human-like debugging capabilities.