SWE-bench (8 articles)

Claude 3.5 Sonnet Leads GPT-4o in Programming Benchmark: 49% Accuracy Ignites Developer Community

Anthropic's Claude 3.5 Sonnet achieved 49% accuracy on the SWE-bench software engineering benchmark, surpassing OpenAI's GPT-4o (33.2%) on real-world programming tasks for the first time. The result drew tens of thousands of reposts on X and heated discussion in the programmer community, with developers sharing real-world cases that demonstrate its human-like debugging capabilities.

Claude 3.5 Sonnet Anthropic SWE-bench
530

Claude 3.5 Sonnet Breaks 90% in Coding Tests: AI Programming Ability Approaches Human Level

Anthropic's Claude 3.5 Sonnet model achieved 92.0% on the HumanEval coding benchmark, surpassing all previous AI models and marking a new milestone in AI coding capabilities. The result sparked heated discussion on X, with over 150,000 interactions, as developers shared real projects built with Claude and debated the future role of AI programmers.

Claude 3.5 Anthropic SWE-bench
475

Claude 3.5 Sonnet's Coding Capabilities Lead SWE-bench Rankings: 49% Score Surpasses GPT-4o's 33%

Anthropic's updated Claude 3.5 Sonnet model achieved a 49% task resolution rate on the SWE-bench software engineering benchmark, significantly outperforming OpenAI's GPT-4o (33%) and other competitors. The result set a new performance record for coding AI and drew widespread discussion and praise within the global developer community.

Claude 3.5 Sonnet SWE-bench Coding AI
478

Claude 3.5 Sonnet Leads SWE-bench Benchmark, Code Generation Capability Surpasses GPT-4o

Anthropic's Claude 3.5 Sonnet has outperformed OpenAI's GPT-4o on the SWE-bench coding benchmark, demonstrating strong software engineering capabilities. The result marks a significant advance for the Claude series in code generation and gives developers a more reliable programming assistant.