News Lead
On June 21, 2024 (Beijing time), AI startup Anthropic officially launched the Claude 3.5 Sonnet model. The upgraded large language model excelled across multiple benchmarks, surpassing OpenAI's GPT-4o in coding and visual-understanding tasks in particular, while doubling reasoning speed. The model quickly topped the LMSYS Chatbot Arena leaderboard and drew over 80,000 interactions on X, where users flooded feeds with test results, marking a new phase in the AI race.
Background Introduction
Founded in 2021 by former OpenAI executive Dario Amodei and his team, Anthropic's core mission is developing 'safety-aligned' AI systems. Unlike OpenAI's pursuit of extreme performance, Anthropic emphasizes ensuring model behavior aligns with human values through its 'Constitutional AI' framework to avoid harmful outputs. Since Claude 3's release, the Claude series has built a reputation for multimodal capabilities and safety. Claude 3.5 Sonnet is an optimization of the mid-sized Sonnet variant, aimed at balancing performance, cost, and speed.
In today's rapidly iterating AI industry, OpenAI's GPT-4o dominates with real-time voice and multimodal capabilities, but faces criticism for high computational demands and potential safety risks. Claude 3.5 Sonnet's launch comes as industry demand for efficient, safe models surges.
Core Content: Technical Breakthrough Analysis
Claude 3.5 Sonnet's biggest highlight is its across-the-board lead in benchmark tests. According to Anthropic's official data, the model scored 59.4% on GPQA Diamond (graduate-level problem solving), surpassing GPT-4o's 53.6%; reached 88.7% on MMLU (Massive Multitask Language Understanding), on par with GPT-4o; and scored 77.0% on visual tasks such as ChartQA, above GPT-4o's 75.4%.
Coding capability represents another major breakthrough. In SWE-bench Verified testing, Claude 3.5 Sonnet scored 49%, far exceeding GPT-4o's 33.2% and Gemini 1.5 Pro's 26.5%. This means it can more accurately fix bugs in real GitHub codebases. User feedback shows significantly improved accuracy and efficiency in generating code for complex programming tasks.
In terms of speed, Claude 3.5 Sonnet outputs 151 tokens per second, twice the rate of Claude 3 Opus, with input processing speed reaching 78K tokens per second. Anthropic attributes this to an optimized Mixture of Experts (MoE) architecture and efficient inference engines, reducing API call latency by 80%.
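The throughput figure above translates directly into user-visible streaming latency. A minimal back-of-envelope sketch in Python, where the 151 tokens/s rate comes from the article and the response sizes are purely illustrative assumptions (network and queueing overhead are ignored):

```python
# Rough streaming-time estimate from the published throughput figure.
# OUTPUT_TOKENS_PER_SEC is the article's reported rate; response sizes
# below are illustrative assumptions, not measured values.

OUTPUT_TOKENS_PER_SEC = 151  # reported Claude 3.5 Sonnet output speed

def generation_time(output_tokens: int,
                    tokens_per_sec: float = OUTPUT_TOKENS_PER_SEC) -> float:
    """Seconds spent streaming a completion, ignoring network latency."""
    return output_tokens / tokens_per_sec

for n in (100, 500, 2000):
    print(f"{n:>5} tokens ≈ {generation_time(n):.1f} s")
```

At this rate a 2,000-token answer streams in roughly 13 seconds, versus about 26 seconds at Claude 3 Opus's half-speed rate, which is the difference users perceive as "twice as fast."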
Additionally, the model supports visual inputs, analyzing charts, screenshots, and photos. It topped LMSYS Arena blind testing with an Elo score of 1284, leading GPT-4o mini by over 30 points. X platform data shows related topic interactions exceeded 80,000 within 24 hours of release, with users like @levelsio sharing: "Claude 3.5 Sonnet crushes everything in frontend coding, I rewrote my entire project with it."
Various Perspectives
Industry response has been enthusiastic. Anthropic CEO Dario Amodei posted on X: "Claude 3.5 Sonnet proves safety and cutting-edge performance aren't opposites. We prioritize reliability and controllability."
"This isn't a simple incremental upgrade, but a paradigm shift. Safety-aligned AI can finally compete with black-box models." — Dario Amodei, Anthropic CEO
OpenAI remains low-key, but an insider anonymously told The Information: "We're accelerating GPT-4o iterations, competition will drive industry progress." xAI founder Elon Musk commented on X: "Interesting progress, but Grok is still catching up. Safety matters, openness matters more."
Developer community opinions are divided. On Hacker News, a frontend engineer stated: "Sonnet's visual coding ability saves me 50% time, highly recommended." But users also noted: "Still has hallucination issues in long context tasks, less stable than GPT-4o." Independent tester Andrej Karpathy (former OpenAI researcher) shared video demonstrations on X: "Claude 3.5 slightly edges out in mathematical reasoning, but GPT-4o is stronger in creative writing."
Impact Analysis
Claude 3.5 Sonnet's breakthrough has profound implications for the AI ecosystem. First, it challenges OpenAI's pricing dominance: Sonnet costs $3 per million input tokens and $15 per million output tokens, undercutting GPT-4o's $5 input rate (output pricing is the same), while running faster, positioning it to capture enterprise market share. Gartner analysts predict safety-first models will account for 40% of enterprise deployments by 2025.
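The pricing gap is easiest to see per request. A minimal sketch, using the per-million-token rates quoted above; the workload numbers (prompt and completion sizes) are illustrative assumptions:

```python
# Per-request cost comparison at the quoted list prices (USD per
# million tokens). The example workload below is an assumption chosen
# to illustrate the gap, not a measured usage profile.

PRICES = {
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o":            {"input": 5.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one API call at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example workload: a 10K-token prompt with a 1K-token completion.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.4f}")
```

Because output pricing matches, the savings scale with how prompt-heavy a workload is: input-dominated enterprise tasks (long documents, short answers) see the largest relative discount.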
Second, the model reinforces the 'safety alignment' paradigm. Anthropic's Constitutional AI, through self-supervised training, avoids bias amplification from RLHF (Reinforcement Learning from Human Feedback), addressing EU AI Act requirements for high-risk models. This may prompt OpenAI and Google to adjust strategies, driving industry-wide transformation toward explainable AI.
From the user perspective, test shares on X and Reddit show Sonnet's popularity soaring in programming, data analysis, and creative tools. Challenges remain, however: Anthropic's closed-source strategy has frustrated the open-source community, hallucinations persist, and the 200K-token context window still leaves room for growth. Longer term, this release intensifies the multimodal AI arms race, with counterattacks from Gemini 2.0 and Llama 4 expected in the second half of the year.
Conclusion
Claude 3.5 Sonnet's ascent represents not just Anthropic's technical victory, but a signal of safe AI's rise. Facing the trade-off between performance and ethics, it reminds the industry: true breakthroughs lie in sustainable innovation. As user feedback pours in, next-generation model iterations will accelerate. The AI race is shifting from speed to wisdom and responsibility.
© 2026 Winzheng.com 赢政天下 | Reproduction must credit the source and link to the original article