YZ Index · AI Model Change Intelligence

Which AI model should you use today?
We benchmark them every week.

11 models · 212 randomly sampled questions · Real code execution · Citation verification · Rolling-average rankings · Don't trust press releases; track sustained performance.


Code Sandbox Execution · Citation Accuracy Check · Statistical Significance Ranking · No Vendor Sponsorship

Who to Use Right Now

#1 Overall (Rolling Average): Grok 3

Biggest Rise This Week: 文心一言 4.0 (+15)

Latest Benchmark: 2026-04-27 SGT

Judge: v6


Start with the overall ranking, then drill into the dimension you care about.

The full leaderboard shows not just who's leading, but how stable that lead is.

Who's Up, Who's Down

One-time spikes don't count. We care about whether sustained performance has shifted.

Biggest change this week: 文心一言 4.0 rose 15 points.


Biggest Gain: 文心一言 4.0 (+15)

Incident Reports: 2 this week

Pricing Changes: 0 updates

Don't just look at the overall score — consider your use case

Top Pick: 豆包 Pro (92.2 pts) · Runner-up: Gemini 2.5 Pro (89.4 pts) · Third Choice: Grok 3 (88.9 pts)

Top Pick: Gemini 2.5 Pro (47.2 pts) · Runner-up: Claude Opus 4.6 (46.3 pts) · Third Choice: 豆包 Pro (46.3 pts)

Top Pick: Grok 3 (84.4 pts) · Runner-up: Claude Sonnet 4.6 (81.1 pts) · Third Choice: Claude Opus 4.6 (79.7 pts)

Top Pick: DeepSeek V3 (99.7 pts) · Runner-up: 文心一言 4.0 (98.5 pts) · Third Choice: 豆包 Pro (93 pts)

Top Pick: 豆包 Pro (38.9 pts) · Runner-up: Gemini 2.5 Pro (36.6 pts) · Third Choice: Claude Opus 4.6 (36.6 pts)


Worth reading today — beyond the hype

We only feature content that impacts capability, pricing, stability, or model selection.

OpenAI CEO Sam Altman Faces Backlash as Viral Post Draws 4.7k Likes: An Ethical Crisis of Dishonesty, Poaching, and Safety Lapses?

OpenAI CEO Sam Altman is at the center of controversy after a viral X post, which has drawn 4.7k likes, compiled accusations that he behaved dishonestly, poached talent from competitors, and put profits ahead of safety, igniting debate over ethics in AI leadership. The episode underscores internal turmoil at OpenAI and raises questions about its impact on the future of the broader AI industry.

AI Bots Now Account for Over 50% of Internet Traffic: Digital Revolution or Infrastructure Crisis?

Internet traffic is undergoing a massive shift: according to the latest report from network infrastructure provider Lumen Technologies, AI bots now generate over 50% of global internet traffic. The finding highlights how quickly AI technology is spreading and has sparked wide discussion of network infrastructure capacity and bot management strategies.

Behind the AI Funding Frenzy: Is the Trillion-Dollar Valuation a Technological Breakthrough or a Capital Illusion?

According to reports, Anthropic, the developer of Claude, has reached a $1 trillion valuation in its latest funding round. The figure sets a record for the AI field and has stunned the investment community, reflecting extreme optimism about advanced AI. However, the undisclosed investors and terms raise questions about whether the valuation is justified and about potential bubble risk.

DeepSeek Releases 1.6-Trillion-Parameter Open-Source Model with Huawei Chip Support; API Pricing at Only 1/20 of GPT-4 Sparks Global AI Price War

According to the latest posts on X, verified via Google, Chinese AI startup DeepSeek has officially released its V4 series of open-source large models, including the 1.6-trillion-parameter V4-Pro and the 284-billion-parameter V4-Flash. The release is seen as a milestone in China's effort to build an independent AI technology stack: it not only rivals top international models on technical benchmarks but has also triggered a price war in the global AI service market with its highly disruptive pricing.

OpenAI Releases GPT-5.5 'SPUD': A Shift from Conversational AI to Autonomous Agents, Reaching 85% of Human-Level Performance on Multi-Step Tasks

OpenAI has officially released GPT-5.5 'SPUD', described as a "new type of intelligence." The model marks a significant shift from conversational interaction to task-executing agents, reaching 85% of human-level performance on multi-step workflows and greatly reducing the need for manual intervention.

OpenAI Partners with Qualcomm to Develop AI Phone: Mass Production in 2028 to Challenge iPhone, Replacing App Ecosystem with AI Agents

According to analyst Ming-Chi Kuo, OpenAI is working with Qualcomm and MediaTek on custom smartphone processors and with Luxshare on design and manufacturing, aiming for mass production in 2028. The move signals AI giants expanding from software services into hardware, with the phone using an AI agent system in place of traditional apps.

Not all AI news is worth reading. What matters is what changes your judgment.

Why This Leaderboard Is Worth Your Attention

Not because we're loud, but because our methods are open, rules are fixed, and results are traceable.

Real Code Execution

Looking like it can code isn't enough. We run the code in a sandbox. If it doesn't pass, it's zero.
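
A minimal sketch of the pass-or-zero rule, in Python. The helper name `score_submission`, the 10-second timeout, and the approach of appending a test script to the model's code are assumptions for illustration, not YZ Index's actual harness. A plain subprocess is also not a real sandbox; a production setup would add isolation such as containers and resource limits.

```python
import subprocess
import sys


def score_submission(candidate_code: str, test_code: str, timeout_s: float = 10.0) -> int:
    """Return 1 if the candidate passes the test script, else 0 (hypothetical helper).

    A separate Python process stands in for the sandbox here; it is NOT
    isolated (no container, no filesystem or network restrictions).
    """
    program = candidate_code + "\n\n" + test_code
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 0  # code that hangs counts as a failure
    # An uncaught exception or a failed assert gives a non-zero exit code.
    return 1 if result.returncode == 0 else 0


if __name__ == "__main__":
    good = "def add(a, b):\n    return a + b"
    bad = "def add(a, b):\n    return a - b"
    tests = "assert add(2, 3) == 5"
    print(score_submission(good, tests))  # 1
    print(score_submission(bad, tests))   # 0
```

There is no partial credit in this sketch: code that only looks plausible but fails a single assertion scores the same as code that does not run at all.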

Citation Verification

For long-document questions, we don't just check if the answer looks right — we verify citations trace back to the source.
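
One simple way to make "traces back to the source" concrete is to check that each quoted citation actually appears in the source document. The Python sketch below does this with exact substring matching after normalizing case and whitespace. The function names and the matching rule are assumptions for illustration; the real check may use fuzzy matching or span offsets instead.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't break matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def citation_accuracy(quotes: list[str], source: str) -> float:
    """Fraction of quoted citations found verbatim (after normalization) in the source."""
    if not quotes:
        return 0.0  # assumption: an answer that cites nothing earns no citation credit
    haystack = normalize(source)
    found = 0
    for quote in quotes:
        needle = normalize(quote)
        # An empty quote cites nothing, so it never counts as verified.
        if needle and needle in haystack:
            found += 1
    return found / len(quotes)


source = "The contract ends on 31 March.\nPayment is due within 30 days."
quotes = ["payment is due within 30 days", "Late fees are 5%"]
print(citation_accuracy(quotes, source))  # 0.5
```

The second quote is scored as unverified because it does not appear anywhere in the source, even if the answer around it reads convincingly.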

Statistical Rankings

We don't judge on a single run. Rankings are based on rolling averages, avoiding luck-driven fluctuations.
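
To illustrate why a rolling average resists luck-driven swings, here is a small Python sketch. The four-week window, the helper names, and the model names and scores are all made up for illustration; they are not YZ Index's actual parameters or data.

```python
from statistics import mean


def rolling_average(scores: list[float], window: int = 4) -> float:
    """Mean of the most recent `window` weekly scores (fewer if history is short)."""
    if not scores:
        raise ValueError("need at least one score")
    return mean(scores[-window:])


def rank_models(history: dict[str, list[float]], window: int = 4) -> list[tuple[str, float]]:
    """Sort models by rolling average, highest first."""
    averaged = {name: rolling_average(s, window) for name, s in history.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weekly scores, oldest first.
history = {
    "model_a": [80.0, 81.0, 79.0, 95.0],  # one lucky spike in the latest week
    "model_b": [86.0, 85.0, 87.0, 86.0],  # steady performer
}
for name, avg in rank_models(history):
    print(f"{name}: {avg:.2f}")
# model_b: 86.00
# model_a: 83.75
```

Ranked on the latest week alone, model_a would lead with 95; averaged over four weeks, the steady model_b stays ahead.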

No Sponsored Benchmarks

No joint evaluations with vendors, no pre-test consultations, no favoritism. Whatever the results are, that's what we publish.


Want deeper analysis? Go further.

The leaderboard answers "who's stronger." Research Lab answers "why." Model safety, edge deployment, performance teardowns — not rehashing papers, but conclusions from our own testing.

