YZ Index · AI Model Change Intelligence

Which AI model should you use today?
We benchmark them every week.

11 models · 212 randomly sampled questions · Real code execution · Citation verification · Rolling-average rankings · Don't trust press releases; track sustained performance.

Code Sandbox Execution · Citation Accuracy Check · Statistical Significance Ranking · No Vendor Sponsorship
Who to Use Right Now
#1 Overall (Rolling Average): Grok 3
Biggest Rise This Week: 文心一言 4.0 (+15)
Latest Benchmark: 2026-04-27 (SGT) · judge v6


Start with the overall ranking, then drill into the dimension you care about.

The full leaderboard shows not just who's leading, but how stable that lead is.
View Full Leaderboard

Who's Up, Who's Down

One-time spikes don't count. We care about whether sustained performance has shifted.

Biggest change this week: 文心一言 4.0 rose 15 pts.
View Full Change Report
Biggest Gain: 文心一言 4.0 (+15)
Incident Reports: 2 this week
Pricing Changes: 0 updates

Don't just look at the overall score — consider your use case

Top Pick: 豆包 Pro (92.2 pts) · Runner-up: Gemini 2.5 Pro (89.4 pts) · Third Choice: grok-3 (88.9 pts)
Top Pick: Gemini 2.5 Pro (47.2 pts) · Runner-up: claude-opus-4.6 (46.3 pts) · Third Choice: 豆包 Pro (46.3 pts)
Top Pick: grok-3 (84.4 pts) · Runner-up: Claude Sonnet 4.6 (81.1 pts) · Third Choice: claude-opus-4.6 (79.7 pts)
Top Pick: deepseek-v3 (99.7 pts) · Runner-up: ernie-4 (98.5 pts) · Third Choice: 豆包 Pro (93 pts)
Top Pick: 豆包 Pro (38.9 pts) · Runner-up: Gemini 2.5 Pro (36.6 pts) · Third Choice: claude-opus-4.6 (36.6 pts)

View Full Recommendations by Use Case

Worth reading today — beyond the hype

We only feature content that impacts capability, pricing, stability, or model selection.

News
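Legal AI Upstart Legora Valued at $5.6 Billion as Its Showdown with Harvey Escalates
Legal AI startup Legora's valuation recently soared to $5.6 billion, and its rivalry with competitor Harvey has reached a fever pitch. Both companies have raised large funding rounds, have begun pushing into each other's core markets, and have even launched tit-for-tat advertising campaigns. This article takes an in-depth look at the legal-tech "AI arms race," examining the industry shifts, capital maneuvering, and future trends behind it.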
News
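Musk's Seven Biggest Missteps at the OpenAI Trial
Elon Musk, the first witness in the lawsuit against OpenAI, testified for three days. This article reviews seven key moments in which he contradicted himself or made damaging statements, including his answers on the definition of AGI, his relationship with Altman, and funding commitments. These missteps may undermine the credibility of his case.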
News
Musk's AI Ambitions: SpaceX Shifts Goals from Mars to Artificial Intelligence and the Moon
Elon Musk has announced a strategic pivot for SpaceX, shifting its focus from long-term Mars colonization to AI development and lunar exploration, as detailed in a New York Times report that highlights AI's critical role in future space missions. The change has sparked global debate about Musk's priorities, amid concerns that it dilutes SpaceX's core mission.
News
Google Launches Veo 3 AI Video Tool: A New Generative AI Breakthrough for Media
Google has officially launched Veo 3, an AI video creation tool whose advanced algorithms and user-friendly interface mark a milestone in video generation and have quickly made it a focal point in the tech community. The release, alongside Thailand's emerging Sora app and Malaysia's AI banking innovations, has sparked widespread discussion of AI adoption trends in Asia, as reported by international outlets such as TechCrunch and Reuters.
News
The AI Productivity Tool Explosion: Revolutionizing How We Work
Amid the wave of digital transformation, AI productivity tools are multiplying at an astonishing pace: more than 80 tools such as ChatGPT, Midjourney, and Zapier have reached the market, promising to shorten workdays and boost efficiency through automation and intelligent assistance. Meanwhile, enterprise solutions from AWS and Supabase are energizing business applications, underscoring AI's rapid spread into productivity software and foreshadowing profound changes in how people work.
News
AI Ethics and Human-Centered Values: Striking a Balance in Higher Education
As AI develops rapidly, its use in higher education has sparked widespread ethical debate, with calls to put human well-being first and to balance technological progress with humanistic values. This article examines whether AI is truly ethical, human-centered, and socially beneficial, particularly in higher-education practice, and analyzes its potential impact.

Not all AI news is worth reading. What matters is the news that changes your judgment.
View All News

Why This Leaderboard Is Worth Your Attention

Not because we're loud, but because our methods are open, rules are fixed, and results are traceable.

Real Code Execution
Code that only looks right isn't enough. We run it in a sandbox, and if it doesn't pass, it scores zero.
Citation Verification
For long-document questions, we don't just check whether the answer looks right; we verify that its citations trace back to the source.
Statistical Rankings
We don't judge from a single run. Rankings are based on rolling averages, which smooth out luck-driven fluctuations.
No Sponsored Benchmarks
No co-evaluations, no pre-test consultations, no favoritism. Whatever the results are, we publish them as-is.
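The page does not publish its exact averaging rule. As a minimal sketch, assuming equal-weight averaging over each model's last three benchmark runs (the window size, model names, and scores below are all hypothetical), a rolling-average ranking could look like this:

```python
from statistics import mean

# Hypothetical weekly overall scores per model, oldest first (illustrative only).
WEEKLY_SCORES: dict[str, list[float]] = {
    "model-a": [80.0, 81.0, 80.0, 95.0],  # one-week spike in the latest run
    "model-b": [85.0, 86.0, 86.0, 86.0],  # steady performer
}

WINDOW = 3  # assumed window length; not stated by the source


def rolling_average(scores: list[float], window: int = WINDOW) -> float:
    """Mean of the most recent `window` runs, so a single spike moves it only partially."""
    return mean(scores[-window:])


def rank(weekly: dict[str, list[float]], window: int = WINDOW) -> list[tuple[str, float]]:
    """Return (model, rolling average) pairs sorted from highest to lowest."""
    averaged = {name: rolling_average(s, window) for name, s in weekly.items()}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for position, (name, avg) in enumerate(rank(WEEKLY_SCORES), start=1):
        print(f"#{position} {name}: {avg:.1f}")
```

Ranked on the latest run alone, model-a (95.0) would lead; on the three-run rolling average, model-b (86.0) stays ahead of model-a (about 85.3), which is the kind of one-time spike the ranking is meant to discount.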

View Methodology

The AI world changes daily — you need a reliable source

3 curated picks daily, weekly index changes, instant alerts for incidents and price shifts. Free, no ads, unsubscribe anytime.

  • Daily Picks — From the flood of AI news, we pick the 3 that truly matter
  • YZ Index Weekly — Who's up, who's down — one email covers it all
  • Model Incident Alerts — When a model you use has an issue, know immediately
  • Price Change Notifications — Learn about API price changes before they show up on your bill
Free | No Ads | No Sponsored Content | Unsubscribe Anytime

Want deeper analysis? Go further.

The leaderboard answers "who's stronger." Research Lab answers "why." Model safety, edge deployment, performance teardowns: not rehashed papers, but conclusions from our own testing.

Enter Research Lab