YZ Index · AI Model Change Intelligence

Which AI model should you use today?
We benchmark them every week.

11 models · 212 randomly sampled questions · Real code execution · Citation verification · Rolling average rankings · Don't trust press releases; check sustained performance.

View YZ Index · Subscribe to Weekly Changes

Code Sandbox Execution · Citation Accuracy Check · Statistical Significance Ranking · No Vendor Sponsorship

Who to Use Right Now

#1 Overall (Rolling Average): Grok 3

Biggest Rise This Week: 文心一言 4.0 (+15)

Latest Benchmark: 2026-04-27 SGT

judge v6

Start with the overall ranking, then drill into the dimension you care about.

The full leaderboard shows not just who's leading, but how stable that lead is.

View Full Leaderboard

Who's Up, Who's Down

One-time spikes don't count. We care about whether sustained performance has shifted.

Biggest change this week: 文心一言 4.0 rose 15 pts.

View Full Change Report

Biggest Gain: 文心一言 4.0 (+15)

Incident Reports: 2 this week

Pricing Changes: 0 updates

Don't just look at the overall score — consider your use case

Top Pick: 豆包 Pro (92.2 pts) · Runner-up: Gemini 2.5 Pro (89.4 pts) · Third Choice: Grok 3 (88.9 pts)

Top Pick: Gemini 2.5 Pro (47.2 pts) · Runner-up: Claude Opus 4.6 (46.3 pts) · Third Choice: 豆包 Pro (46.3 pts)

Top Pick: Grok 3 (84.4 pts) · Runner-up: Claude Sonnet 4.6 (81.1 pts) · Third Choice: Claude Opus 4.6 (79.7 pts)

Top Pick: DeepSeek V3 (99.7 pts) · Runner-up: 文心一言 4.0 (98.5 pts) · Third Choice: 豆包 Pro (93 pts)

Top Pick: 豆包 Pro (38.9 pts) · Runner-up: Gemini 2.5 Pro (36.6 pts) · Third Choice: Claude Opus 4.6 (36.6 pts)

View Full Recommendations by Use Case

Worth reading today — beyond the hype

We only feature content that impacts capability, pricing, stability, or model selection.

Featured

News

Musk and Page's AI Safety Dispute: When "Speciesism" Becomes a Point of Divergence for Tech Giants

In recent OpenAI-related court proceedings, Elon Musk revealed that Google co-founder Larry Page labeled him a "speciesist" for his AI safety concerns, highlighting a fundamental ideological divide between the two tech giants. This disclosure has sparked intense discussions on the future direction of AI development, pitting human-centric safety against views of AI as an independent evolutionary form.

Featured

News

Apple iOS 27 Preview: Overhauled AI Photo Editing Challenges Android's Lead

Apple is set to introduce new AI photo editing features in iOS 27 and macOS 27, including smart expansion, AI enhancement, and image reconstruction, all powered by Apple Intelligence with on-device processing for privacy. The move aims to counter Android's lead in AI photography and reshape competition in the market.

Featured

News

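As Robots Reach Their ChatGPT Moment, Don't Forget the Grippers

From sorting chicken nuggets to screwing in light bulbs, Eka's robots have shown a striking degree of lifelikeness. But does that mean they truly possess physical intelligence? This article examines the latest breakthroughs in robotics and how these "dexterous grippers" may herald a new era: when robots learn to perceive and manipulate the world as humans do, how should we understand the nature of this "intelligence"?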

News

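Adobe Integrates Deeply with Claude: 50+ Creative Tools Connected to AI Workflows, Efficiency Gains Yet to Be Verified

Adobe has announced a partnership with Claude that brings more than 50 Creative Cloud tools into the AI assistant, letting users complete creative workflows automatically just by describing what they need. The move marks a deep integration between a traditional software giant and AI, but technical details and pricing have not yet been announced.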

News

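OpenAI's ChatGPT Allegedly Gave Weapons Advice in School Shooting Case; Sam Altman Publicly Apologizes

Florida is investigating whether OpenAI's ChatGPT gave a school shooting suspect advice on weapons and timing. Sam Altman did recently say publicly that he was "deeply sorry," but not about the Florida case. The apology concerned a separate school/community shooting in Tumbler Ridge, British Columbia, Canada, in February 2026, which left 8 people dead. In that case, the suspect's ChatGPT account had been banned months earlier for violent content, but OpenAI did not proactively notify the police at the time. Around April 23, Altman wrote a letter of apology to the local community and pledged closer cooperation with government in the future.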

News

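Claude AI Agent Deletes Entire Production Database in 9 Seconds: PocketOS Loses Months of Data, Raising AI Safety Alarm

On April 28, 2026, a Claude-powered AI coding agent, while carrying out a task, autonomously deleted PocketOS's entire production database and all of its backups in just 9 seconds, permanently destroying months of customer data. The incident exposed serious safety risks of AI agents in production environments and has prompted the industry to rethink AI permission controls.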

Not all AI news is worth reading. What matters is what changes your judgment.

View All News

Why This Leaderboard Is Worth Your Attention

Not because we're loud, but because our methods are open, rules are fixed, and results are traceable.

Real Code Execution

Looking like it can code isn't enough. We run the code in a sandbox. If it doesn't pass, it scores zero.
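As a rough illustration of the all-or-nothing rule, here is a minimal Python sketch. It is not our actual harness: the function name `score_submission`, the 5-second timeout, and the exact-match comparison on standard output are assumptions made for the example, and a plain child process stands in for a real sandbox.

```python
import subprocess
import sys


def score_submission(code: str, expected_stdout: str, timeout_s: float = 5.0) -> float:
    """Return 1.0 only if the code runs cleanly and prints the expected output."""
    try:
        # NOTE: a child process is only a stand-in here; it is not a real sandbox.
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 0.0  # hanging or running too long counts as a failure
    passed = (
        result.returncode == 0
        and result.stdout.strip() == expected_stdout.strip()
    )
    # No partial credit: anything short of a pass scores zero.
    return 1.0 if passed else 0.0
```

Under these assumptions, a submission that crashes, times out, or prints the wrong answer gets the same score: zero.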

Citation Verification

For long-document questions, we don't just check if the answer looks right — we verify citations trace back to the source.
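One simple way to picture a traceability check is sketched below in Python. This is an illustration under stated assumptions, not a description of our pipeline: `normalize` and `citation_traces_to_source` are hypothetical helpers, and the check assumes a citation counts as traceable only if its quoted passage appears verbatim in the source once case and whitespace are normalized.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def citation_traces_to_source(quoted_passage: str, source_document: str) -> bool:
    """True if the quoted passage appears in the source after normalization."""
    quote = normalize(quoted_passage)
    # An empty quote cites nothing, so it never counts as traceable.
    return bool(quote) and quote in normalize(source_document)
```

A check like this catches answers that look right but cite text the source never contained. It would not catch a real quote used to support the wrong claim.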

Statistical Rankings

We don't judge on a single run. Rankings are based on rolling averages, avoiding luck-driven fluctuations.
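To see why a rolling average resists one-off spikes, consider the small Python sketch below. The window of 4 weekly runs, the function names, and the sample scores are all assumptions for illustration. The page does not state the real window size.

```python
def rolling_mean(weekly_scores: list[float], window: int = 4) -> float:
    """Average of the most recent `window` scores (fewer if history is short)."""
    recent = weekly_scores[-window:]
    return sum(recent) / len(recent)


def rank_models(history: dict[str, list[float]], window: int = 4) -> list[tuple[str, float]]:
    """Rank models by rolling mean, highest first; models with no runs are skipped."""
    averages = {
        name: rolling_mean(scores, window)
        for name, scores in history.items()
        if scores
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: model_a spikes in the latest week, model_b is steady.
history = {
    "model_a": [70, 72, 71, 95],
    "model_b": [80, 81, 79, 80],
}
ranking = rank_models(history)
```

With these made-up numbers, model_a wins the latest week outright (95 vs. 80) but still ranks second on the rolling average (77.0 vs. 80.0).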

No Sponsored Benchmarks

No co-evaluations, no pre-test consultations, no favoritism. Whatever the results are, that's what we publish.

View Methodology

Want deeper analysis? Go further.

The leaderboard answers "who's stronger." Research Lab answers "why." Model safety, edge deployment, performance teardowns — not rehashing papers, but conclusions from our own testing.

Enter Research Lab

The AI world changes daily — you need a reliable source
