YZ Index · AI Model Change Intelligence

Which AI model should you use today?
We benchmark them every week.

11 models · 212 questions randomly sampled · Real code execution · Citation verification · Rolling average rankings · Don't trust press releases; check continuous performance.

Code Sandbox Execution · Citation Accuracy Check · Statistical Significance Ranking · Compliance Testing · No Vendor Sponsorship
Who to Use Right Now
#1 Overall (Rolling Average): Claude Sonnet 4.6
Biggest Rise This Week: Qwen3 Max, +68.5
Biggest Drop: DeepSeek V3, -75.1
Latest Benchmark: 2026-05-18 SGT (judge v6)
Models Tested: 11
Test Questions: 212
DCD Scenarios: 5 categories x 6 questions
Auto-evaluation frequency: Weekly

Don't just look at the overall score — consider your use case

Top Pick: 豆包 Pro, 89.8 pts
Runner-up: Grok 4, 86.8 pts
Third Choice: Claude Sonnet 4.6, 86.8 pts

Top Pick: Claude Opus 4.7, 55.8 pts
Runner-up: Claude Sonnet 4.6, 52.9 pts
Third Choice: Gemini 3.1 Pro, 48.8 pts

Top Pick: Claude Sonnet 4.6, 78.4 pts
Runner-up: Claude Opus 4.7, 75.2 pts
Third Choice: Grok 4, 73.9 pts

Top Pick: deepseek-v3, 99.7 pts
Runner-up: ernie-4, 98.5 pts
Third Choice: 文心一言 4.5, 98.3 pts

Top Pick: 豆包 Pro, 38.9 pts
Runner-up: Gemini 3.1 Pro, 38.2 pts
Third Choice: Claude Sonnet 4.6, 38 pts

Top Pick: claude-opus-4.6, 0 pts
Runner-up: Claude Opus 4.7, 0 pts
Third Choice: Claude Sonnet 4.6, 0 pts

Claude Opus 4.7, 65 pts
Claude Sonnet 4.6, 62.5 pts
豆包 Pro, 60 pts

View Full Recommendations by Use Case

Worth reading today — beyond the hype

We only feature content that impacts capability, pricing, stability, or model selection.

News
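Enterprise AI Obstacles and Roadmap: Security and Physical AI Take Center Stage
Day two of TechEx North America examined why enterprise AI deployments stall and where they are headed. Speakers noted that many AI projects end up in a "graveyard": the pilot succeeds but the project fails to scale. Experts discussed three themes (data governance, security, and physical AI), arguing that enterprises need a clear roadmap for scaling and must stay alert to security threats such as adversarial attacks. Physical AI, such as autonomous robots, is seen as the next wave but faces hardware-software integration challenges.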
News
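Literary Prize Winners Caught Up in AI Ghostwriting Controversy: A New Normal?
Three of the five regional winners of the Commonwealth Short Story Prize have been accused of relying on chatbots in their writing. This is not an isolated case: as AI writing tools spread, the literary world faces an unprecedented crisis of trust. From prize judging to reader acceptance, the line between AI-generated content and human writing is increasingly blurred, prompting deep reflection on originality, copyright, and the nature of literature.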
News
A Five-Minute Review of Six Months of LLM Progress: Innovation Highlights and Real-World Challenges Coexist
This report summarizes the evolution of the LLM field over the past six months in a five-minute format, covering model iterations, application deployments, and industry signals, highlighting significant progress in code execution and grounding while noting persistent challenges.
News
Renowned AI Architect Confirms Joining Anthropic, Verified by Multiple Sources Including Google
A well-known AI architect has confirmed joining Anthropic, with news verified by multiple sources including Google Search grounding, and reported by Gizmodo, Business Insider, and VentureBeat.
News
Gemini Omni Confirmed by Google Multi-Source Verification; Trend Signals Reflect New Changes in Multimodal Competition
Google's verification confirms Gemini Omni with six grounded sources, signaling a structural shift toward multimodal integration. The YZ Index highlights auditability and material grounding as key dimensions for evaluation.
News
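Google I/O 2026: Gemini Upgrades, a Search Overhaul, and Smart Glasses on the Way
Google I/O 2026 focused on AI reaching into every product: a leap in Gemini model capabilities, a new era of agent-based interaction in Search, and smart glasses arriving in the fall. This article walks through the three core announcements and analyzes Google's strategic intent in the AI race.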
News
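Musk Accuses Altman of "Stealing" a Nonprofit, but the Trial Reveals Both Had Similar Goals
A legal battle over OpenAI's nonprofit status has put Elon Musk and Sam Altman in the spotlight. Musk accuses Altman of stealing the nonprofit he founded, but trial evidence shows that Musk himself also tried to commercialize OpenAI and even planned to build the "most hated" super-company together with Altman. The trial exposes the deep conflict between ideals and capital in the AI industry.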
News
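Inside Musk v. Altman: The AI Ethics Fight Behind the Trial
Elon Musk accused OpenAI CEO Sam Altman and president Greg Brockman of deceiving him about its nonprofit status. The court, however, ultimately rejected Musk's claims. This article examines key details of the trial and explores AI governance and the crisis of trust among founders.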
News
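From Teenage Hacker to Iron Dome Researcher, He Raised $28 Million to Fight AI Phishing
Ocean, an agentic email security platform, announced $28 million in funding from Lightspeed Venture Partners. Its founder went from teenage hacker to security researcher on Israel's Iron Dome defense system and is now targeting AI-driven phishing attacks. This article explores the threat of AI phishing, what is new about agentic security platforms, and the founder's unusual career.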

Not all AI news is worth reading. What matters is what changes your judgment.

View All News

Why This Leaderboard Is Worth Your Attention

Real Code Execution
Looking like it can code isn't enough. We run the code in a sandbox. If it doesn't pass, it's zero.
Citation Verification
For long-document questions, we don't just check if the answer looks right — we verify citations trace back to the source.
Statistical Rankings
We don't judge on a single run. Rankings are based on rolling averages, avoiding luck-driven fluctuations.
No Sponsored Benchmarks
No co-evaluations, no pre-test consultations, no favoritism. Whatever the results are, that's what we publish.
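
To make the rolling-average idea under Statistical Rankings concrete, here is a minimal sketch of how a ranking over recent weekly runs could be computed. The function name, the four-week window, the model names, and the scores are all hypothetical and chosen for illustration; the page does not state the actual window size or aggregation used by the YZ Index.

```python
from statistics import mean


def rolling_average_ranking(weekly_scores, window=4):
    """Rank models by the mean of their most recent `window` weekly scores.

    `weekly_scores` maps a model name to its scores, oldest first.
    The window length is an assumption, not the site's documented value.
    """
    averages = {}
    for model, scores in weekly_scores.items():
        recent = scores[-window:]  # last `window` runs, or fewer if history is short
        if recent:
            averages[model] = mean(recent)
    # Highest rolling average first
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: model-b has one unusually strong latest week.
weekly_scores = {
    "model-a": [70.0, 72.0, 71.0, 73.0],
    "model-b": [60.0, 62.0, 61.0, 95.0],
}

ranking = rolling_average_ranking(weekly_scores, window=4)
latest_week_leader = max(weekly_scores, key=lambda m: weekly_scores[m][-1])

print(ranking)
print(latest_week_leader)
```

In this made-up data, model-b wins the latest week, but model-a still ranks first on the rolling average, which is the kind of single-run fluctuation the rolling approach is meant to dampen.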

View Methodology

The AI world changes daily — you need a reliable source

3 curated picks daily, weekly index changes, instant alerts for incidents and price shifts. Free, no ads, unsubscribe anytime.

  • Daily Picks — From the flood of AI news, we pick the 3 that truly matter
  • YZ Index Weekly — Who's up, who's down — one email covers it all
  • Model Incident Alerts — When a model you use has an issue, know immediately
  • Price Change Notifications — API price changes — don't find out from the bill
Free | No Ads | No Sponsored Content | Unsubscribe Anytime

Want deeper analysis? Go further.

The leaderboard answers "who's stronger." Research Lab answers "why." Model safety, edge deployment, performance teardowns — not rehashing papers, but conclusions from our own testing.

Enter Research Lab