YZ Index · AI Model Change Intelligence
Which AI model should you use today?
We benchmark them every week.
11 models · 212 questions randomly sampled · Real code execution · Citation verification · Rolling average rankings · Don't trust press releases, check continuous performance.
Code Sandbox Execution
Citation Accuracy Check
Statistical Significance Ranking
Compliance Testing
No Vendor Sponsorship
Who to Use Right Now
#1 Overall (Rolling Average)
Grok 3
Biggest Rise This Week
文心一言 4.0 +15
Latest Benchmark
2026-05-04 SGT
judge
v6
11
Models Tested
212
Test Questions
30
DCD Scenarios
5 categories x 6 questions
Weekly
Auto-evaluation frequency
Overall Top 5
Rolling average
Full Rankings
Quick Scene Lookup
Recommend by Scenario
Weekly Signals
Changes Report
Don't just look at the overall score; consider your use case
Top Pick
豆包 Pro
92.2 pts
Runner-up
Gemini 2.5 Pro
89.4 pts
Third Choice
grok-3
88.9 pts
Top Pick
Gemini 2.5 Pro
47.2 pts
Runner-up
claude-opus-4.6
46.3 pts
Third Choice
豆包 Pro
46.3 pts
Top Pick
grok-3
84.4 pts
Runner-up
Claude Sonnet 4.6
81.1 pts
Third Choice
claude-opus-4.6
79.7 pts
Top Pick
deepseek-v3
99.7 pts
Runner-up
ernie-4
98.5 pts
Third Choice
豆包 Pro
93 pts
Top Pick
豆包 Pro
38.9 pts
Runner-up
Gemini 2.5 Pro
36.6 pts
Third Choice
claude-opus-4.6
36.6 pts
Top Pick
claude-opus-4.6
0 pts
Runner-up
Claude Sonnet 4.6
0 pts
Third Choice
deepseek-r1
0 pts
Qwen3 Max
65 pts
Claude Sonnet 4.6
62.5 pts
DeepSeek V4 Pro
62.5 pts
Worth reading today — beyond the hype
We only feature content that impacts capability, pricing, stability, or model selection.
News
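IVF Innovation and the Rise of Balcony Solar
Compiled from MIT Technology Review's daily tech briefing. On one hand, in vitro fertilization (IVF) has helped millions of babies be born over the past four decades, yet the process remains slow, painful, and expensive, and new technology is trying to change that. On the other, balcony solar, an emerging distributed energy option, is spreading quickly among households worldwide thanks to its low barrier to entry and easy installation. Together, the two trends show how technology is reshaping daily life from both the medical and the energy side.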
News
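Spotify's AI DJ Adds Four Languages as Personalized Recommendations Evolve
Spotify recently announced that its AI DJ feature now officially supports French, German, Italian, and Brazilian Portuguese, further extending the feature's global reach. The update is built on OpenAI's voice technology and delivers music recommendations and commentary in a more natural tone. With multilingual support live, Spotify has taken an important step in personalized music experiences, while also prompting more discussion about how AI and the music industry interact. Compiled from TechCrunch.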
News
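Spotify Wants to Become the New Home for AI-Generated Personal Audio
Spotify is exploring a new direction: bringing AI-generated personal audio content onto the platform. Users can create podcasts with AI tools such as Codex or Claude Code and import them directly into Spotify, so that anyone can easily produce personalized audio. The move would not only enrich Spotify's content ecosystem but could also fundamentally change how audio content is created and distributed.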
News
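Musk Once Tried to Poach OpenAI's Founders to Build an AI Unit at Tesla
According to recent reports, Elon Musk once tried to poach OpenAI's founders to set up an independent AI division inside Tesla, on the condition that he be given full control. The move reveals Musk's strong desire to dominate AI technology and his increasingly strained relationship with OpenAI. Analysts believe it may have been part of Musk's plan to build his own AI empire.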
News
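Open-Source AI Demand Surges: Moonshot AI Raises $2 Billion at a $20 Billion Valuation
Chinese AI unicorn Moonshot AI (月之暗面) announced that it has closed a new $2 billion funding round, lifting its valuation to $20 billion. The round comes amid surging global demand for open-source AI. The company's annualized recurring revenue (ARR) passed $200 million in April, driven mainly by rapid growth in paid subscriptions and API usage. The round was led by investors including Sequoia China and Alibaba, and the funds will go toward scaling model training and building out the open-source ecosystem.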
News
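Balcony Solar Heats Up in the US: Plug In, Cut Emissions, Save on Power
Dozens of US states are considering legislation that would let residents install plug-in solar systems (so-called "balcony solar") without professional installation. These small photovoltaic arrays are already widespread in Europe and can significantly reduce electricity bills and carbon emissions. Supporters say the systems could break down barriers to solar adoption in the US and give renters and apartment dwellers access to clean energy. The article reviews the legislative developments, technical advantages, and potential challenges.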
News
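A Reggae Band's Battle with an AI Remix "Nightmare"
When a six-year-old Stick Figure song suddenly climbed the charts, the band was thrilled at first. But the driver of the viral spread turned out to be unauthorized AI remixes: low-quality, unlicensed tracks that flooded streaming platforms and forced the band into a long fight to defend the integrity of its original music. Compiled from WIRED.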
News
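AI Coding Tools Cause Trouble: Thousands of Apps Leak Corporate and Personal Data
AI-driven development platforms such as Lovable, Base44, Replit, and Netlify let anyone build a web app in seconds. However, security researchers have found that thousands of apps generated through this kind of "vibe coding" expose highly sensitive data, including database credentials, API keys, and user information, directly to the public internet, creating a serious security risk.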
News
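The Future of IVF: Breakthroughs and Challenges Side by Side
Forty-eight years ago, Louise Brown became the world's first IVF baby. Since then, millions of IVF babies have been born thanks to technical advances, and IVF has become safer and more effective. It is still far from perfect, though: success rates, ethical controversies, and high costs all remain to be addressed. The article traces the evolution of IVF technology, looks ahead to frontier directions such as gene editing, AI assistance, and endometrial receptor analysis, and discusses how to bring the technology to more families.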
Not all AI news is worth reading. What matters is what changes your judgment.
View All News
Why This Leaderboard Is Worth Your Attention
11
Models Tested
Fully transparent
212
Open Questions
Random sampling
30
Compliance Scenarios
Zero AI judging
1998
Founded
Continuously operating
0
Vendor Sponsors
Fully independent
Real Code Execution
Looking like it can code isn't enough. We run the code in a sandbox. If it doesn't pass, it's zero.
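To make the all-or-nothing rule concrete, here is a minimal sketch in Python. It is an illustration only, not our production harness: the function name `score_submission`, the 5-second timeout, and the use of a plain subprocess are all assumptions, and a real sandbox would add proper isolation (containers, resource limits, no network).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def score_submission(code: str, test_code: str, timeout_s: float = 5.0) -> int:
    """Return 1 if the code passes its tests, otherwise 0 (all-or-nothing).

    Hypothetical helper: a subprocess with a timeout stands in for a sandbox.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        # The model's answer and the hidden tests run together in one script.
        script.write_text(code + "\n\n" + test_code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0  # code that hangs scores zero
        # A failed assertion, exception, or syntax error gives a non-zero exit code.
        return 1 if result.returncode == 0 else 0
```

Under this sketch, an answer that looks plausible but fails a single assertion scores exactly the same as one that does not run at all.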
Citation Verification
For long-document questions, we don't just check if the answer looks right — we verify citations trace back to the source.
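One simple way to picture a citation check is sketched below in Python. It assumes citations arrive as quoted passages and counts a citation as valid only if the quote appears in the source after whitespace and case normalization. The names `normalize` and `citation_accuracy` are hypothetical, and exact matching is a simplification of whatever a full checker would do.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def citation_accuracy(quotes: list[str], source: str) -> float:
    """Fraction of quoted passages that can be found in the source document."""
    if not quotes:
        return 0.0
    haystack = normalize(source)
    # Empty quotes never count as found.
    found = sum(1 for q in quotes if normalize(q) and normalize(q) in haystack)
    return found / len(quotes)
```

With this kind of check, an answer can read correctly and still lose points if the passage it cites is nowhere in the document.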
Statistical Rankings
We don't judge on a single run. Rankings are based on rolling averages, avoiding luck-driven fluctuations.
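As a rough illustration of why a rolling average dampens one lucky week, consider the Python sketch below. The 4-week window, the function names, and the sample scores are assumptions made up for the example, not our actual configuration or data.

```python
from statistics import mean


def rolling_average(scores: list[float], window: int = 4) -> float:
    """Mean of the most recent `window` weekly scores."""
    if not scores:
        raise ValueError("a model needs at least one score")
    return mean(scores[-window:])


def rank_models(history: dict[str, list[float]], window: int = 4) -> list[tuple[str, float]]:
    """Sort models by rolling average, best first."""
    averaged = {name: rolling_average(runs, window) for name, runs in history.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weekly scores: model-a has one outlier week, model-b is steady.
history = {
    "model-a": [70, 72, 71, 95],
    "model-b": [80, 81, 79, 80],
}
ranking = rank_models(history)
```

In this made-up case, model-a wins the latest week outright, but model-b still ranks first on the 4-week average because its results are consistently higher.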
No Sponsored Benchmarks
No co-evaluations, no pre-test consultations, no favoritism. Whatever the results are, that's what we publish.
The AI world changes daily — you need a reliable source
3 curated picks daily, weekly index changes, instant alerts for incidents and price shifts. Free, no ads, unsubscribe anytime.
- Daily Picks — From the flood of AI news, we pick the 3 that truly matter
- YZ Index Weekly — Who's up, who's down — one email covers it all
- Model Incident Alerts — When a model you use has an issue, know immediately
- Price Change Notifications — API price changes — don't find out from the bill
Free | No Ads | No Sponsored Content | Unsubscribe Anytime
Want deeper analysis? Go further.
The leaderboard answers "who's stronger." Research Lab answers "why." Model safety, edge deployment, performance teardowns — not rehashing papers, but conclusions from our own testing.
Enter Research Lab