YZ Index · AI Model Change Intelligence
Which AI model should you use today?
We benchmark them every week.
11 models · 212 randomly sampled questions · Real code execution · Citation verification · Rolling average rankings · Don't trust press releases; check continuous performance.
Code Sandbox Execution
Citation Accuracy Check
Statistical Significance Ranking
Compliance Testing
No Vendor Sponsorship
Who to Use Right Now
#1 Overall (Rolling Average)
Claude Sonnet 4.6
Biggest Rise This Week
Qwen3 Max +68.5
Biggest Drop
DeepSeek V3 -75.1
Latest Benchmark
2026-05-18 SGT
judge
v6
11
Models Tested
212
Test Questions
30
DCD Scenarios
5 categories x 6 questions
Weekly
Auto-evaluation frequency
Overall Top 5
Rolling average
Full Rankings
Quick Scene Lookup
Recommend by Scenario
Weekly Signals
Changes Report
▲ Biggest Gain
Qwen3 Max
+68.5
▼ Biggest Drop
DeepSeek V3
-75.1
Incidents / Pricing
0 incidents
11 price changes
Don't just look at the overall score — consider your use case
Top Pick
豆包 Pro
89.8 pts
Runner-up
Grok 4
86.8 pts
Third Choice
Claude Sonnet 4.6
86.8 pts
Top Pick
Claude Opus 4.7
55.8 pts
Runner-up
Claude Sonnet 4.6
52.9 pts
Third Choice
Gemini 3.1 Pro
48.8 pts
Top Pick
Claude Sonnet 4.6
78.4 pts
Runner-up
Claude Opus 4.7
75.2 pts
Third Choice
Grok 4
73.9 pts
Top Pick
deepseek-v3
99.7 pts
Runner-up
ernie-4
98.5 pts
Third Choice
文心一言 4.5
98.3 pts
Top Pick
豆包 Pro
38.9 pts
Runner-up
Gemini 3.1 Pro
38.2 pts
Third Choice
Claude Sonnet 4.6
38 pts
Top Pick
claude-opus-4.6
0 pts
Runner-up
Claude Opus 4.7
0 pts
Third Choice
Claude Sonnet 4.6
0 pts
Claude Opus 4.7
65 pts
Claude Sonnet 4.6
62.5 pts
豆包 Pro
60 pts
Worth reading today — beyond the hype
We only feature content that impacts capability, pricing, stability, or model selection.
News
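AI-Fabricated Quotes Made It into a Book. Why Does the Author Still Use AI?
Author Steven Rosenbaum's new book, The Future of Truth, contains multiple AI-generated "synthetic quotes" that look real but are fabricated. Despite discovering the errors, Rosenbaum says he will keep using AI to assist his writing. The episode exposes the reliability crisis of generative AI in creative work and the conflicted mindset of human authors facing the technology's temptations. This article takes an in-depth look at the industry dilemma and ethical boundaries behind AI-fabricated citations.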
News
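Even If You Hate AI, You Can't Escape Google's AI Search
Google has built AI into search to deliver tailored answers, greatly improving convenience while also pulling users away from original content sources. WIRED senior writer Steven Levy argues that this seemingly seamless experience is hollowing out the web's content ecosystem and hurting creators. Users may dislike AI, but they cannot resist its efficiency and end up captive to AI search.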
News
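Hands-On with Google's AI Glasses: One Step Short of Perfect
A TechCrunch reporter tried Google's latest prototype Android XR glasses. Powered by Gemini, the device overlays real-time translation, navigation, and information prompts directly in the wearer's field of view. It is lightweight and natural to use, with smooth interaction, and shows the considerable potential of augmented reality in everyday settings. Shortcomings remain in battery life, field of view, and content ecosystem. Google appears to have found the right direction, but a mature consumer product will take more time to refine.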
News
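The Future of Programming Is Here: Anthropic Uses Claude to Showcase a New AI Coding Paradigm
At "Code with Claude," a developer event Anthropic held in London, the company showcased its latest work in AI-assisted programming. Attendees were asked whether they had ever used AI to generate code, and the answer pointed to an irreversible trend: whether we like it or not, AI is reshaping the foundations of software development. This article analyzes Claude's coding capabilities, the industry impact, and the technical challenges behind them.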
News
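China Uses AI to Map Its National Renewable Energy Grid, Drawing Global Attention
As AI's global electricity consumption surges and grids come under strain, China has used AI to map its national renewable energy grid, enabling intelligent dispatch and forecasting of clean energy. The breakthrough eases the impact of AI compute on the grid and offers a Chinese approach to the global energy transition. Capacity prices on the US PJM grid have surged tenfold, while China's AI-driven energy management is becoming key to resolving this tension.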
News
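OpenAI's Singapore AI Lab Is Established as IMDA Updates Its AI Framework
OpenAI announced that it will set up its first applied AI lab outside the United States in Singapore, as part of a partnership with Singapore's Ministry of Digital Development and Information (MDDI). The initiative, called "OpenAI for Singapore," was announced at the ATx Summit and commits more than S$300 million. The lab will focus on applied AI research. Meanwhile, Singapore's Infocomm Media Development Authority (IMDA) updated the national AI governance framework to accelerate safe AI deployment, setting a new benchmark for global AI governance.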
News
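Google I/O: The Path of AI-Driven Science Is Changing
In the 2026 Google I/O keynote, DeepMind CEO Demis Hassabis declared that we are "standing in the foothills of the singularity," a claim that sparked heated debate. This article examines Google's latest moves in AI for Science, from recent AlphaFold progress to new breakthroughs in materials science and drug discovery, explores how AI is reshaping the research paradigm, and analyzes the opportunities and challenges involved.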
News
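Musk and Zuckerberg Join Forces to Persuade Trump to Scrap AI Executive Order
An AI executive order that had been scheduled for signing was cancelled at the last minute by US President Trump, on the grounds of avoiding any weakening of America's competitive edge over China. According to people familiar with the matter, tech giants Musk and Zuckerberg lobbied actively behind the scenes, arguing that excessive regulation would hinder innovation. The episode highlights the deep influence of tech giants on US AI policy and has prompted a new round of discussion about US-China AI competition.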
News
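Gulf AI Boom Runs into a Subsea Cable Bottleneck
As supercomputing centers in the Middle East come online at an accelerating pace, AI's hunger for bandwidth is sharply amplifying the risk posed by subsea cable outages. Gulf states face a contradiction: soaring data demand on one side and an aging, fragile global subsea fiber network on the other. From ships dropping anchor to geopolitical friction, any single break could interrupt AI training for weeks. This article examines how the cable crisis is forcing the Gulf to redesign its internet infrastructure.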
Not all AI news is worth reading. What matters is what changes your judgment.
View All News
Why This Leaderboard Is Worth Your Attention
11
Models Tested
Fully transparent
0
Open Questions
Random sampling
30
Compliance Scenarios
Zero AI judging
1998
Founded
Continuously operating
0
Vendor Sponsors
Fully independent
Real Code Execution
Looking like it can code isn't enough. We run the code in a sandbox. If it doesn't pass, it's zero.
Citation Verification
For long-document questions, we don't just check if the answer looks right — we verify citations trace back to the source.
Statistical Rankings
We don't judge on a single run. Rankings are based on rolling averages, avoiding luck-driven fluctuations.
No Sponsored Benchmarks
No co-evaluations, no pre-test consultations, no favoritism. Whatever the results are, that's what we publish.
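For readers who want to see how two of these rules might combine, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the YZ Index's actual pipeline: the 0 to 100 scale, the window length, the function names `code_score` and `rolling_rankings`, and the model names `model-a` and `model-b` are all hypothetical.

```python
from collections import defaultdict, deque


def code_score(tests_passed: bool) -> float:
    """All-or-nothing scoring: code that fails in the sandbox earns zero.

    The 100-point maximum is an assumption made for this sketch.
    """
    return 100.0 if tests_passed else 0.0


def rolling_rankings(weekly_scores, window=4):
    """Rank models by the mean of their most recent `window` weekly scores.

    `weekly_scores` is a list of {model_name: score} dicts, oldest first.
    The window length is a hypothetical parameter.
    """
    # A bounded deque keeps only the latest `window` scores per model.
    recent = defaultdict(lambda: deque(maxlen=window))
    for week in weekly_scores:
        for model, score in week.items():
            recent[model].append(score)
    averages = {m: sum(s) / len(s) for m, s in recent.items()}
    # Highest rolling average first.
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: model-a has one strong early week, then drops.
weeks = [
    {"model-a": 90, "model-b": 60},
    {"model-a": 40, "model-b": 70},
    {"model-a": 50, "model-b": 65},
]
print(rolling_rankings(weeks, window=2))
```

With a two-week window, the early 90 for `model-a` falls out of the average, so the steadier `model-b` ranks first. That is the sense in which a rolling average dampens luck-driven swings from a single run.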
The AI world changes daily — you need a reliable source
3 curated picks daily, weekly index changes, instant alerts for incidents and price shifts. Free, no ads, unsubscribe anytime.
- Daily Picks — From the flood of AI news, we pick the 3 that truly matter
- YZ Index Weekly — Who's up, who's down — one email covers it all
- Model Incident Alerts — When a model you use has an issue, know immediately
- Price Change Notifications — API price changes — don't find out from the bill
Free | No Ads | No Sponsored Content | Unsubscribe Anytime
Want deeper analysis? Go further.
The leaderboard answers "who's stronger." Research Lab answers "why." Model safety, edge deployment, performance teardowns — not rehashing papers, but conclusions from our own testing.
Enter Research Lab