YZ Index — AI Model Benchmarks, News & Research

Latest News

News 06-10 10:01 TC
How Justin Ernest invested nearly $500M into hot startups without a traditional VC fund
Instead of spending a year raising a formal venture fund, the Sabertooth VC founder used a captive network of LPs to inv
News 06-10 10:00 TC
Google just fired a warning shot in the AI subscription price wars
Google just made it significantly cheaper to enjoy its budget AI subscription tier.
News 06-10 08:01 TC
How Justin Ernest invested nearly $400M into hot startups without a traditional VC fund
Instead of spending a year raising a formal venture fund, the Sabertooth VC founder used a captive network of LPs to inv
News 06-10 06:01 TC
Anthropic’s Fable 5 can make weirdly fun video games with the click of a button
Anthropic's Claude Fable 5 is going to be a big hit with the web's vibe coders.
News 06-10 06:00 TC
Hey Siri, here’s what I actually want from AI
I'm desperate for a personal AI assistant, but do I really want to become the kind of person who can't function without
News 06-10 05:01 Winzheng Lab
WDCD Run #157: Average Instruction Decay Hits 47.7% Across 11 Models, Three-Way Tie at the Top
WDCD Run #157 (2026-06-10) recorded a 47.7% average commitment decay across 11 models, with Claude Sonnet 4.6, Gemini 2.
Review 06-10 05:01
WDCD Compliance Test Shake-Up: 5 Models Plunge Up to 12.5 Points, Qwen3 Max Rallies
In the latest WDCD cycle compared to Run #146, five mainstream models experienced significant declines, with a maximum d
Review 06-10 05:01
WDCD Horizontal Review of 11 Models: Resource Constraints Scores All Collapse to 1 Point, Business Rules Show a 4-Point Gap
WDCD pilot data shows that the Resource Constraints scenario scored the lowest overall, with champion gemini-3.1-pro onl
Review 06-10 05:00
R3 Integrity Rate Plunges to 24.5%, 72 Crashes Reveal True Colors of 11 Models
The WDCD test's most striking finding is that while models perform well in R1 and R2 stages, their overall integrity rat
Review 06-10 05:00
Three-Way Tie for First at 67.5 Points, Grok 4 at the Bottom With Only 50 Points - WDCD Compliance Leaderboard
The first results of the WDCD Compliance Test are out, with three models tied for first at 67.50 points, while Grok 4 an
News 06-10 04:03 TC
WWDC 2026: Everything announced on Siri AI, iOS 27, Apple Intelligence, and more
Apple primarily made the case for an improved experience with its long-standing Siri assistant, which like most other an
News 06-10 04:02 TC
Can tech companies learn to love cheaper AI models?
If those same AI workloads can be handled by cheaper models without affecting quality, it would mean a massive shift in