YZ Index — AI Model Benchmarks, News & Research

Latest News

News 06-10 06:01 TC
Anthropic’s Fable 5 can make weirdly fun video games with the click of a button
Anthropic's Claude Fable 5 is going to be a big hit with the web's vibe coders.
News 06-10 06:00 TC
Hey Siri, here’s what I actually want from AI
I'm desperate for a personal AI assistant, but do I really want to become the kind of person who can't function without
News 06-10 05:01 Winzheng Lab
WDCD Run #157: Average Instruction Decay Hits 47.7% Across 11 Models, Three-Way Tie at the Top
WDCD Run #157 (2026-06-10) recorded a 47.7% average commitment decay across 11 models, with Claude Sonnet 4.6, Gemini 2.
Review 06-10 05:01
WDCD Compliance Test Shake-Up: 5 Models Drop Up to 12.5 Points, Qwen3 Max Rallies
In the latest WDCD cycle compared to Run #146, five mainstream models experienced significant declines, with a maximum d
Review 06-10 05:01
WDCD Comparative Review of 11 Models: Resource Constraints Scores Collapse to 1 Point, Business Rules Show 4-Point Gap
WDCD pilot data shows that the Resource Constraints scenario scored the lowest overall, with champion gemini-3.1-pro onl
Review 06-10 05:00
R3 Integrity Rate Plunges to 24.5%, 72 Crashes Reveal True Colors of 11 Models
The WDCD test's most striking finding is that while models perform well in R1 and R2 stages, their overall integrity rat
Review 06-10 05:00
Three-Way Tie for First at 67.5 Points, Grok 4 Last with Only 50 Points: WDCD Compliance Leaderboard
The first results of the WDCD Compliance Test are out, with three models tied for first at 67.50 points, while Grok 4 an
News 06-10 04:03 TC
WWDC 2026: Everything announced on Siri AI, iOS 27, Apple Intelligence, and more
Apple primarily made the case for an improved experience with its long-standing Siri assistant, which like most other an
News 06-10 04:02 TC
Can tech companies learn to love cheaper AI models?
If those same AI workloads can be handled by cheaper models without affecting quality, it would mean a massive shift in
News 06-10 04:01 ARS
Google announces Gemini 3.5 Live Translate for instant voice-to-voice translation
Voice translations preserve speaker's tone, pacing, pitch—with SynthID watermarks for security.
News 06-10 04:00 ARS
Anthropic says these topics are too dangerous to let its Fable 5 model talk about
New frontier model refuses cybersecurity, biology, and chemistry queries.
Review 06-10 03:10
Claude Sonnet 4.6 Leads with 97.53 Points as Material Constraints Leave ERNIE Bot (文心一言) 40 Points Behind
Today's quick smoke test concludes outright that code execution has become the passing line, while material constraints