Winzheng — AI Model Benchmarking · Change Intelligence

The “steroid olympics” were a circus—and a window into our culture

Testosterone. Methenolone. Nandrolone. Human growth hormone and EPO. Meldonium, modafinil, and mixed amphetamine salts. Clomiphene, anastrozole, levothyroxine,

2026-06-10 18:00


Overall Top 5

#1 Grok 4 89.9 ▲11.5 · #2 Claude Opus 4.7 89 ▲10.2 · #3 豆包 Pro 88.8 ▲10 · #4 Claude Sonnet 4.6 87.2 ▲9.2 · #5 Gemini 2.5 Pro 86.4 ▲7.4 · #6 Qwen3 Max 86.2 ▲8.5 · #7 Gemini 3.1 Pro 84.8 ▲7.7 · #8 DeepSeek V4 Pro 83.3 ▲6.4 · #9 GPT-o3 82.8 ▲6.9 · #10 GPT-5.5 80.9 ▲2.7 · #11 文心一言 4.5 76.9 ▲15.2 · ▲ Qwen3 Max +80.9 · ▼ DeepSeek V3 -75.1


Latest News


News 06-11 00:06 WD

Wrongful Arrest Exposes Failures in One of the Oldest Police Face-Recognition Tools in the US

The ACLU is suing two Florida police departments over the arrest of a Fort Myers man in a child-abduction case, saying o

News 06-11 00:05 TC

Warner Music acquires AI attribution startup Sureel AI

Through the acquisition, WMG aims to better track when its artists' work is used in AI-generated content or for training

News 06-11 00:04 TC

The three hard-tech moonshots fueling SpaceX’s unbelievable IPO

Most of the value in SpaceX's IPO is effectively a call option on the company's ambitious space data center plans.

News 06-11 00:02 TC

Datadog veterans launch AI coding startup Niteshift on a bet against Big AI lock-in

AI coding agent startup Niteshift has raised a $7 million seed round from a who's who of angels. It's betting companies

News 06-11 00:01 TC

Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

Cybersecurity researchers are complaining that Anthropic's new model Fable has guardrails that are too strict for any cy

News 06-10 22:03 MIT

The Download: the “steroid olympics” and a safer Mythos

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going o

News 06-10 22:02 TC

Decart’s new world model can simulate hours of photorealistic driving — with some caveats

Decart is launching Oasis 3, a real-time world model that generates photorealistic driving environments for autonomous v

News 06-10 22:01 TC

Jedify raises $24M to help companies arm AI agents with context on their business

The funding round was led by Norwest, with participation from S Capital VC, Cerca Partners, and Oceans Ventures. Snowflake Ve

News 06-10 22:00 WD

China Opens World’s First Wind-Powered Underwater Data Center

With an initial capacity of 24 megawatts, the innovative data center uses seawater as a natural cooling system.

News 06-10 20:00 WD

Artificial Intelligence Sneaks Into the World Cup Thanks to Google Gemini

The Argentine national team will be Google’s test bench and technological showcase during the World Cup.

News 06-10 16:00 TC

Meta signs first AI data center deal in India with Reliance

The 168-megawatt facility will support Meta's global AI computing needs and can be expanded over time.

News 06-10 10:01 TC

How Justin Ernest invested nearly $500M into hot startups without a traditional VC fund

Instead of spending a year raising a formal venture fund, the Sabertooth VC founder used a captive network of LPs to inv

Reviews


Review 06-10

WDCD Compliance Test Shake-Up: 5 Models Plunge Up to 12.5 Points, Qwen3 Max Rallies

In the latest WDCD cycle compared to Run #146, five mainstream models experienced significant declines, with a maximum d

Review 06-10

WDCD Side-by-Side Review of 11 Models: All Collapse to 1 Point on Resource Constraints, Business Rules Show a 4-Point Gap

WDCD pilot data shows that the Resource Constraints scenario scored the lowest overall, with champion gemini-3.1-pro onl

Review 06-10

R3 Integrity Rate Plunges to 24.5%, 72 Crashes Reveal True Colors of 11 Models

The WDCD test's most striking finding is that while models perform well in R1 and R2 stages, their overall integrity rat

WDCD Compliance

#1 Claude Sonnet 4.6 67.5 · #2 Gemini 2.5 Pro 67.5 · #3 Qwen3 Max 67.5 · #4 GPT-o3 65 · #5 Claude Opus 4.7 62.5 · #6 Gemini 3.1 Pro 60 · #7 GPT-5.5 57.5


Research Lab

WDCD Run #157: Average Instruction Decay Hits 47.7% Across 11 Models, Three-Way Tie at the Top

WDCD Run #157 (2026-06-10) recorded a 47.7% average commitment decay across 11 models, with Claude S

3 Major Models Translation Showdown: Week 24 Quality Evaluation, passthrough Leads with a Score of 9

This week, 2425 translation tasks were completed by 3 models.

WDCD Run #146: Average Instruction Decay Hits 24.7% Across 11 Models, Claude Opus 4.7 and GPT-5.5 Tie at Top

WDCD Run #146 (2026-06-03) tested 11 frontier models on multi-turn commitment integrity, recording a


YZ Index — AI Model Benchmarks, News & Research
