Winzheng — AI Model Benchmarking · Change Intelligence

Overall Top 5

#1 Claude Opus 4.7 88.2 ▼1.1 · #2 DeepSeek V4 Pro 81.7 ▼7.1 · #3 Grok 4 80.4 ▼4.6 · #4 Claude Sonnet 4.6 79.5 ▼3.5 · #5 GPT-5.5 79.1 ▲6.2 · #6 GPT-o3 78.9 ▼4.5 · #7 Gemini 2.5 Pro 75.8 ▼2.1 · #8 豆包 Pro 74.3 ▼7.3 · #9 Qwen3 Max 72.1 ▼8.9 · #10 Gemini 3.1 Pro 69.3 ▼13.2 · #11 GLM-4.6 63.4 · ▲ Gemini 3.1 Pro +8.8 · ▼ GPT-5.5 -30.2

YZ Index · Weekly real-sandbox evaluation of 11 mainstream models · Zero vendor sponsorship · Auditable scoring

Latest News

News 07-11 06:23 TC

Apple sues OpenAI over alleged trade secret theft

Apple alleges the misconduct was directed by OpenAI's senior leadership, including a long-time former employee.

News 07-11 06:10 NF

Apple Sues OpenAI for Stealing AI Hardware Trade Secrets; 400 Former Employees Involved

Apple has filed a lawsuit against OpenAI and two former executives, alleging the systematic theft of trade secrets relat

News 07-11 05:20 X

SK Hynix’s $26.5 Billion IPO: Capital and Factory Construction Game Driven by AI Memory Demand

SK Hynix completed the largest foreign IPO in the U.S. capital market at $26.5 billion, with proceeds primarily allocate

News 07-11 05:09 X

Meta Muse Image Generator Sparks Strong Backlash for Training on User Photos Without Consent

Meta's Muse image generator is facing user backlash over allegations that it trained on user photos without consent, rai

News 07-11 05:08 X

xAI Grok 4.5 Publicly Launched, Claims Opus-Class Performance

On July 9, xAI opened Grok 4.5 to the public, claiming Opus-class performance. This marked the first time since the Fabl

News 07-11 05:08 X

OpenAI GPT-5.6 Sol, Terra, and Luna Models Launched Publicly on the Same Day

On July 9, 2026, OpenAI officially released GPT-5.6 Sol, Terra, and Luna to all ChatGPT users and API developers, follow

News 07-11 04:24 TC

Hugging Face’s CEO on why companies are done renting their AI

Open source AI is booming, according to Hugging Face CEO Clem Delangue. The company has grown into someth

News 07-11 04:23 TC

Open source AI matters more than ever, according to Hugging Face’s Clem Delangue

Open source AI is booming, according to Hugging Face CEO Clem Delangue. The company has grown into someth

Review 07-11 03:35

Claude Opus 4.7, Claude Sonnet 4.6, and GPT-o3 Tie at 81.44 Points: 2026-07-11 Smoke Quick Test Data Brief

On 2026-07-11, the YZ Index Smoke quick test covered 11 models, with Claude Opus 4.7, Claude Sonnet 4.6, and GPT-o3 tyin

News 07-11 02:56 TC

SK Hynix raises $26.5B in the biggest foreign IPO in US history, is urged to build new US fabs

The AI chip boom just produced its biggest Wall Street moment yet. Now SK Hynix and Samsung are being asked to build U.S

News 07-11 02:56 ARS

Disable auto-play and infinite scroll or risk massive fines, EU tells Meta

Digital Services Act may force Meta to make big changes on its platforms.

News 07-10 20:21 NF

Meta Muse Image Generator Defaults to Including Public Accounts, Privacy Backlash Sparks User Control Controversy

On July 9, 2026, Meta launched the Muse Image Instagram generator, which by default includes public Instagram and Facebo

Reviews

Review 07-11

Claude Opus 4.7, Claude Sonnet 4.6, and GPT-o3 Tie at 81.44 Points: 2026-07-11 Smoke Quick Test Data Brief

On 2026-07-11, the YZ Index Smoke quick test covered 11 models, with Claude Opus 4.7, Claude Sonnet 4.6, and GPT-o3 tyin

Review 07-10

Claude Opus 4.7 Smoke Review Main Board Plunges 19.3 Points, Code Execution Drops 22 Points in a Single Day

In today's Smoke review, Claude Opus 4.7's main board score fell from 90.51 to 71.26, a drop of 19.3 points. Code execut

Review 07-10

Grok 4 Main Score Plunges 8.4 Points, Material Constraint Drops 17.6 Points in a Single Day

Grok 4's main score in today's Smoke evaluation dropped 8.4 points from 87.66 to 79.30, with the Material Constraint dim

WDCD Compliance

What it tests: whether AI holds your original instructions across multi-turn dialogue

#1 Grok 4 95 #2 DeepSeek V4 Pro 94 #3 GLM-4.6 93.6 #4 GPT-o3 89.8 #5 Claude Opus 4.7 82.5 #6 Gemini 3.1 Pro 80.8 #7 Gemini 2.5 Pro 76.8

Research Lab

WDCD Run #221: Average Instruction Decay Hits -36.4% as Grok 4 Leads 11-Model Field

WDCD Run #221 (2026-07-08) measured instruction decay across 11 frontier models over three dialogue

4-Model Translation Showdown: Week 28 Quality Evaluation, GPT-o3 Leads with a Score of 9

This week, 318 translation tasks were completed by 4 models. A blind evaluation of 3 sampled documen

WDCD Run #211: Grok 4 Leads with Just -13% Instruction Decay as GPT-o3 Collapses at -75%

WDCD Run #211 (2026-07-03) benchmarked 11 models on multi-turn commitment integrity, with Grok 4 tak
