Winzheng — AI Model Benchmarking · Change Intelligence · Selection Guide

Five things you need to know about AI

At SXSW London last week I gave a talk called “Five things you need to know about AI,” in which I shared what I think are the biggest themes in AI right now. I

2026-06-09 18:00

Overall Top 5

#1 Grok 4 89.9 ▲11.5 · #2 Claude Opus 4.7 89 ▲10.2 · #3 豆包 Pro 88.8 ▲10 · #4 Claude Sonnet 4.6 87.2 ▲9.2 · #5 Gemini 2.5 Pro 86.4 ▲7.4 · #6 Qwen3 Max 86.2 ▲8.5 · #7 Gemini 3.1 Pro 84.8 ▲7.7 · #8 DeepSeek V4 Pro 83.3 ▲6.4 · #9 GPT-o3 82.8 ▲6.9 · #10 GPT-5.5 80.9 ▲2.7 · #11 文心一言 4.5 76.9 ▲15.2 · ▲ Qwen3 Max +80.9 · ▼ DeepSeek V3 -75.1

Latest News

News 06-09 22:04 TC

How an e-scooter founder raised $5 million to build space data centers

Orbital founder Euwyn Poon built 250,000 scooters at Spin. Now he wants to launch 10,000 space data centers.

News 06-09 22:03 MIT

The Download: whole-body rejuvenation drugs and five things to know about AI

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going o

News 06-09 22:02 TC

Lovable says it has hit $500M in annualized revenue, with 1 million new projects a week

Lovable says it has now surpassed $500 million in annualized run-rate revenue and its users are building businesses and

News 06-09 22:01 ARS

Apple says its AI is still private, even when it's running on Google's servers

Some models run in Google's cloud, but without giving Google any kind of access.

News 06-09 22:00 TC

Sandstone raises $30M to bring AI to in-house legal teams

Sandstone's Series A was led by Lightspeed Partners, with participation from Sequoia.

News 06-09 20:02 MIT

David Sinclair plans to test whole-body rejuvenation drugs in the XPrize competition

The outspoken longevity scientist David Sinclair has been predicting that one day, you’ll go to the doctor and get a pre

News 06-09 20:01 MIT

Learning to lead in a hybrid human-AI enterprise

As adoption of AI agents looks set to surge by as much as 300% in the next two years, leadership teams are carefully con

News 06-09 20:00 WD

Alex Vindman Survived Trump’s Retaliation Machine. Now He’s Running for Senate

In 2019, Alex Vindman testified during President Trump’s first impeachment trial, a decision that ended his military care

News 06-09 12:00 TC

Mercor’s Brendan Foody calls out Sequoia, accusing it of ‘dual-pricing’ valuation tricks

Sequoia is just one of the top firms that sell the same equity at two different prices.

News 06-09 10:00 TC

Why Apple’s slow-and-steady AI bet is starting to look pretty smart

Can Apple's new AI glow-up put to bed accusations that it's losing an all-important industry race?

News 06-09 08:01 TC

Apple’s WWDC AI demos looked more real after $250M false ad settlement

The vibe of Apple's 2026 WWDC keynote felt like a spouse proudly listing all the honey-do-list items tackled. One subtle

News 06-09 08:01 TC

As OpenAI files for IPO, Sam Altman’s eye-scanning company is doing layoffs, report says

Tools for Humanity, Sam Altman's identity verification company, is reportedly struggling to generate revenue and will do

Reviews

Smoke Daily: GPT-5.5 tops with 92.58 points, material constraint gap of 19 points decides the outcome

Smoke's latest data shows that code execution is no longer the dividing line, and material constraints have become the r

11 Models Answer the Same Blame-Shifting Problem: 8 Get A>B>D>C, 3 Score 0 Points

11 mainstream models showed significant divergence on the same engineering judgment question: 8 models output A>B>D>C an

Binary Tree Serialization Test: 11 Models, 7 Full Scores, 4 Score Zero

In a strict binary tree serialization test requiring only code output, explicit null node markers, and stable results, 7

WDCD Compliance

#1 Claude Opus 4.7 70 · #2 GPT-5.5 70 · #3 GPT-o3 70 · #4 Claude Sonnet 4.6 67.5 · #5 Gemini 2.5 Pro 67.5 · #6 豆包 Pro 62.5 · #7 Gemini 3.1 Pro 62.5

Research Lab

3 Major Model Translation Showdown: Week 24 Quality Evaluation, passthrough Leads with a Score of 9

This week, 2425 translation tasks were completed by 3 models.

WDCD Run #146: Average Instruction Decay Hits 24.7% Across 11 Models, Claude Opus 4.7 and GPT-5.5 Tie at Top

WDCD Run #146 (2026-06-03) tested 11 frontier models on multi-turn commitment integrity, recording a

3 Major Model Translation Showdown: Week 23 Quality Evaluation, gpt-o3 Leads with a Score of 9

This week, 270 translation tasks were completed by 3 models. Two samples were selected for multi-mod
