Winzheng — AI Model Benchmarking · Change Intelligence · Selection Guide

Five things you need to know about AI

At SXSW London last week I gave a talk called “Five things you need to know about AI,” in which I shared what I think are the biggest themes in AI right now. I

2026-06-09 18:00

Overall Top 5

#1 Grok 4 89.9 ▲11.5
#2 Claude Opus 4.7 89 ▲10.2
#3 Doubao Pro 88.8 ▲10
#4 Claude Sonnet 4.6 87.2 ▲9.2
#5 Gemini 2.5 Pro 86.4 ▲7.4
#6 Qwen3 Max 86.2 ▲8.5
#7 Gemini 3.1 Pro 84.8 ▲7.7
#8 DeepSeek V4 Pro 83.3 ▲6.4
#9 GPT-o3 82.8 ▲6.9
#10 GPT-5.5 80.9 ▲2.7
#11 ERNIE Bot 4.5 76.9 ▲15.2
▲ Qwen3 Max +80.9 · ▼ DeepSeek V3 -75.1

Latest News

News 06-10 04:00 ARS

Anthropic says these topics are too dangerous to let its Fable 5 model talk about

New frontier model refuses cybersecurity, biology, and chemistry queries.

News 06-10 02:02 TC

It’s not FAANG anymore. It’s MANGOS.

With SpaceX, Anthropic, and OpenAI all eyeing massive public debuts, the tech industry may soon have a new class of corp

News 06-10 02:01 TC

Anthropic’s Claude Fable 5 is a version of Mythos the public can access today

Anthropic is releasing Claude Fable 5, its first Mythos-class model available to the public. The model comes with guardr

News 06-10 02:00 WD

Anthropic Offers Mythos Upgrade for Cyber Partners and a ‘Safe’ Version for the Rest of You

Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can’

News 06-10 01:39 X

Apple WWDC 2026: Gemini-Powered Siri Debuts, On-Device AI Reshapes Intelligent Ecosystem

At WWDC 2026, Apple announced Gemini-powered Siri and a multi-model Apple Intelligence architecture, marking a major bre

News 06-10 01:39 X

OpenAI Secretly Files IPO, AI Giant's Listing Wave Sparks Market Controversy

OpenAI has quietly submitted an IPO filing to the SEC, signaling accelerated commercialization, while its affiliated com

News 06-10 01:39 X

NVIDIA and Hyundai Deepen AI Collaboration, Accelerating Commercialization of Embodied Intelligent Robots

NVIDIA CEO Jensen Huang recently met with Hyundai Motor Group executives to deepen cooperation in AI applications across

News 06-10 01:38 X

Moonshot AI Launches $2 Billion Funding Round, Valuation Eyes $30 Billion

Chinese AI startup Moonshot AI has announced a new funding round targeting $2 billion, which would boost its valuation t

News 06-10 01:38 X

Anthropic Launches Claude Fable 5, Performance Greatly Improved Based on Mythos Architecture

Anthropic recently unveiled the new Claude Fable 5 model, built on the Mythos underlying architecture, marking another m

News 06-09 22:04 TC

How an e-scooter founder raised $5 million to build space data centers

Orbital founder Euwyn Poon built 250,000 scooters at Spin. Now he wants to launch 10,000 space data centers.

News 06-09 22:03 MIT

The Download: whole-body rejuvenation drugs and five things to know about AI

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going o

News 06-09 22:02 TC

Lovable says it has hit $500M in annualized revenue, with 1 million new projects a week

Lovable says it has now surpassed $500 million in annualized run-rate revenue and its users are building businesses and

Reviews

Smoke Daily: GPT-5.5 tops with 92.58 points, material constraint gap of 19 points decides the outcome

Smoke's latest data shows that code execution is no longer the dividing line, and material constraints have become the r

11 Models Answer Same Blame-Shifting Problem: 8 Get A>B>D>C, 3 Get 0 Points Directly

11 mainstream models showed significant divergence on the same engineering judgment question: 8 models output A>B>D>C an

Binary Tree Serialization Test: 11 Models, 7 Full Scores, 4 Directly Zero

In a strict binary tree serialization test requiring only code output, explicit null node markers, and stable results, 7

WDCD Compliance

#1 Claude Opus 4.7 70
#2 GPT-5.5 70
#3 GPT-o3 70
#4 Claude Sonnet 4.6 67.5
#5 Gemini 2.5 Pro 67.5
#6 Doubao Pro 62.5
#7 Gemini 3.1 Pro 62.5

Research Lab

3 Major Models Translation Showdown: Week 24 Quality Evaluation, passthrough Leads with a Score of 9

This week, 2425 translation tasks were completed by 3 models.

WDCD Run #146: Average Instruction Decay Hits 24.7% Across 11 Models, Claude Opus 4.7 and GPT-5.5 Tie at Top

WDCD Run #146 (2026-06-03) tested 11 frontier models on multi-turn commitment integrity, recording a

3 Major Model Translation Showdown: Week 23 Quality Evaluation, gpt-o3 Leads with a Score of 9

This week, 270 translation tasks were completed by 3 models. Two samples were selected for multi-mod
