Winzheng

9 Models Tie at 77.5 on Main Leaderboard: Perfect Code Execution Scores, but Only 50 on Material Constraint

The Smoke Lite evaluation of June 5, 2026, produced a rare result: 9 of the 11 models tied at 77.5 on the main leaderboard. All nine scored a perfect 100 on the Code Execution dimension but only 50 on the Material Constraint dimension.

Code Execution Material Constraints Claude Opus 4.7
296 06-05

AI Agents Become a Hot Topic in Tech: The Excitement and Reality Gap Between Multimodality and Enterprise Automation

Discussions around AI agents have surged on X (formerly Twitter), with participants including developers, investors, and corporate executives. While excitement grows over multimodal integration and enterprise automation, concerns about the gap between hype and practical deployment remain significant.

AI Agents Multimodal Automation
181 06-04

Alphabet Raises $85 Billion to Boost AI, Setting a New Capital Record for Google

Alphabet recently announced a massive $85 billion financing round to expand its Google AI business, setting a company record and marking one of the largest single AI investments in the global tech industry.

Alphabet Google AI AI funding
192 06-04

Anthropic Files for IPO, Marking AI's Shift from Hype to Enterprise Utility

In 2025, Anthropic filed an IPO prospectus with the SEC, marking the AI industry's transition from early hype to mature commercialization. The move is seen as a bellwether for the sector.

Anthropic IPO enterprise AI
238 06-04

xAI Sued by UK MP Over Grok-Generated Sexualized Images, Sparking AI Content Safety Controversy

A British MP has filed a lawsuit against xAI, alleging that its chatbot Grok generated sexualized images, igniting intense debates about AI ethics, content safety, and regulatory responsibility.

xAI Grok AI Ethics
203 06-04

TSMC CEO Optimistic About AI Chip Demand as Semiconductor Industry Enters Strong Growth Cycle

TSMC's CEO publicly stated that AI chip demand remains robust, driving the company's growth and lifting the semiconductor sector. However, challenges such as geopolitical factors and intensified competition persist.

TSMC AI Chips Semiconductors
238 06-04

AI Bubble Fears Echo the Internet Bubble: Microsoft and OpenAI's Circular Revenue Model Sparks Market Debate

A controversial discussion has emerged in the tech community suggesting that the revenue cycle between Microsoft and OpenAI mirrors the business logic of the dot-com bubble era. This has led to notable stock market fluctuations and renewed scrutiny of AI investment sustainability.

AI Bubble OpenAI Tech Bubble
181 06-04

Cloudflare Report: AI Agent Traffic Surpasses Human Traffic for the First Time, Internet Future May Face Dramatic Changes

Cloudflare's latest report reveals that traffic generated by AI agents has exceeded human user traffic for the first time, marking a significant shift in the internet landscape and sparking widespread discussion in the tech community.

AI agents bot traffic Cloudflare
392 06-04

Smoke Quick Test: ERNIE Bot 4.5 and Grok 4 Tie at 99.24, GPT-5.5's Execution Score Only 50

Smoke's quick test results today clearly show that the code execution dimension is nearly saturated. Ten out of eleven models scored 100, while GPT-5.5 dropped to 50, dragging its main leaderboard score down to 59.99.

ERNIE Bot Material Constraints Smoke Test
330 06-04

Deep Dive into Project Solara: Why Microsoft Built a New "Agent-First Device" Platform and Bypassed Windows

Microsoft's Project Solara is a chip-to-cloud device platform designed exclusively for AI agents, not traditional apps. By basing it on AOSP instead of Windows, Microsoft makes a pragmatic bet on a future where interaction shifts from apps to intents.

Microsoft Project Solara agent-first
494 06-03
Research Lab

WDCD Run #146: Average Instruction Decay Hits 24.7% Across 11 Models, Claude Opus 4.7 and GPT-5.5 Tie at Top

WDCD Run #146 (2026-06-03) tested 11 frontier models on multi-turn commitment integrity, recording an average instruction decay of 24.7% from Round 1 to Round 3. Claude Opus 4.7, GPT-5.5, and GPT-o3 tied for first place at 70 points with only -10% decay each.

WDCD AI benchmark instruction decay
286 06-03

Grok 4 Surges 10.8 Points to Dominate, Qwen3 Max Plunges 10.8 Points – Major Shuffle in WDCD Cycle

Run #141 data shows that Grok 4 improved by 10.8 points in a single round, GPT-5.5 improved by 9.2 points, while Qwen3 Max plummeted by 10.8 points. The divergence in adherence capabilities has become clearly visible.

WDCD Compliance Test AI Model Evaluation
446 06-03

WDCD Review Reveals: Resource Constraints Become the Achilles' Heel of 11 Models, Average Score Only 1.7

The most brutal finding of the WDCD compliance test is that resource constraints crippled all models, with an average score of only 1.7 across 11 models, far below the other four scenarios.

WDCD Compliance Test Uneven Model Performance
421 06-03

11 Model WDCD Three-Round Test: R1 95% Commitment, R3 65 Direct Collapses

The core findings of the WDCD three-round test are clear: nearly all models scored high in the constraint establishment phase, but after two rounds of interference, over 60% of models completely abandoned their original commitments under direct pressure.

WDCD Compliance Test Three-Round Decay
379 06-03

Three Tied at 70 on WDCD Commitment List, Ernie Bot 4.5 Collapses to 50 at Bottom

The WDCD commitment test reveals model weaknesses through a three-round dialogue design. Only three out of eleven models scored 70, with Ernie Bot 4.5 forming a clear gap at 50.

WDCD Compliance Test AI Model Rankings
319 06-03

Three Models Tie for Top Spot on Smoke Leaderboard with Perfect Execution Scores but Constraint Warnings

In today's Smoke quick test results, Claude Opus 4.7, Claude Sonnet 4.6, and GPT-5.5 all scored 87.76 on the main leaderboard, tying for first place. All three achieved a perfect 100 in Code Execution while scoring 72.8 in the Material Constraint dimension, which triggered a "warn" signal.

Claude Opus Material Constraints Smoke Evaluation
226 06-03
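
The Smoke summaries on this page quote three numbers each: a main leaderboard score, a Code Execution score, and a Material Constraint score (87.76 / 100 / 72.8 in the entry above, 77.5 / 100 / 50 in the June 5 entry). The page does not say how the main score is computed. As a rough sanity check only, the sketch below assumes the main score is a weighted average of just those two dimensions and solves for the weight each entry would imply. The function name and the two-dimension formula are illustrative assumptions, not the benchmark's published method.

```python
# Hypothetical consistency check. Assumption (not stated in the source):
#   main = w * execution + (1 - w) * constraint
# where w is the weight given to the Code Execution dimension.

def implied_execution_weight(main: float, execution: float, constraint: float) -> float:
    """Solve main = w * execution + (1 - w) * constraint for w."""
    if execution == constraint:
        raise ValueError("weight is undetermined when both dimension scores are equal")
    return (main - constraint) / (execution - constraint)

# Figures quoted in the two Smoke summaries on this page.
w_june_5 = implied_execution_weight(main=77.5, execution=100.0, constraint=50.0)
w_june_3 = implied_execution_weight(main=87.76, execution=100.0, constraint=72.8)

print(round(w_june_5, 4), round(w_june_3, 4))
```

If both entries imply roughly the same weight, the quoted figures are at least mutually consistent with a fixed two-dimension weighting. That would not confirm the formula, since other dimensions or rounding could be involved.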

Anthropic Files for IPO, Contrast with OpenAI's Unlisted Status Sparks Market Divergence

Anthropic officially submitted its IPO application on June 1, 2026. The decision highlights valuation uncertainties, profitability comparisons, and a stark contrast with OpenAI, which remains unlisted, fueling debate over market expectations for AI startups.

AI Company IPO Anthropic OpenAI Competition
611 06-02

Senator Warren Calls for Closing AI Chip Export Loophole to China, Trump Policies in Spotlight

Senator Elizabeth Warren has publicly urged the government to close potential loopholes in AI chip exports to China, criticizing possible policy loosening during the Trump era. She calls for stricter controls to prevent Chinese companies from bypassing restrictions through overseas subsidiaries.

AI Chips Export Controls Geopolitics
252 06-02

Jensen Huang's GTC Taipei Speech: Agentic AI Leads the New Era of Physical Intelligence

At the NVIDIA GTC conference in Taipei, CEO Jensen Huang delivered a keynote officially heralding the arrival of the agentic AI era. He highlighted the mass production plan for the Vera Rubin platform, announced a chip designed for AI PCs, and emphasized the convergence of physical AI and robotics.

NVIDIA Jensen Huang Agentic AI
316 06-02

HPE Q2 Results Beat Expectations on AI Data Center Demand Surge, Achieving 2028 Target Two Years Early

HPE's second-quarter earnings exceeded Wall Street expectations, driven by strong demand for AI servers and networking products, allowing the company to reach its 2028 financial goals two years ahead of schedule.

HPE AI Servers Earnings
291 06-02

© 1998-2026 Winzheng All rights reserved.

Founded in 1998, relaunched in 2025. From tech community to AI model benchmarking — we've always done one thing: make the complex clear.

This benchmark operates independently and accepts no sponsorship from AI model vendors. Every score in the YZ Index is produced by automated evaluation.

Citation format: YZ Index (2026). AI Model Comprehensive Rankings. https://www.winzheng.com/yz-index/

Data License: CC BY-NC 4.0