R1 Answers Well, R3 Completely Collapses: 63% Collapse Rate Revealed in Commitment Decay Test of 11 Models

The WDCD three-round decay test reveals a sobering reality for technical decision-makers: the R1 confirmation rate is 95% and the R2 resistance rate is 91%, but the R3 integrity rate plummets to 29%. Of 330 R3 pressure tests, 209 ended in complete collapse (0 points), a collapse rate of 63.3%. In other words, models that confidently promise to honor constraints in the first round abandon them outright in more than 60% of cases when directly pressured in the third round.
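The headline collapse rate can be checked with a few lines of Python. This is a minimal sketch that assumes every one of the 11 models answered all 30 questions in R3 (11 × 30 = 330 tests); the constant and function names are illustrative, not part of the WDCD tooling.

```python
# Reproduce the R3 collapse rate from the counts in the summary.
# Assumption: 11 models x 30 questions = 330 R3 pressure tests.
MODELS = 11
QUESTIONS = 30
R3_TESTS = MODELS * QUESTIONS  # 330
R3_ZERO_SCORES = 209  # tests that ended in complete collapse (0 points)


def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


collapse_rate = rate(R3_ZERO_SCORES, R3_TESTS)
print(f"{R3_ZERO_SCORES}/{R3_TESTS} = {collapse_rate}%")  # 209/330 = 63.3%
```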

WDCD, Commitment Test, Model Decay

AI Vendors Hard to Tell Apart? WDCD Guardrail Test Reveals the Scores of 11 Major Models to Help You Avoid Data-Breach Minefields

As a CTO or CIO, you may lose sleep over AI vendors' promises: what if a model that verbally guarantees data isolation leaks user data under pressure? This is not science fiction but a real risk. The WDCD Guardrail Test cuts to the chase, simulating high-pressure scenarios to check whether models break their promises. Stop trusting the hype blindly; look at the real scores and avoid data disasters.

AI Evaluation, WDCD Test, Enterprise AI

Winzheng Homepage Upgrade! 5 Features Turn It into an AI Intelligence Terminal That Outpaces Industry News

Winzheng (winzheng.com) has upgraded its homepage from a simple product showcase into an AI intelligence terminal, featuring a Bloomberg-style real-time dashboard, AI-powered smart search, curated headline news feeds, a data trust wall, and embedded widgets for sharing YZ Index rankings. The redesign aims to deliver trusted, real-time, data-driven insights, helping users stay ahead in the fast-evolving AI landscape.

Winzheng Upgrade, AI Dashboard, Smart Search

Unveiling the WDCD Commitment Test: 3 Rounds, 30 Questions Targeting AI’s “Breach of Trust” Pain Points, Disrupting the Evaluation Landscape!

The YZ Index WDCD Commitment Test, launched by Winzheng (winzheng.com), uses a 3-round, 30-question design to precisely dissect AI’s “credibility crisis.” It exposes the hidden danger of AI failing to honor its promises, urging enterprises to move beyond flashy benchmark scores and focus on true reliability.

AI Evaluation, YZ Index, WDCD Test

Doubao Pro Stability Plunges 19.8 Points: Inconsistent Answers to Same Questions Become Biggest Weakness

In this week's Winzheng AI evaluation, Doubao Pro's overall score rose by 16.1 points, but its stability dimension fell sharply by 19.8 points to 34.7, revealing serious difficulty in keeping answers consistent. The drop may stem from technical adjustments such as a change to the temperature parameter or an update to model routing, and may reflect a trade-off between capability gains and output predictability.
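The report does not say how the stability dimension is computed. The sketch below only recovers last week's score from the reported figures (34.7 + 19.8 = 54.5) and illustrates one hypothetical way to quantify answer consistency: ask the same question several times and measure what share of the answers match the most common one. The `consistency` function is an assumption for illustration, not the YZ Index formula.

```python
from collections import Counter


def consistency(answers: list[str]) -> float:
    """Hypothetical metric: share (0-100) of repeated answers that match
    the most common answer. Not the YZ Index stability formula."""
    if not answers:
        raise ValueError("need at least one answer")
    modal_count = Counter(answers).most_common(1)[0][1]
    return 100 * modal_count / len(answers)


# Figures from the report: stability fell by 19.8 points to 34.7.
current_stability = 34.7
drop = 19.8
previous_stability = round(current_stability + drop, 1)  # 54.5

runs = ["A", "A", "B", "A", "C"]  # the same question asked five times
print(previous_stability, consistency(runs))  # 54.5 60.0
```

A fully consistent model would score 100 on this toy metric; each run that deviates from the modal answer lowers the score.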

Doubao Pro, Stability Test, AI Evaluation

YZ Index Weekly Report: Collective Leap in Task Expression Capabilities, Claude Series Pioneers Material Constraint Track

This week's YZ Index evaluation captures a rare simultaneous improvement in the "task expression" dimension across 10 of 11 mainstream AI models, while Claude Opus 4.6 alone breaks through in the "material constraint" dimension. The report analyzes these developments and offers model-selection advice for developers in different application scenarios.

YZ Index, AI Evaluation