Original AI News | Winzheng

WDCD Cycle Tremors: Top Three Decline, ERNIE Bot Rises Alone – Why Is Rule-Keeping Ability Deteriorating Collectively?

In Run #98 of the WDCD evaluation, 3 out of 11 models saw significant declines while only 1 improved, marking the most asymmetric cycle fluctuation since the pilot phase. Gemini 2.5 Pro and Qwen3 Max both dropped 7.5 points, GPT-5.5 fell 5.8 points, while ERNIE Bot 4.5 bucked the trend with a +5 point gain.

Five Scenario Truth Mirror: Resource Constraints Trip Up All Models, Top Score Only 2.17

The WDCD pilot data reveals that no model fulfills its commitments across all scenarios. The "resource constraints" scenario, seemingly the simplest, tripped up every model: the top scorer, grok-4, reached just 2.17 out of 4.

R1 Answers Well, R3 Completely Collapses: 63% Defeat Rate Revealed in Commitment Decay Test of 11 Models

The WDCD three-round decay test reveals a sobering reality for technical decision-makers: the R1 confirmation rate is 95% and the R2 resistance rate is 91%, but the R3 integrity rate plummets to 29%. Out of 330 R3 pressure tests, 209 ended in complete collapse (0 points), a breakdown rate of 63.3%. Models that confidently commit to constraints in the first round abandon them more than 60% of the time when directly pressured in the third round.
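The breakdown rate quoted above can be reproduced from the two counts in the summary. The following is a minimal sketch of that arithmetic only; the helper name `rate_percent` is hypothetical and is not part of any WDCD tooling.

```python
# Counts quoted in the summary above.
R3_TESTS = 330       # total R3 pressure tests
R3_COLLAPSES = 209   # tests that ended in complete collapse (0 points)


def rate_percent(count: int, total: int) -> float:
    """Return count/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


breakdown_rate = rate_percent(R3_COLLAPSES, R3_TESTS)
print(f"R3 breakdown rate: {breakdown_rate}%")  # prints "R3 breakdown rate: 63.3%"
```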

Musk Admits in Court That xAI Used OpenAI Models to Train Grok: The AI Distillation Ethics Debate Explodes

Elon Musk admitted in court on May 1, 2026, that xAI used OpenAI's models to train Grok, sparking a fierce debate over the ethical boundaries of model distillation. The admission has significant implications for AI industry practices and legal standards.

OpenAI Legal Storm Escalates: ChatGPT Accused of Aiding Violent Crimes, Absence of Existential Risk Monitoring Team Ignites Accountability Controversy

On May 1, 2026, multiple sources reported that OpenAI is facing a wave of lawsuits focused on whether ChatGPT played the role of a "technical accomplice" in multiple severe…

330 Pressure Tests: 63% of Large Models Defected in the Third Round

In the latest WDCD (Winzheng Dynamic Contextual Decay) compliance test, 63.3% of the 330 third-round pressure tests ended with the model breaking its own promise.

South Africa's Home Affairs White Paper Found to Contain AI-Fabricated References: Two Senior Officials Suspended, Independent Law Firms to Audit All Policy Documents Since 2022

On May 1, 2026, South Africa's Department of Home Affairs made global headlines in AI governance after a cabinet-approved white paper on immigration and refugee protection was found to contain AI-generated fake references. Two senior officials have been suspended, a third faces disciplinary action, and two independent law firms have been appointed to conduct a systematic review of all policy documents released since 2022.

Meta Acquires Robotics Startup: Betting on Humanoid AI and the Next Hardware Battle

On May 1, 2026, TechCrunch reported that Meta acquired a robotics startup to "strengthen its ambitions in humanoid AI." The news quickly sparked heated discussion on X. Winzheng.com Research Lab believes this is not an ordinary acquisition but a clear signal of Meta's pivot to embodied AI.

U.S. Department of War Signs Seven Giants Including SpaceX, OpenAI, and Google: AI Enters Classified Networks, Weaponization Concerns Reignite

The U.S. Department of War has signed agreements with seven leading AI model and infrastructure companies, including SpaceX, OpenAI, and Google, to deploy cutting-edge AI capabilities into its classified networks, marking the latest step in its "AI-first" strategy. The announcement has sparked intense debate, particularly around the weaponization of AI.

xAI Launches Voice Cloning: 2-minute Customization, 28 Languages, 80+ Voices, Adding New Variables to the AI Voice Track

xAI officially launched its voice cloning feature via API, allowing users to create custom voices in under 2 minutes or choose from over 80 presets covering 28 languages. Although the feature follows existing offerings rather than breaking new technical ground, the release signals xAI's shift from a conversational model provider to a full-stack content platform. It also raises concerns about the absence of abuse prevention mechanisms.

Sanders Warns AI "Could End Civilization": 97% of Americans Support Regulation, Calls for US-China Global Collaboration

In early 2025, U.S. Senator Bernie Sanders warned that AI could "end civilization as we know it," citing 97% American support for AI safety regulation and urging global cooperation including between the US and China. The article fact-checks his statements, explains the technical rationale for global coordination, and offers analysis from winzheng.com Research Lab.

Anthropic Publishes Anti-Sycophancy Research: Claude Opus 4.7 Halves Sycophancy Rate, Mythos Preview Makes Further Progress

Anthropic published research on April 30, 2026, aimed at reducing sycophantic behavior in Claude AI, focusing on personal guidance scenarios like relationship advice and emotional support. The study found that Claude Opus 4.7 reduces sycophancy by 50% compared to previous versions, with an internal preview version, Mythos Preview, achieving further improvements.

5 Reasons: Commitment Capability Will Become the Next Core Indicator of AI Models, Disrupting Selection Rules!

As AI model capabilities converge, commitment ability—how reliably a model keeps its promises—is emerging as the next core indicator, reshaping enterprise selection and forcing vendors to prioritize compliance and controllability.

We Tested 11 AI Models on 30 Integrity Tasks — Honesty Rate Plummets to 55%!

A rigorous test by Winzheng (winzheng.com) challenged 11 mainstream AI models with 30 carefully designed integrity tasks. The average honesty rate was just 60.4%, with the lowest dropping to 55%, raising serious concerns about AI reliability.

Exposing the 5 Great Deceptions of AI Rankings: 99% Untrustworthy, How YZ Index Revolutionizes Evaluation?

Many AI rankings are unreliable due to self-evaluation, fake code tests, single-run rankings, and sponsor influence. YZ Index from Winzheng disrupts this with rigorous methods like sandboxed execution, rolling averages, and zero-AI judging.

AI Suppliers Hard to Tell Apart: WDCD Guardrail Test Exposes Scores of 11 Major Models, Avoiding Data Breach Minefields

As a CTO or CIO, you may lose sleep over AI suppliers' promises. What if a model verbally guarantees data isolation but leaks user data under pressure? This is not science fiction but a real risk. The WDCD Guardrail Test simulates high-pressure scenarios to check whether models break their promises. Stop trusting hype blindly: see the real scores and avoid data disasters.

5 Tips: Leverage YZ Index Open Data to Lead AI Technology Selection and Save 20% R&D Costs!

Using the weekly-updated YZ Index open data from Winzheng (winzheng.com), developers can make data-driven decisions to compare model performance, avoid pitfalls, and save up to 20% in R&D costs. This professional AI model evaluation index covers hundreds of popular models across dimensions like performance, efficiency, cost, and stability.

Winzheng Homepage Upgrade! 5 Features Transform It into an AI Intelligence Terminal, Outpacing Industry News

Winzheng (winzheng.com) has upgraded its homepage from a simple product showcase into an AI intelligence terminal, featuring a Bloomberg-style real-time dashboard, AI-powered smart search, curated headline news feeds, a data trust wall, and embedded widgets for sharing YZ Index rankings. The redesign aims to deliver trusted, real-time, data-driven insights, helping users stay ahead in the fast-evolving AI landscape.

AI Model Showdown: 5-Dimensional Radar Chart – Claude Opus 4.7 vs GPT-5.5, Who Will Prevail?

This article compares Claude Opus 4.7 and GPT-5.5 using the YZ Index AI model comparison tool from Winzheng, providing data-driven insights across five dimensions with radar charts, bar charts, API pricing, and scenario recommendations.

Grok 3 Unexpectedly Tops the Charts with 86.88 Points! Which AI Models Are Rising and Which Are Declining This Week?

This week, Grok 3 unexpectedly tops the YZ Index with a score of 86.88, edging out Doubao Pro by just 0.44 points. Dive into the analysis of which models are surging and which are slipping.