Research Lab

Leaderboards tell you who's stronger. Lab tells you why.


Independent Research / Data-Driven / Open Validation / Zero Sponsorship

We don't take money from any AI company. No 'partnership evaluations', no 'sponsored reports', no 'pre-evaluation consultations'. Every point in the Winzheng Index is computed by our system, not negotiated.

WDCD · The World's First AI Instruction Compliance Benchmark

3-Round Dialogue Stress Test / 32 Enterprise Scenarios / 100% Rule-Based Scoring / Zero AI Judge

FLAGSHIP

"We don't test whether AI can do it — we test whether it keeps its promises."

11 models

5 constraint categories

3 rounds of dialogue pressure

30 test questions

First round of data is now public

View Leaderboard / Methodology / API Docs / Why We Built This

Research Highlights

Dynamic Contextual Decay

How do constraints fade across multi-turn dialogue? We quantified the decay curve from R1 acknowledgment to R3 full compromise, revealing how models "agree but forget."
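As a rough illustration of what a decay curve across rounds could look like, the sketch below computes a per-round compliance rate and the drop from R1 to R3. The function names and the pass/fail input format are assumptions made for this example, not WDCD's published schema or formula.

```python
from typing import Sequence


def compliance_rate(passed: Sequence[bool]) -> float:
    """Fraction of test questions where the constraint was still honored."""
    return sum(passed) / len(passed) if passed else 0.0


def decay_curve(rounds: Sequence[Sequence[bool]]) -> list[float]:
    """One compliance rate per dialogue round, in order (R1, R2, R3)."""
    return [compliance_rate(r) for r in rounds]


def total_decay(curve: Sequence[float]) -> float:
    """Drop in compliance between the first and the last round."""
    return curve[0] - curve[-1]


# Hypothetical results for four questions over three rounds.
r1 = [True, True, True, True]     # constraint acknowledged everywhere
r2 = [True, True, True, False]    # first slip under pressure
r3 = [True, False, False, False]  # mostly compromised by the last round

curve = decay_curve([r1, r2, r3])
print(curve, total_decay(curve))
```

A model that "agrees but forgets" shows a high first value and a steep drop by the last round.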

Negation Window Technique

A scoring innovation that distinguishes "citing a violation" from "executing a violation." When a model says "I won't provide X," the mention of X sits inside a negation context and doesn't count. Only actual execution is penalized.
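One way to picture a negation window is to look at a fixed number of characters before each keyword hit and skip the hit if a negation cue appears there. The sketch below is a minimal version under that assumption. The cue list, the 40-character window, and the function name are hypothetical and are not taken from WDCD's scoring rules.

```python
import re

# Hypothetical negation cues; a real rule set would likely be larger.
NEGATION_CUES = re.compile(
    r"\b(?:won't|will not|cannot|can't|must not|refuse to|not)\b",
    re.IGNORECASE,
)


def executed_violations(response: str, keyword: str, window: int = 40) -> list[int]:
    """Return start offsets of keyword hits that are NOT preceded by a negation cue."""
    hits = []
    for match in re.finditer(re.escape(keyword), response, re.IGNORECASE):
        # Text immediately before the hit, limited to `window` characters.
        preceding = response[max(0, match.start() - window):match.start()]
        if not NEGATION_CUES.search(preceding):
            hits.append(match.start())
    return hits


cited = executed_violations("I won't provide the admin password.", "admin password")
executed = executed_violations("Sure, the admin password is hunter2.", "admin password")
print(cited, executed)
```

A character window is the simplest choice. Bounding the window at sentence breaks would reduce cases where a negation in an earlier sentence masks a real violation.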

Zero AI Judge

Why are rules more trustworthy than AI judges? WDCD uses keyword matching + regex rules for 100% of scoring — fully auditable, reproducible, eliminating the circular dependency of "AI judging AI."
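To show why rule-based scoring is auditable, the sketch below scores a response with plain regex rules and returns the IDs of the rules that fired alongside the score. The rule names, patterns, penalty values, and the 100-point scale are illustrative assumptions, not WDCD's actual rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str   # hypothetical identifier, used for the audit trail
    pattern: str   # regex that marks a violation
    penalty: int   # points deducted when the pattern matches


def score_response(response: str, rules: list[Rule], max_score: int = 100) -> tuple[int, list[str]]:
    """Deduct a fixed penalty per matching rule and report which rules fired."""
    fired = [r for r in rules if re.search(r.pattern, response, re.IGNORECASE)]
    penalty = sum(r.penalty for r in fired)
    return max(0, max_score - penalty), [r.rule_id for r in fired]


rules = [
    Rule("no_drop_table", r"\bDROP\s+TABLE\b", 50),
    Rule("no_email_address", r"[\w.+-]+@[\w-]+\.[\w.]+", 30),
]

print(score_response("Run DROP TABLE users;", rules))
print(score_response("I can't run destructive statements.", rules))
```

The same input and the same rules always give the same score, and the list of fired rule IDs shows exactly where each deduction came from.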

Data Transparency

Open Evaluation Data API: All raw scores and responses available via REST API

Fully Open Scoring Rules: Violation keywords and scoring logic for every question are auditable

Embeddable Widget Available: Embed the WDCD leaderboard on any webpage with one line of code

Fully Auditable Code: Evaluation framework, scoring engine, and data pipeline methodology are fully open

What We're Dissecting

YZ Index

Active

Flagship product. 11 models, 154 questions, code sandbox + reference check + rolling average.
A complete report every week, telling you who improved, who regressed, and who's most worth it.

Latest Output: Full weekly evaluation has been updated · 06-29

Enter Winzheng Index

Security & Adversarial Research

Work in Progress

Can AI models be deceived? Can they be stolen? Can they be bypassed?
We dissect models, test defenses, find vulnerabilities—before the bad guys do.

First report in preparation

View Related Reports

Edge Computing Architecture