Research Lab

Leaderboards tell you who's stronger. Lab tells you why.

lab.hero_tagline_cn

Independent Research / Data-Driven / Open Validation / Zero Sponsorship

We don't take money from any AI company. No 'partnership evaluations', no 'sponsored reports', no 'pre-evaluation consultations'. Every point in the Winzheng Index is computed by our system, not negotiated.

WDCD · The World's First AI Instruction Compliance Benchmark

3-Round Dialogue Stress Test / 32 Enterprise Scenarios / 100% Rule-Based Scoring / Zero AI Judge

FLAGSHIP
"We don't test whether AI can do it — we test whether it keeps its promises."
11 models
5 constraint categories
3 rounds of dialogue pressure
30 test questions

First round of data is now public

Research Highlights

Dynamic Contextual Decay

How do constraints fade across multi-turn dialogue? We quantified the decay curve from R1 acknowledgment to R3 full compromise, revealing how models "agree but forget."

Negation Window Technique

A scoring innovation that distinguishes "citing a violation" from "executing a violation." When a model says "I won't provide X," X appearing in negation context doesn't count — only actual execution is penalized.

Zero AI Judge

Why are rules more trustworthy than AI judges? WDCD uses keyword matching + regex rules for 100% of scoring — fully auditable, reproducible, eliminating the circular dependency of "AI judging AI."

Data Transparency
Open Evaluation Data API All raw scores and responses available via REST API
Fully Open Scoring Rules Violation keywords and scoring logic for every question are auditable
Embeddable Widget Available Embed the WDCD leaderboard on any webpage with one line of code
Fully Auditable Code Evaluation framework, scoring engine, and data pipeline methodology are fully open
What We're Dissecting

YZ Index

Active

Flagship product.11 models,154 questions, code sandbox + reference check + rolling average.
A complete report every week, telling you who improved, who regressed, and who's worth it most.

Latest Output:Full weekly evaluation has been updated · 06-29
Enter Winzheng Index

Security & Adversarial Research

Work in Progress

Can AI models be deceived? Can they be stolen? Can they be bypassed?
We dissect models, test defenses, find vulnerabilities—before the bad guys do.

First report in preparation
View Related Reports

Edge Computing Architecture

Work in Progress

Not everyone has H100s.
We research how to run full-featured LLMs on a $400 mini PC.

First report in preparation
View Related Reports
Latest Teardowns View All →