Research Lab
Leaderboards tell you who's stronger. Lab tells you why.
Independent Research / Data-Driven / Open Validation / Zero Sponsorship
We don't take money from any AI company. No 'partnership evaluations', no 'sponsored reports', no 'pre-evaluation consultations'. Every point in the Winzheng Index is computed by our system, not negotiated.
WDCD · The World's First AI Instruction Compliance Benchmark
3-Round Dialogue Stress Test / 32 Enterprise Scenarios / 100% Rule-Based Scoring / Zero AI Judge
"We don't test whether AI can do it — we test whether it keeps its promises."
First round of data is now public
Dynamic Contextual Decay
How do constraints fade across multi-turn dialogue? We quantified the decay curve from R1 acknowledgment to R3 full compromise, revealing how models "agree but forget."
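One way to picture a decay curve is as the share of scenarios in which a constraint still holds at each round. The sketch below is only an illustration under that assumption: the function name `decay_curve` and the input shape (one pass/fail triple per scenario for R1, R2, R3) are hypothetical and not taken from the WDCD scorer.

```python
from typing import Sequence, Tuple

def decay_curve(records: Sequence[Tuple[bool, bool, bool]]) -> list[float]:
    """Return the compliance rate at R1, R2 and R3.

    Each record is a hypothetical (r1_ok, r2_ok, r3_ok) triple for one
    scenario, where True means the constraint was still honored.
    """
    if not records:
        raise ValueError("at least one scenario record is required")
    total = len(records)
    # Booleans sum as 0/1, so each entry is the fraction of passing scenarios.
    return [sum(record[i] for record in records) / total for i in range(3)]
```

A curve that starts near 1.0 at R1 and falls by R3 would correspond to the "agree but forget" pattern described above.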
Negation Window Technique
A scoring innovation that distinguishes "citing a violation" from "executing a violation." When a model says "I won't provide X," the X that appears in a negation context doesn't count. Only actual execution is penalized.
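A minimal sketch of the idea, assuming the window is a fixed number of characters before each keyword match. The cue list, the window size, and the function name `executed_violations` are illustrative assumptions; the actual WDCD rules are not reproduced here.

```python
import re

# Hypothetical negation cues; a production list would be longer and would
# also need to handle curly apostrophes.
NEGATION_CUES = re.compile(
    r"\b(?:won't|will not|can't|cannot|refuse to|unable to|do not|don't)\b",
    re.IGNORECASE,
)
WINDOW_CHARS = 40  # assumed look-back distance, in characters

def executed_violations(response: str, forbidden: list[str]) -> list[str]:
    """Return forbidden terms that occur outside any negation window."""
    hits = []
    for term in forbidden:
        for match in re.finditer(re.escape(term), response, re.IGNORECASE):
            start = max(0, match.start() - WINDOW_CHARS)
            window = response[start:match.start()]
            if NEGATION_CUES.search(window):
                continue  # mentioned while refusing: not penalized
            hits.append(term)  # mentioned with no nearby negation: penalized
            break
    return hits
```

With these assumptions, "I won't provide the admin password." yields no hits, while "Sure, the admin password is hunter2." is flagged.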
Zero AI Judge
Why are rules more trustworthy than AI judges? WDCD uses keyword matching + regex rules for 100% of scoring — fully auditable, reproducible, eliminating the circular dependency of "AI judging AI."
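To illustrate why rule-based scoring is auditable, here is a small sketch in which every rule is a named regex and the result records a pass or fail per rule. The `Rule` type, its fields, and the sample patterns are hypothetical and chosen only for demonstration.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    rule_id: str      # stable identifier, so every verdict can be traced
    pattern: str      # regular expression applied to the response
    must_match: bool  # True: pattern is required; False: pattern is forbidden

def score_response(response: str, rules: list[Rule]) -> dict[str, bool]:
    """Return a per-rule pass/fail map; the same input always gives the same output."""
    results = {}
    for rule in rules:
        found = re.search(rule.pattern, response, re.IGNORECASE) is not None
        results[rule.rule_id] = found == rule.must_match
    return results
```

Because the verdict is a plain mapping from rule identifier to outcome, anyone can rerun the rules on the same transcript and check each result by hand.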
YZ Index
Active
Flagship product. 11 models, 154 questions, code sandbox + reference check + rolling average.
A complete report every week, telling you who improved, who regressed, and who offers the best value.
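The rolling average mentioned above can be sketched as a trailing mean over weekly scores. The window length and the function name `rolling_average` are assumptions for illustration; the index's actual window and weighting are not stated here.

```python
from collections import deque

def rolling_average(scores: list[float], window: int = 4) -> list[float]:
    """Trailing mean over the last `window` scores (fewer at the start)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # old scores are dropped automatically
    averages = []
    for score in scores:
        recent.append(score)
        averages.append(sum(recent) / len(recent))
    return averages
```

Smoothing of this kind keeps a single unusually good or bad week from dominating a model's reported score.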
Security & Adversarial Research
Work in Progress
Can AI models be deceived? Can they be stolen? Can they be bypassed?
We dissect models, test defenses, find vulnerabilities—before the bad guys do.
Edge Computing Architecture
Work in Progress
Not everyone has H100s.
We research how to run full-featured LLMs on a $400 mini PC.