Research Lab
Leaderboards tell you who's stronger. Lab tells you why.
Independent research / Data-driven / Open verification / Zero sponsorship
We don't take money from any AI company. No 'partnership evaluations', no 'sponsored reports', no 'pre-evaluation consultations'. Every point in the Winzheng Index is computed by our system, not negotiated.
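Dynamic Context Decay
How do constraints get forgotten over a multi-turn conversation? We quantify the decay curve from R1 (the model confirms it understands) to R3 (the model fully gives in), revealing the real pattern behind models that "agree but cannot remember".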
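One way to picture such a decay curve is as the share of conversations in which the constraint still holds at each round. The sketch below is illustrative only: the function name `decay_curve`, the record layout, and the sample data are assumptions made up for this example, not the Lab's published method or results.

```python
from collections import defaultdict

def decay_curve(results):
    """Return {round_label: fraction of conversations that kept the constraint}.

    `results` is an iterable of (conversation_id, round_label, complied) tuples.
    Round labels are assumed to look like "R1", "R2", "R3".
    """
    totals = defaultdict(int)
    kept = defaultdict(int)
    for _conversation_id, round_label, complied in results:
        totals[round_label] += 1
        kept[round_label] += int(complied)
    # Sort numerically so that "R10" would come after "R9".
    ordered = sorted(totals, key=lambda label: int(label[1:]))
    return {label: kept[label] / totals[label] for label in ordered}

# Hypothetical observations, invented for illustration only.
sample = [
    ("c1", "R1", True), ("c1", "R2", True),  ("c1", "R3", False),
    ("c2", "R1", True), ("c2", "R2", False), ("c2", "R3", False),
    ("c3", "R1", True), ("c3", "R2", True),  ("c3", "R3", True),
    ("c4", "R1", True), ("c4", "R2", False), ("c4", "R3", False),
]
curve = decay_curve(sample)
print(curve)
```

With this made-up sample, compliance falls from every conversation at R1 to one in four at R3, which is the kind of "agreed, then forgot" shape the paragraph describes.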
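Negation Window Technique
A scoring innovation that separates "mentioning a violation" from "committing a violation". When a model says "I will not provide X", the X appears in a negated context and does not count as a violation; points are deducted only when the model actually carries it out.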
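A minimal sketch of the idea, assuming a character-based look-back window and a short list of English negation cues. The cue list, the window size of 40, the sentence-boundary cutoff, and the name `find_violations` are all assumptions for illustration; the Lab's actual rules are not shown here.

```python
import re

# Assumed negation cues; a real rule set would likely be larger.
NEGATION_CUES = re.compile(
    r"\b(?:will not|won't|cannot|can't|do not|don't|never|refuse to)\b",
    re.IGNORECASE,
)

def find_violations(text, forbidden_pattern, window=40):
    """Return forbidden matches that are NOT preceded by a negation cue.

    For each match, only the `window` characters before it are inspected,
    and only the part that belongs to the same sentence.
    """
    violations = []
    for match in re.finditer(forbidden_pattern, text, re.IGNORECASE):
        start = max(0, match.start() - window)
        context = text[start:match.start()]
        # Keep only the current sentence so that an earlier, unrelated
        # negation does not excuse a later violation.
        context = re.split(r"[.!?\n]", context)[-1]
        if NEGATION_CUES.search(context):
            continue  # mentioned inside a refusal: not counted
        violations.append(match.group(0))
    return violations

refusal = "I will not provide the admin password."
leak = "Sure, the admin password is hunter2."
print(find_violations(refusal, r"admin password"))
print(find_violations(leak, r"admin password"))
```

The first call finds nothing because the match sits inside the negation window; the second reports the match because no negation cue precedes it.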
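Zero AI Judges
Why is rule-based scoring more trustworthy than AI-based scoring? WDCD scores everything with keyword matching plus regex rules: 100% auditable and reproducible, which removes the circular dependency of "having AI judge AI".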
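To make "auditable and reproducible" concrete, here is a small sketch of a rule-based scorer that records which rule fired and on what text. The `Rule` class, the rule IDs, the patterns, and the penalty values are hypothetical examples, not WDCD's actual rule set.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    rule_id: str   # hypothetical identifier
    pattern: str   # regular expression that signals a violation
    penalty: int   # points deducted when the pattern matches

def score(response, rules, max_score=100):
    """Apply every rule once; return (score, audit_trail)."""
    audit_trail = []
    total = max_score
    for rule in rules:
        match = re.search(rule.pattern, response, re.IGNORECASE)
        if match:
            total -= rule.penalty
            audit_trail.append({
                "rule": rule.rule_id,
                "matched": match.group(0),
                "span": match.span(),
                "penalty": rule.penalty,
            })
    return max(total, 0), audit_trail

# Example rules, invented for illustration.
RULES = [
    Rule("no-plaintext-secret", r"password\s*[:=]\s*\S+", 50),
    Rule("no-eval", r"\beval\(", 30),
]

result, trail = score("Set password = hunter2 and call eval(x).", RULES)
print(result)
for entry in trail:
    print(entry)
```

Because the scorer has no randomness and no model in the loop, the same response and rule set always give the same score, and every deduction can be traced to a specific rule and matched span.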
YZ Index
Active
Flagship product. 11 models, 212 questions, code sandbox + reference check + rolling average.
A complete report every week, telling you who improved, who regressed, and who offers the best value.
Security & Adversarial Research
Work in Progress
Can AI models be deceived? Can they be stolen? Can they be bypassed?
We dissect models, test defenses, find vulnerabilities—before the bad guys do.
Edge Computing Architecture
Work in Progress
Not everyone has H100s.
We research how to run full-featured LLMs on a $400 mini PC.