WDCD Five-Scenario Cross-Evaluation: Business Rules Prove the Hardest Hurdle, Claude and Doubao Show a Lopsided 2-Point Gap
The WDCD compliance test uses three rounds of dialogue to expose where models fail under real constraints. Pilot data shows that the business rules scenario is a shared weakness, with the highest score reaching only 2.5, while the safety compliance scenario produces the widest gap among models.