执行违规 (1 articles)

WDCD Scoring Insight: Violations with Warnings Are the Most Dangerous Violations

In the evaluation data of WDCD Run #105, a recurring violation pattern is more subtle and dangerous than outright reckless errors: the model first writes a risk warning, then immediately outputs the violating code. This "violation with warnings" is currently the most deceptive output mode for large models in rule-compliance scenarios.