WDCD Run #135: Qwen3 Max Leads with Only 10% Instruction Decay as Field Average Hits 43.3%
WDCD Run #135 (2026-05-27) evaluated 11 large language models across three dialogue rounds and found an average commitment decay of 43.3%. Qwen3 Max topped the leaderboard with 72.5 points and just 10% decay, while Grok 4 recorded the steepest drop at 70%.