WDCD Commitment Ranking: GPT-5.5 Dominates with 71.67 Points, Grok 4 Trails at 52.5 Points
The WDCD Commitment Test measures how models perform under constraints over three rounds of dialogue. GPT-5.5 leads with 71.67 points, while Grok 4 ranks last with 52.5 points, a gap of 19.17 points between the top and bottom of the ranking.