WDCD Review Reveals: Resource Constraints Become the Achilles' Heel of 11 Models, Average Score Only 1.7

Jun 3, 2026 · Winzheng Index

WDCD Compliance Test · Uneven Model Performance · Enterprise Model Selection · Scenario Reliability

The starkest finding of the WDCD compliance test is that resource constraints tripped up every model: the 11 models averaged only 1.7 points in this scenario, far below their averages in the other four scenarios.

Why Resource Constraints Became a Collective Waterloo

In the three-round dialogue design, the third round (R3) directly pressures the model to break its resource quota, and most models gave in immediately. gemini-2.5-pro, with a score of 2.5, was the only model above 2 points, while the remaining ten models all landed in the 1–2 point range. doubao-pro and ernie-4.5 both dropped to 1 point, indicating that they put up almost no resistance to persistent requests such as "give a little more quota."

The Two Most Discriminating Scenarios

Resource constraints and data boundaries are the two scenarios that separate models most clearly. In data boundaries, claude-opus-4.7 and claude-sonnet-4.6 scored 3 points, while the gemini series and ernie-4.5 scored only 1.5 points, a gap of 1.5 points. In resource constraints, doubao-pro fell from the top of the business rules scenario (4 points) straight to the bottom (1 point), a drop of 3 points between the two scenarios.

Severe Uneven Performance Is Widespread

doubao-pro scored a perfect 4 in business rules but only 1 in resource constraints, a typical case of "good at reasoning but unable to hold the line."
claude-opus-4.7 scored 3.5 in security compliance and 3 in engineering standards, but only 1.5 in resource constraints, showing a clear shortfall in hard quota control.
deepseek-v4-pro scored 3.5 in security compliance but only 1.5 in data boundaries, indicating it is easily induced in sensitive data boundary scenarios.
gpt-5.5 and gpt-o3 both scored 4 in business rules, yet only 1.5 in resource constraints, also exhibiting the trait of "strong in business, weak in constraints."
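The unevenness described above can be quantified as the gap between each model's best and worst scenario. The following is a minimal sketch that uses only the scores quoted in this article, so scenarios not mentioned for a model are simply absent; the `quoted_scores` table and the `spread` helper are illustrative names, not part of the WDCD methodology.

```python
# Scores quoted in this article only; a model's unquoted scenarios are omitted,
# so the real gap across all five scenarios may be larger.
quoted_scores = {
    "doubao-pro": {"business rules": 4.0, "resource constraints": 1.0},
    "claude-opus-4.7": {
        "security compliance": 3.5,
        "engineering standards": 3.0,
        "data boundaries": 3.0,
        "resource constraints": 1.5,
    },
    "deepseek-v4-pro": {"security compliance": 3.5, "data boundaries": 1.5},
    "gpt-5.5": {"business rules": 4.0, "resource constraints": 1.5},
    "gpt-o3": {"business rules": 4.0, "resource constraints": 1.5},
}


def spread(scores):
    """Return (best scenario, worst scenario, gap) for one model."""
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    return best, worst, scores[best] - scores[worst]


if __name__ == "__main__":
    for model, scores in quoted_scores.items():
        best, worst, gap = spread(scores)
        print(f"{model}: best={best}, worst={worst}, gap={gap}")
```

A large gap flags a model whose headline score hides a weak scenario, which is the pattern the list above describes.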

Specific Recommendations for Enterprise Model Selection

If the core enterprise scenarios are financial risk control or medical compliance, prioritize claude-opus-4.7 or ernie-4.5, as these two models have the highest and most stable scores in security compliance scenarios.

If the business mainly involves internal approval workflows, contract terms, and pricing rules, doubao-pro and gpt-5.5 are more reliable, as they achieved perfect scores in the business rules scenario.

For teams that need strict control over API quotas, concurrency, and storage limits, no model can currently be trusted to hold the line. gemini-2.5-pro is the best of the group, but it still scored only 2.5 points. Such teams should enforce limits in an external rate-limiting layer rather than rely on the model itself.
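One way to read "external rate-limiting layer" is a quota check that runs in ordinary code before any model-proposed action is executed, so the model cannot be talked into exceeding it. The sketch below uses a token bucket, which is a common choice but an assumption here; the article does not prescribe a specific algorithm. `TokenBucket` and `guarded_call` are hypothetical names introduced for illustration.

```python
import time


class TokenBucket:
    """Token-bucket limiter enforced outside the model (illustrative sketch)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self._clock = clock  # injectable so the limiter can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self, cost=1.0):
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(
            self.capacity, self._tokens + elapsed * self.refill_per_second
        )
        if cost <= self._tokens:
            self._tokens -= cost
            return True
        return False


def guarded_call(bucket, action, cost=1.0):
    """Run `action` only if the quota allows it, whatever the model proposed."""
    if not bucket.try_acquire(cost):
        raise RuntimeError("quota exceeded: rejected by external limiter")
    return action()
```

The point of the design is that the limit lives in code the model cannot negotiate with: a request for "a little more quota" changes nothing unless an operator changes `capacity` or `refill_per_second`.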

Scores in the engineering standards scenario are high overall. Except for qwen3-max and ernie-4.5, every model reaches 3 points, so most models are workable choices for this scenario.

No model passes all scenarios; model selection is essentially about accepting uneven performance.

The WDCD pilot phase has made it clear that resource constraints are currently the Achilles' heel shared by all major models. If the weight of resource constraints is raised to 40% in the next phase, the rankings will be reshuffled drastically.
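To see why a 40% weight on resource constraints could reorder the rankings, the sketch below compares a weighted average under two weightings. Everything numeric here is an assumption for illustration: the article does not state the current weights (equal weights are assumed), does not say how the remaining 60% would be split (an even split is assumed), and `model_a` and `model_b` are made-up score profiles, not real WDCD results.

```python
SCENARIOS = [
    "business rules",
    "security compliance",
    "engineering standards",
    "data boundaries",
    "resource constraints",
]


def make_weights(resource_weight):
    """Give resource constraints `resource_weight`; split the rest evenly (assumption)."""
    other = (1.0 - resource_weight) / (len(SCENARIOS) - 1)
    weights = {name: other for name in SCENARIOS}
    weights["resource constraints"] = resource_weight
    return weights


def weighted_score(scores, weights):
    """Weighted average of per-scenario scores."""
    return sum(weights[name] * scores[name] for name in SCENARIOS)


def ranking(models, weights):
    """Model names sorted from highest to lowest weighted score."""
    return sorted(models, key=lambda m: weighted_score(models[m], weights), reverse=True)


# HYPOTHETICAL profiles, not WDCD data: one lopsided model, one even model.
models = {
    "model_a": dict(zip(SCENARIOS, [4.0, 3.0, 3.0, 3.0, 1.0])),
    "model_b": dict(zip(SCENARIOS, [2.5, 2.5, 3.0, 2.5, 2.5])),
}

if __name__ == "__main__":
    print("equal weights:", ranking(models, make_weights(0.2)))
    print("40% resource: ", ranking(models, make_weights(0.4)))
```

With these made-up profiles, the lopsided model ranks first under equal weights and second once resource constraints carry 40%, which is the kind of reshuffle the paragraph above anticipates.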

Data source: YZ Index WDCD Compliance Ranking | Run #146 · Scenario Matrix | Evaluation Methodology

