YZ Index

Winzheng · Stability Rankings

Output consistency across repeated evaluations.

Rankings are based on the rolling average of the last 5 full evaluations.
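The ranking rule above (rolling average over the last 5 full evaluations) can be sketched as follows. This is a minimal illustration, not the site's implementation: the class `RollingScore`, the function `rank`, and the example scores are hypothetical, and the site's actual scoring and tie-break rules are not stated on this page.

```python
from __future__ import annotations

from collections import deque

WINDOW = 5  # "last 5 full evaluations", per the page


class RollingScore:
    """Hypothetical helper: keeps the most recent WINDOW scores for one model."""

    def __init__(self, window: int = WINDOW) -> None:
        # A bounded deque drops the oldest score once the window is full.
        self.scores: deque[float] = deque(maxlen=window)

    def add(self, score: float) -> None:
        self.scores.append(score)

    def average(self) -> float:
        if not self.scores:
            raise ValueError("no evaluations recorded")
        return sum(self.scores) / len(self.scores)


def rank(models: dict[str, RollingScore]) -> list[tuple[str, float]]:
    """Sort models by rolling average, highest first.

    sorted() is stable, so equal averages keep their input order; this is an
    assumption of the sketch, not the leaderboard's documented tie-break.
    """
    return sorted(
        ((name, rs.average()) for name, rs in models.items()),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    a = RollingScore()
    for s in [30, 32, 34, 36, 38, 40]:  # hypothetical scores; 30 falls out of the window
        a.add(s)
    b = RollingScore()
    for s in [35, 35, 35]:  # hypothetical: fewer than 5 evaluations so far
        b.add(s)
    print(rank({"model_a": a, "model_b": b}))
```

Running the example prints `[('model_a', 36.0), ('model_b', 35.0)]`: only the five most recent scores for `model_a` (32 through 40) count toward its average. Ties matter in practice; in the table below, two models share a rolling stability score of 36.6, and the page does not say how such ties are ordered.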

| # | Model | Family | Stability (5-eval avg.) | Stability (current period) | Availability | Code Execution | Overall Score |
|---|---|---|---|---|---|---|---|
| 🥇 1 | Doubao Pro (豆包 Pro) | doubao | 38.9 | 38.8 | 99.8 | 93.1 | 85.8 |
| 🥈 2 | Gemini 2.5 Pro | gemini | 36.6 | 37.7 | 100 | 91 | 77.2 |
| 🥉 3 | Claude Opus 4.6 | claude | 36.6 | 35.2 | 100 | 88.3 | 69 |
| 4 | Claude Sonnet 4.6 | claude | 36.1 | 35.7 | 99.8 | 88.3 | 72.4 |
| 5 | Grok 3 | grok | 34.4 | 35.5 | 99.3 | 84.8 | 73.4 |
| 6 | DeepSeek V3 | DeepSeek | 32.9 | 32.8 | 100 | 88.7 | 82.9 |
| 7 | DeepSeek R1 | DeepSeek | 32.2 | 30.2 | 100 | 87.6 | 80.9 |
| 8 | GPT-o3 | gpt | 31.7 | 28.9 | 88.3 | 77.6 | 62 |
| 9 | Qwen Max | qwen | 31.6 | 32.7 | 100 | 79.5 | 73.8 |
| 10 | ERNIE Bot 4.0 (文心一言 4.0) | ernie | 29.9 | 31.3 | 99.8 | 79.6 | 79.5 |
| 11 | GPT-4o | gpt | 29.6 | 30.4 | 87.8 | 75.7 | 63.3 |