YZ Index

Winzheng · Stability Rankings

Output consistency across repeated evaluations.

Rankings are based on the rolling average of the last 5 full evaluations.

| # | Model | Stability | Current Period | Availability | Code Execution | Overall Score |
|---|-------|-----------|----------------|--------------|----------------|---------------|
| 🥇 | Doubao Pro (doubao) | 46 | 71.2 | 100 | 92.2 | 86.9 |
| 🥈 | Claude Opus 4.7 (claude) | 44.7 | 67.7 | 99.8 | 89.2 | 72.8 |
| 🥉 | Claude Sonnet 4.6 (claude) | 43.2 | 62.7 | 100 | 88.5 | 75.9 |
| 4 | Gemini 3.1 Pro (gemini) | 42.2 | 63.2 | 95 | 82.2 | 71 |
| 5 | Gemini 2.5 Pro (gemini) | 41.3 | 66 | 94.6 | 80.4 | 73.5 |
| 6 | Grok 4 (grok) | 40.6 | 68.6 | 92.4 | 84.9 | 70.4 |
| 7 | DeepSeek V4 Pro (DeepSeek) | 39.7 | 59.1 | 100 | 90.2 | 76.2 |
| 8 | GPT-o3 (gpt) | 39.4 | 58 | 100 | 84.9 | 71.1 |
| 9 | Qwen3 Max (qwen) | 39.2 | 59.8 | 100 | 87.9 | 77.4 |
| 10 | GPT-5.5 (gpt) | 38.5 | 51.8 | 100 | 84.6 | 71.9 |
| 11 | ERNIE Bot 4.5 (ernie) | 33.5 | 44.2 | 99.4 | 76.1 | 75.5 |
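
The page states that rankings use a rolling average of the last 5 full evaluations but does not say how the underlying stability score is computed. The sketch below illustrates only the averaging and ranking step, under stated assumptions: the function names `rolling_stability` and `rank_models`, the model names, and the per-evaluation scores are all hypothetical and are not taken from the leaderboard.

```python
from statistics import fmean

WINDOW = 5  # "last 5 full evaluations", as stated on the page


def rolling_stability(history, window=WINDOW):
    """Mean of the most recent `window` scores (fewer if history is shorter)."""
    recent = list(history)[-window:]
    if not recent:
        raise ValueError("no evaluations recorded")
    return round(fmean(recent), 1)


def rank_models(histories, window=WINDOW):
    """Return (rank, model, rolling average) tuples, highest average first."""
    averaged = {m: rolling_stability(h, window) for m, h in histories.items()}
    ordered = sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)
    return [(i, m, s) for i, (m, s) in enumerate(ordered, start=1)]


# Hypothetical per-evaluation stability scores, oldest first (not real data).
histories = {
    "model-a": [30.0, 40.0, 50.0, 60.0, 70.0, 80.0],  # oldest score falls outside the window
    "model-b": [55.0, 55.0, 55.0, 55.0, 55.0],
}

ranking = rank_models(histories)
for rank, model, score in ranking:
    print(rank, model, score)
```

In this sketch only the five most recent scores count, so the oldest entry for `model-a` is ignored. How the actual leaderboard handles models with fewer than five evaluations, or breaks ties, is not stated on the page; the choices here are assumptions.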