YZ Index

Winzheng · Code Execution Rankings

Code sandbox execution pass rate. Real code, real results.

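| # | Model | Family | Code Execution | Grounding | Overall |
|---|-------|--------|----------------|-----------|---------|
| 🥇 | Doubao Pro (豆包 Pro) | doubao | 89.8 | 70.8 | 81.3 |
| 🥈 | Claude Sonnet 4.6 | claude | 86.8 | 78.4 | 83 |
| 🥉 | Grok 4 | grok | 86.8 | 73.9 | 81 |
| 4 | DeepSeek V4 Pro | DeepSeek | 86.7 | 63.7 | 76.4 |
| 5 | Qwen3 Max | qwen | 85.5 | 71 | 79 |
| 6 | Gemini 2.5 Pro | gemini | 85.2 | 71.5 | 79 |
| 7 | GPT-o3 | gpt | 84.8 | 70.3 | 78.3 |
| 8 | GPT-5.5 | gpt | 84.6 | 67.8 | 77 |
| 9 | Claude Opus 4.7 | claude | 83.9 | 75.2 | 80 |
| 10 | Gemini 3.1 Pro | gemini | 83 | 71.1 | 77.7 |
| 11 | ERNIE Bot 4.5 (文心一言 4.5) | ernie | 68.2 | 65.8 | 67.1 |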