YZ Index

Winzheng · Code Execution Rankings

Models ranked by sandboxed code-execution pass rate. Real code, real results.
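The page does not describe its test harness. As a rough illustration of what a sandbox pass rate measures, here is a minimal sketch. It assumes each task is a standalone Python program that passes only if it exits with status 0 within a time limit and prints exactly the expected output. `Task`, `passes`, and `pass_rate` are hypothetical names, not the site's code. A plain child process stands in for the sandbox; it is not real isolation.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical benchmark item: a program plus the stdout it must print."""
    source: str
    expected_stdout: str


def passes(task: Task, timeout_s: float = 5.0) -> bool:
    """Run one program in a child interpreter; check exit code and output.

    NOTE: a bare subprocess is NOT a security sandbox. A real harness would
    add isolation (containers, resource limits, no network) on top of this.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", task.source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # a program that hangs counts as a failure
    return result.returncode == 0 and result.stdout == task.expected_stdout


def pass_rate(tasks: List[Task]) -> float:
    """Percentage of tasks whose program runs cleanly and prints the expected output."""
    if not tasks:
        raise ValueError("pass rate is undefined for an empty task list")
    passed = sum(passes(t) for t in tasks)  # True counts as 1
    return 100.0 * passed / len(tasks)


if __name__ == "__main__":
    tasks = [
        Task("print(2 + 2)", "4\n"),      # passes
        Task("print('hi')", "hello\n"),   # wrong output
        Task("raise SystemExit(1)", ""),  # non-zero exit status
    ]
    print(round(pass_rate(tasks), 1))  # 33.3
```

Requiring both a clean exit and exact output is one possible pass criterion. A real benchmark might instead run hidden unit tests against each program.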

| # | Model | Family | Code Execution | Grounding | Overall |
|---|---|---|---|---|---|
| 🥇 | Claude Opus 4.7 | claude | 84.3 | 95.5 | 89.3 |
| 🥈 | DeepSeek V4 Pro | DeepSeek | 83.7 | 95.0 | 88.8 |
| 🥉 | Grok 4 | grok | 76.3 | 95.7 | 85.0 |
| 4 | Claude Sonnet 4.6 | claude | 75.2 | 92.5 | 83.0 |
| 5 | GPT-o3 | gpt | 74.0 | 94.9 | 83.4 |
| 6 | Doubao Pro | doubao | 73.2 | 91.9 | 81.6 |
| 7 | Gemini 3.1 Pro | gemini | 72.9 | 94.2 | 82.5 |
| 8 | Qwen3 Max | qwen | 71.2 | 92.9 | 81.0 |
| 9 | Gemini 2.5 Pro | gemini | 63.8 | 95.1 | 77.9 |
| 10 | ERNIE Bot 4.5 | ernie | 56.4 | 93.7 | 73.2 |
| 11 | GPT-5.5 | gpt | 55.8 | 93.8 | 72.9 |
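The page does not say how Overall is derived. The listed numbers are consistent, to within one-decimal rounding, with a fixed blend of 0.55 × Code Execution + 0.45 × Grounding. Treat those weights as an inference from the published scores, not a documented formula. The sketch below checks that inference. It also confirms that rows are ordered by Code Execution rather than Overall. For example, GPT-o3's Overall of 83.4 is higher than Claude Sonnet 4.6's 83.0, yet GPT-o3 ranks below it. `CODE_WEIGHT`, `implied_overall`, and `max_abs_error` are hypothetical names.

```python
from typing import Dict, Tuple

# (code execution, grounding, overall), copied from the leaderboard in rank order
SCORES: Dict[str, Tuple[float, float, float]] = {
    "Claude Opus 4.7": (84.3, 95.5, 89.3),
    "DeepSeek V4 Pro": (83.7, 95.0, 88.8),
    "Grok 4": (76.3, 95.7, 85.0),
    "Claude Sonnet 4.6": (75.2, 92.5, 83.0),
    "GPT-o3": (74.0, 94.9, 83.4),
    "Doubao Pro": (73.2, 91.9, 81.6),
    "Gemini 3.1 Pro": (72.9, 94.2, 82.5),
    "Qwen3 Max": (71.2, 92.9, 81.0),
    "Gemini 2.5 Pro": (63.8, 95.1, 77.9),
    "ERNIE Bot 4.5": (56.4, 93.7, 73.2),
    "GPT-5.5": (55.8, 93.8, 72.9),
}

# ASSUMPTION: weights inferred from the published numbers, not stated by the page.
CODE_WEIGHT = 0.55
GROUNDING_WEIGHT = 1.0 - CODE_WEIGHT


def implied_overall(code: float, grounding: float) -> float:
    """Overall score under the inferred 55/45 weighting."""
    return CODE_WEIGHT * code + GROUNDING_WEIGHT * grounding


def max_abs_error(scores: Dict[str, Tuple[float, float, float]]) -> float:
    """Largest gap between the published Overall and the implied one."""
    return max(abs(implied_overall(c, g) - o) for c, g, o in scores.values())


if __name__ == "__main__":
    # Below 0.05 means every published Overall matches after rounding to 1 decimal.
    print(max_abs_error(SCORES) < 0.05)
    # The published order equals a descending sort on the Code Execution column.
    by_code = sorted(SCORES, key=lambda m: SCORES[m][0], reverse=True)
    print(list(SCORES) == by_code)
```

Both checks print `True` for the scores above. A single model whose Overall broke the pattern would show that the blend is only approximate.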