YZ Index
Model Incident Reports
Auto-detected: overall crash / dimension collapse / strict task zeroed · updated weekly
10
GPT-4o Code Execution (v5) dropped 23.7 points
10
Claude Opus 4.6 Stability dropped 22.5 points
10
Claude Sonnet 4.6 Stability dropped 23 points
10
DeepSeek R1 Stability dropped 22.1 points
10
DeepSeek V3 Stability dropped 21.4 points
10
文心一言 4.0 Stability dropped 22.1 points
10
Gemini 2.5 Pro Stability dropped 22.8 points
10
GPT-4o Grounding dropped 21.9 points
10
GPT-4o Stability dropped 20.6 points
10
GPT-4o Availability dropped 35 points
10
GPT-o3 Grounding dropped 33.5 points
10
GPT-o3 Stability dropped 25 points
10
GPT-o3 Availability dropped 31 points
10
Grok 3 Stability dropped 22.5 points
10
Qwen Max Stability dropped 22.8 points
9