YZ Index
Model Incident Reports
Auto-detected: overall crash / dimension collapse / strict task zeroed · updated weekly
8
Claude Sonnet 4.6 strict task "SQL: Suspected Duplicate Payment Detection" dropped from full points to 0
8
Claude Opus 4.6 strict task "SQL: Suspected Duplicate Payment Detection" dropped from full points to 0
10
GPT-4o Code Execution (v5) dropped 23.7 points
10
GPT-4o Overall Score dropped 10.5 points
10
GPT-o3 Grounding dropped 33.5 points
10
Qwen Max Stability dropped 22.8 points
10
Grok 3 Stability dropped 22.5 points
10
GPT-o3 Availability dropped 31 points
10
GPT-o3 Stability dropped 25 points
10
Claude Opus 4.6 Stability dropped 22.5 points
10
GPT-4o Availability dropped 35 points
10
GPT-4o Stability dropped 20.6 points
10
GPT-4o Grounding dropped 21.9 points
10
Gemini 2.5 Pro Stability dropped 22.8 points
10
文心一言 4.0 Stability dropped 22.1 points
10
DeepSeek V3 Stability dropped 21.4 points
10
DeepSeek R1 Stability dropped 22.1 points
10
Claude Sonnet 4.6 Stability dropped 23 points
9