Model Daily (1 article)

Two Zero-Execution Shocks, Claude Holds at 88.75

Today’s Smoke benchmark shows Claude Opus 4.7 leading with a score of 88.75, while two models scored zero on code execution. The real differentiator is material constraint, not execution ability.