Engineering Boundaries (1 article)

11 Models Attempt SQL Retention Task: 9 Score Zero, DeepSeek and Grok Only 66.7

In the code execution test of YZ Index v6, the "SQL Monthly Retention Cohort" problem split the 11 models sharply. The result was brutal: 9 models scored 0, and only DeepSeek V4 Pro and Grok 4 earned any points, each scoring 66.7.
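For readers unfamiliar with the task type, here is a minimal sketch of what a monthly retention cohort query usually computes. It is not the benchmark's actual problem statement or reference answer: the `events` table, its `user_id` and `event_date` columns, the sample rows, the helper names `build_demo_db` and `monthly_retention`, and the choice of SQLite are all assumptions made for illustration. Each user is assigned to the month of first activity (the cohort), and the query then counts how many users from each cohort are active again N months later.

```python
import sqlite3

# Hypothetical schema and sample data, invented for illustration only.
def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (user_id INTEGER, event_date TEXT);
        INSERT INTO events VALUES
          (1, '2024-01-05'), (1, '2024-02-10'), (1, '2024-03-02'),
          (2, '2024-01-20'), (2, '2024-03-15'),
          (3, '2024-02-01'), (3, '2024-03-09');
    """)
    return conn

RETENTION_SQL = """
WITH activity AS (
    -- One row per user per active month, plus a numeric month index
    -- so that month differences work across year boundaries.
    SELECT DISTINCT
        user_id,
        strftime('%Y-%m', event_date) AS month,
        CAST(strftime('%Y', event_date) AS INTEGER) * 12
          + CAST(strftime('%m', event_date) AS INTEGER) AS month_idx
    FROM events
),
cohorts AS (
    -- A user's cohort is the first month in which they were active.
    SELECT user_id,
           MIN(month) AS cohort_month,
           MIN(month_idx) AS cohort_idx
    FROM activity
    GROUP BY user_id
),
sizes AS (
    SELECT cohort_month, COUNT(*) AS cohort_size
    FROM cohorts
    GROUP BY cohort_month
)
SELECT c.cohort_month,
       a.month_idx - c.cohort_idx AS month_offset,
       COUNT(*) AS active_users,
       COUNT(*) * 1.0 / s.cohort_size AS retention
FROM cohorts c
JOIN activity a ON a.user_id = c.user_id
JOIN sizes s ON s.cohort_month = c.cohort_month
GROUP BY c.cohort_month, a.month_idx - c.cohort_idx, s.cohort_size
ORDER BY c.cohort_month, month_offset;
"""

def monthly_retention(conn: sqlite3.Connection) -> list:
    # Returns (cohort_month, month_offset, active_users, retention) tuples.
    return conn.execute(RETENTION_SQL).fetchall()

if __name__ == "__main__":
    for row in monthly_retention(build_demo_db()):
        print(row)
```

Even in this small form the query has several places to go wrong: activity must be deduplicated per user and month, month offsets must survive year boundaries, and the retention denominator must be the cohort size rather than the count of active users in the later month.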