Large AI Models in Turmoil! Wenxin Yiyan Soars 24.7 Points but Integrity Collapses, Gemini Drops 16 Points in Three Consecutive Declines
The Smoke lightweight evaluation has sent shockwaves through the AI community: Wenxin Yiyan 4.5 saw its main leaderboard score soar by 24.7 points, yet its integrity rating fell from pass to fail; meanwhile, the Gemini series suffered three consecutive declines, and DeepSeek V4 Pro plummeted by 16.1 points on the main leaderboard.