DeepSeek V3 Stability Plunges 21.4 Points: In-Depth Analysis of Model Output Consistency Crisis
DeepSeek V3 posted contradictory results in this week's evaluation: several capability metrics improved significantly, lifting the overall score from 52.9 to 66.6 (a gain of 13.7 points), yet the stability dimension dropped sharply, by 21.4 points. This pattern of "stronger capabilities, less stable output" deserves in-depth analysis.