DeepSeek R1 Stability Plummets 22 Points: The Truth Behind Complete Failure on Simple Judgment Questions
DeepSeek R1's stability score crashed from 53.7 to 31.6 points this week, a drop of 22.1 points. The model failed basic judgment questions, such as whether water can boil at 101°C under standard pressure, raising serious concerns about its reliability.