AI Reviews



winzheng.com

A Technical Breakdown of DeepSeek V3's 21.4-Point Stability Drop

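DeepSeek V3's stability score plunged this week from 53.4 to 32.0, a drop of 21.4 points. Although its coding and long-context capabilities improved substantially, the model regressed severely on several basic tasks, exposing systemic problems in the model update.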
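To put the headline figure in context, the sketch below recomputes the drop from the two scores quoted above and also expresses it relative to the starting score. The function name `score_drop` is hypothetical and used only for illustration. The relative percentage is a derived figure, not one reported by the review, and it assumes the stability score is on a linear scale where such a ratio is meaningful.

```python
def score_drop(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in points, drop as a percentage of `before`).

    Both values are rounded to one decimal place to match the
    precision of the published scores.
    """
    absolute = before - after
    relative_pct = 100.0 * absolute / before
    return round(absolute, 1), round(relative_pct, 1)


# Scores quoted in the article: 53.4 last week, 32.0 this week.
absolute, relative_pct = score_drop(53.4, 32.0)
print(f"Absolute drop: {absolute} points")
print(f"Relative drop: {relative_pct}% of the previous score")
```

Under these assumptions, the 21.4-point fall corresponds to roughly 40% of the previous week's stability score.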