AI Reviews

Real testing, real data. We evaluate AI models, smart hardware, and cutting-edge tech with rigorous methodology — giving you the most objective reference.

🏠 Our Reviews LMSYS Chatbot Arena MLCommons Ars Technica

Winzheng Index

GPT-4o崩了：35分暴跌背后的严格模式陷阱

GPT-4o本周可用性暴跌35分，在严格工具调用测试中全军覆没。当AI被要求"只在确定时才行动"，它选择了完全不行动。这暴露出当前大模型在处理不确定性时的根本缺陷。