11 AIs Answer the Same Question, 6 Get Even the Day of the Week Wrong

Mar 21, 2026 | Source: Winzheng Index

Tags: DeepSeek, GPT-4o, Time Zone Calculation, Model Evaluation, AI Reasoning Ability

When 11 top AI models were asked to solve a time zone calculation that elementary school students could handle, the results were jaw-dropping: over half of them couldn't perform even the most basic time arithmetic. More ironically still, not one of these "intelligent assistants," worth billions of dollars, realized that by March 15th the US is already on Daylight Saving Time.

A Simple Question That Reveals AI's True Capabilities

The question was ridiculously simple: Given Beijing time as Saturday, March 15th at 3:00 PM, calculate the local time and day of the week for New York, London, Tokyo, and Sydney. Any middle school student who has learned about time zones could provide the answer within 2 minutes.
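For reference, the conversion itself takes a few lines with Python's standard library. This is a minimal sketch, assuming Python 3.9+ with the IANA time zone database available to `zoneinfo`, and assuming the year is 2025 (March 15th falls on a Saturday in 2025, which matches the article's Daylight Saving Time date). `CITIES` and `convert` are hypothetical names used only for illustration.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# The question's reference moment: Saturday, March 15, 15:00 in Beijing.
# The year 2025 is an assumption inferred from "Saturday, March 15th".
beijing = datetime(2025, 3, 15, 15, 0, tzinfo=ZoneInfo("Asia/Shanghai"))

# IANA zone names for the four target cities. The database, not the caller,
# decides whether daylight saving time applies on the given date.
CITIES = {
    "New York": "America/New_York",
    "London": "Europe/London",
    "Tokyo": "Asia/Tokyo",
    "Sydney": "Australia/Sydney",
}

def convert(moment: datetime) -> dict[str, str]:
    """Return each city's local time as 'Weekday HH:MM (UTC offset)'."""
    result = {}
    for city, zone in CITIES.items():
        local = moment.astimezone(ZoneInfo(zone))
        result[city] = local.strftime("%A %H:%M (UTC%z)")
    return result

for city, text in convert(beijing).items():
    print(f"{city:9} {text}")
```

Run on such a system, it reports Saturday for all four cities: 03:00 in New York (UTC-4), 07:00 in London (UTC+0, since British Summer Time does not begin until March 30), 16:00 in Tokyo, and 18:00 in Sydney (UTC+11, still on daylight time).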

However, the performance of 11 mainstream AI models makes one question whether they possess any real "intelligence":

6 models completely failed (score: 0): Doubao Pro, DeepSeek R1, Grok 3, Gemini 2.5 Pro, Qwen Max
5 models answered correctly (score: 100): DeepSeek V3, Wenxin Yiyan 4.0, Claude Sonnet, GPT-4o, Claude Opus

The most outrageous was Alibaba's Qwen Max, which put New York at "Friday 22:00." Not only was the hour wrong, but so was the day: New York is only 12 to 13 hours behind Beijing, so 3:00 PM on Saturday in Beijing is still the early hours of Saturday in New York. This elementary error doesn't hold up to basic arithmetic.

Same Company, Worlds Apart

Stranger still, models from the same company performed completely differently. DeepSeek V3 answered perfectly, but its "reasoning-enhanced" sibling, DeepSeek R1, got everything wrong. R1 is meant to be an upgrade over V3 with stronger reasoning capabilities, yet it failed on such a simple question.

This exposes a harsh truth: So-called "reasoning models" might just be overfitted to specific benchmarks, with questionable real reasoning abilities. When faced with a slightly modified real-world problem, these fancy "enhanced versions" prove less reliable than their basic counterparts.

The Common Blind Spot: Daylight Saving Time

What's even more disturbing is that not a single model mentioned that March 15th falls just after the start of US Daylight Saving Time, which begins on the second Sunday of March each year. In 2025, Daylight Saving Time began on March 9th, so on the March 15th in the question, the US was already observing it, making New York UTC-4 rather than UTC-5.
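The second-Sunday rule is easy to check with plain calendar arithmetic. Below is a small sketch using only Python's standard `datetime` module; `nth_weekday` and `us_dst_start` are hypothetical helper names, and the rule encoded is the one the US has used since 2007.

```python
from datetime import date, timedelta

def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday (Monday=0 ... Sunday=6) of a month."""
    first = date(year, month, 1)
    # Days from the 1st to the first occurrence of the requested weekday.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def us_dst_start(year: int) -> date:
    """US Daylight Saving Time starts on the second Sunday of March."""
    return nth_weekday(year, 3, 6, 2)

question_day = date(2025, 3, 15)
start = us_dst_start(2025)
print(start)                  # 2025-03-09
# Comparing dates is enough here: March 15 is days past the 2:00 AM switch.
print(question_day >= start)  # True, so New York is on EDT (UTC-4)
```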

This means every model gave the wrong New York time: the correct answer is 3:00 AM Saturday, not 2:00 AM. Even the "honor students" that scored 100 points merely applied the incorrect UTC-5 offset given in the question, showing no real temporal common sense.
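The one-hour discrepancy can be reproduced with fixed UTC offsets. This sketch hard-codes EST (UTC-5) and EDT (UTC-4) for this single date, as an assumption, rather than consulting a time zone database; `local_clock` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets written out by hand for March 15, 2025 (assumption: no tz database).
BEIJING = timezone(timedelta(hours=8))
NY_STANDARD = timezone(timedelta(hours=-5))  # EST: the offset behind the 2:00 AM answers
NY_DAYLIGHT = timezone(timedelta(hours=-4))  # EDT: in force from March 9, 2025

moment = datetime(2025, 3, 15, 15, 0, tzinfo=BEIJING)

def local_clock(tz: timezone) -> str:
    """Render the shared moment as 'Weekday HH:MM' in a fixed-offset zone."""
    return moment.astimezone(tz).strftime("%A %H:%M")

print(local_clock(NY_STANDARD))  # Saturday 02:00 (13 hours behind Beijing)
print(local_clock(NY_DAYLIGHT))  # Saturday 03:00 (12 hours behind Beijing)
```

The 13-hour gap between Beijing and New York holds only while New York is on standard time; from March 9 to November 2, 2025, the gap is 12 hours.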

"If an AI doesn't even know basic facts like 'New York uses Daylight Saving Time in March,' why should we trust it to handle more complex real-world problems?" — said an AI researcher who wished to remain anonymous.

The Price of Tech Worship

The problems revealed by this test are far more serious than they appear on the surface. As we hand over more and more decision-making power to AI, their collective failure on such basic questions is chilling:

Google's Gemini 2.5 Pro claims world-leading multimodal capabilities, yet can't even solve a text-based problem
Musk's heavily promoted Grok 3 boasts superior "real-time internet connectivity," yet can't calculate simple time differences
ByteDance's Doubao Pro, promoted as a domestic, "independently controllable" model, fails completely at such simple reasoning

If these models can't even figure out time zones, do we really want them making medical diagnoses, financial decisions, or driving autonomously?

Final Thoughts

This "time zone exam" has sounded the alarm for the entire AI industry. In the arms race for parameter scale and benchmark scores, we may have overlooked the most fundamental things — common sense and logic.

As one Silicon Valley investor put it: "When your AI assistant can't even figure out that 'New York is 13 hours behind Beijing,' so-called AGI (Artificial General Intelligence) might still be light-years away."

In a world where AI capabilities are wildly exaggerated, a simple elementary school math problem is enough to burst the bubble. Next time someone tries to sell you on "super intelligence," why not first ask: What time is it in New York?

Data source: YZ Index | Run #33