OpenAI o1-preview Reasoning Model Makes Heavyweight Debut: Crushes GPT-4o in Benchmarks, AI Enters New Era of 'Chain of Thought'

OpenAI released the o1-preview reasoning model on September 12, 2024 (US time). According to OpenAI's own benchmarks, it outperforms GPT-4o by a wide margin in mathematics, code generation, and scientific reasoning. The model is built around 'Chain of Thought' optimization: it works through a problem step by step, much as a person would, to solve complex problems more reliably.

News Lead

On September 12, 2024 (US time), OpenAI released the o1-preview reasoning model, which it reports beating GPT-4o on benchmarks for mathematics, code generation, and scientific reasoning, causing a sensation in the AI industry. Within hours of the release, reposts on X exceeded 50,000, and the developer community was hotly debating the model's revolutionary potential. ChatGPT Plus users can already try it, driving a surge in subscriptions.

Background: The Evolution of AI Reasoning Capabilities

As a leader in the AI field, OpenAI has continuously iterated on its large language models (LLMs) since ChatGPT went viral. The GPT series is renowned for its generative capabilities, fluently outputting text, images, and code, but it often hallucinates or reasons shallowly when facing complex logical problems. 'Chain of Thought' prompting, described by Google researchers in 2022, helps models decompose problems, but it is external guidance supplied in the prompt, not a mechanism intrinsic to the model.
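To make the idea of "external guidance" concrete, here is a minimal sketch of the difference between a direct prompt and a chain-of-thought prompt. The function names and the exact cue phrase are illustrative assumptions, not anything taken from OpenAI's products; the code only builds the prompt strings and does not call any model.

```python
def direct_prompt(question: str) -> str:
    """Ask for the answer straight away (hypothetical helper)."""
    return f"Q: {question}\nA:"


def chain_of_thought_prompt(question: str) -> str:
    """Append a cue that nudges the model to write out its steps first.

    The cue phrase is an assumption chosen for illustration; any wording
    that asks for step-by-step reasoning plays the same role.
    """
    return f"Q: {question}\nA: Let's think step by step."


if __name__ == "__main__":
    question = "A train travels 120 km in 1.5 hours. What is its average speed?"
    print(direct_prompt(question))
    print()
    print(chain_of_thought_prompt(question))
```

In both cases the model itself is unchanged; only the text it receives differs, which is why the article describes prompting as guidance from outside the model.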

The launch of o1-preview is a systematic response to this pain point. OpenAI CEO Sam Altman stated on X: 'o1 is the starting point of our reasoning model series. It learns to think like humans rather than directly generating answers.' Behind the launch is a widely shared industry view: gains from pure text generation are leveling off, and competition is shifting to reasoning capabilities for high-difficulty tasks such as mathematics competitions, program debugging, and scientific research.

Core Content: Technical Highlights and Performance Data of o1-preview

The core innovation of o1-preview lies in its built-in 'Chain of Thought' training mechanism. Unlike GPT-4o's 'one-step' generation, o1 performs multi-step reasoning simulation internally. Users don't see the complete process, but the model output is more accurate. Official benchmark tests show:

  • International Mathematical Olympiad (IMO) qualifying exam: OpenAI's reasoning model solves 83% of problems, far exceeding GPT-4o's 13%.
  • Codeforces competitive programming platform: o1 reaches the 89th percentile, while GPT-4o reaches only the 11th percentile.
  • Graduate-level expert reasoning (GPQA): o1 accuracy rate 74.4%, GPT-4o at 53.6%.
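The size of these gaps can be read two ways: as a percentage-point difference or as a ratio. The short sketch below computes both for the math and GPQA figures quoted in the list above. The numbers are copied from this article rather than independently verified, and the `compare` helper is a hypothetical name introduced for illustration.

```python
# Scores as quoted in this article (percent); not independently verified.
quoted_scores = {
    "Math": {"o1": 83.0, "GPT-4o": 13.0},
    "GPQA": {"o1": 74.4, "GPT-4o": 53.6},
}


def compare(o1: float, gpt4o: float) -> tuple[float, float]:
    """Return (percentage-point gap, ratio of the o1 score to the GPT-4o score)."""
    return round(o1 - gpt4o, 1), round(o1 / gpt4o, 2)


if __name__ == "__main__":
    for name, scores in quoted_scores.items():
        gap, ratio = compare(scores["o1"], scores["GPT-4o"])
        print(f"{name}: +{gap} percentage points, {ratio}x the GPT-4o score")
```

On the quoted figures, the math gap is 70 percentage points (roughly a sixfold difference), while the GPQA gap is about 21 points (roughly 1.4 times), so the headline advantage is much larger in competition math than in expert question answering.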

Alongside o1-preview, OpenAI released o1-mini, a smaller variant optimized for programming and mathematics that is faster and more cost-effective. OpenAI emphasizes that the model learns to generate its own thinking steps through reinforcement learning (RL) on large amounts of reasoning data, and to abandon approaches that are not working. Access is currently limited to ChatGPT Plus and Team subscribers, with a weekly limit of 30 messages for o1-preview (50 for o1-mini), aimed at controlling load and collecting feedback.

In demonstrations, o1-preview is said to handle difficult problems such as 'proving a simplified version of Fermat's Last Theorem' or 'optimizing quantum computing algorithms.' ChatGPT does not expose the raw chain of thought; instead, users see a summary of the model's reasoning. This design is intended to improve reliability while still giving developers some insight into how an answer was reached.

Various Perspectives: Developer Discussions and Industry Disagreements

After the release, the announcement spread rapidly on X, drawing over 50,000 reposts and more than 100,000 likes. Former OpenAI researcher Andrej Karpathy (@karpathy) posted:

'o1-preview is a true leap! It's not faster, but smarter. The math and code benchmarks shocked me - this will be the starting point of a new programming era.'

However, it's not all praise. Anthropic CEO Dario Amodei responded on X:

'Reasoning models are the direction, but we still need to be vigilant about safety and alignment. o1 has huge potential, and our Claude will follow.'

Google DeepMind researchers also noted that while o1 is strong on benchmarks, its long context processing and multimodal capabilities are temporarily weaker than GPT-4o.

The Chinese developer community is equally active. Bilibili content creator 'AI Sentinel' analyzed: 'o1-preview is highly significant for domestic programming competitions and scientific research simulations, but the quota mechanism limits its popularization.' Meta Chief AI Scientist Yann LeCun (@ylecun on X) takes a cautious stance:

'Benchmarks are important, but real-world applications are key. o1 is progress, but still far from AGI.'

Impact Analysis: Subscription Boom, Competitive Pressure, and Industry Transformation

The o1-preview release immediately triggered a ChatGPT Plus subscription boom. OpenAI data shows a 30% surge in new users on the first day, with the $20/month Plus plan becoming the focus. Because the model is unavailable on the free tier, its exclusivity reinforces the paywall and drives revenue growth.

For competitors, the pressure is immense. Anthropic's Claude 3.5 Sonnet and Google's Gemini 1.5 Pro lag in reasoning and are expected to accelerate iterations. Meta's Llama series open-source community may seize the opportunity to catch up, but closed-source o1's lead is obvious. This marks AI's transformation from the 'generation era' to the 'reasoning era,' with future applications extending to autonomous driving algorithm verification, drug discovery, and legal reasoning.

Potential challenges include high computational costs and ethical risks. On cost, o1 takes longer per request, and API access to o1-preview launched at $15 per million input tokens and $60 per million output tokens, well above GPT-4o's pricing. On ethics, reinforcement learning could amplify biases. On the regulatory front, the US and EU are closely watching the safety of such high-capability models.

In the long term, o1-preview may reshape the AI ecosystem. Developers can use it to build smarter agents, such as autonomous code debugging or multi-step planning robots. Industry analysts predict that by 2025, the reasoning model market will be more than twice the size of generative AI.

Conclusion: The Dawn of AI Thinking

The advent of OpenAI o1-preview is not only a technical milestone but also a signal of AI paradigm shift. From generating text to reasoning about the world, humanity is witnessing the budding of machine 'thinking.' Despite existing challenges, its potential has ignited global imagination. In the future, with the official version of o1 and more competitors emerging, AI will move closer to artificial general intelligence. Developers and users are watching eagerly to see if this 'Chain of Thought' truly leads to the door of AGI.