OpenAI o1 Reasoning Model Preview Leaked: 83% on AIME, Sam Altman Confirms Imminent Release

OpenAI's highly anticipated o1 series reasoning model test preview was accidentally leaked, achieving an impressive 83% score on the AIME math competition benchmark. CEO Sam Altman confirmed the model's upcoming official release, marking a potential shift from AI's 'generation' to 'reasoning' era.

News Lead

Recently, the test preview of OpenAI's highly anticipated o1 series reasoning model was accidentally leaked, causing a sensation in the AI community. Developed under the codename 'Strawberry', the model focuses on long-chain reasoning capabilities and achieved an impressive 83% score on the AIME mathematical competition benchmark, far exceeding existing models. Related discussions on X platform garnered over 80,000 interactions, with OpenAI CEO Sam Altman quickly responding to confirm the model's imminent official release. This incident has not only ignited developer enthusiasm but is also viewed as a key signal of AI's transition from the 'generation' to 'reasoning' era.

Background: Evolution from Strawberry to o1

OpenAI's o1 series model originated from an internal project codenamed 'Strawberry', which Sam Altman mentioned in interviews as early as 2024. The project aimed to develop AI with stronger 'System 2' reasoning capabilities - the human-like step-by-step thinking process, rather than the rapid 'System 1' intuitive generation of traditional large language models.

The background can be traced to the limitations of the GPT-4 series. Despite GPT-4's excellent performance in multimodal and general tasks, it still frequently exhibited 'hallucination' problems in complex mathematics, physics, and long logical chain reasoning - generating seemingly plausible but incorrect results. OpenAI engineers revealed on X that o1 significantly improves reliability by simulating human 'Chain of Thought' through reinforcement learning and new training paradigms.

The leak originated from API test screenshots shared by a user on X, showing the o1-preview model achieving 83% accuracy on AIME (American Invitational Mathematics Examination) 2024 problems and 79% on GPQA (a graduate-level, 'Google-proof' science Q&A benchmark), far exceeding GPT-4o's 13% and GPT-4T's 50%. The data spread rapidly, and Altman responded within hours: "Yes, o1 is coming soon." The tweet received over 50,000 likes.

Core Content: Technical Breakthrough in Long-Chain Reasoning

The o1 model's biggest highlight is its 'long-chain reasoning' mechanism. While traditional LLMs rely on massive parameters to directly predict the next token, o1 introduces internal 'thinking steps' - the model generates hundreds to thousands of hidden reasoning tokens before output, forming complete logical paths. This design draws from cognitive science, similar to 'think before you answer'.
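The 'think before you answer' idea can be illustrated with a toy sketch (purely illustrative, and in no way OpenAI's actual implementation): intermediate reasoning steps are generated first and kept hidden, while only the short final answer is surfaced to the user.

```python
def solve_with_hidden_reasoning(question: str) -> dict:
    """Toy illustration of long-chain reasoning: produce hidden
    intermediate steps first, then a short visible answer.
    (Illustrative only; not OpenAI's actual mechanism.)"""
    # Hidden chain of thought: decompose the problem step by step.
    reasoning = []
    if question == "What is 17 * 24?":
        reasoning.append("17 * 24 = 17 * 20 + 17 * 4")
        reasoning.append("17 * 20 = 340")
        reasoning.append("17 * 4 = 68")
        reasoning.append("340 + 68 = 408")
        answer = "408"
    else:
        answer = "unknown"
    # Only the final answer is shown; the reasoning stays internal.
    return {"hidden_reasoning": reasoning, "answer": answer}

result = solve_with_hidden_reasoning("What is 17 * 24?")
print(result["answer"])                  # visible output: 408
print(len(result["hidden_reasoning"]))   # hidden steps: 4
```

In the real model the hidden steps are reasoning tokens sampled before the answer, which is why complex queries cost far more compute than the short visible output suggests.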

Benchmark test data shows o1 crushing competitors across multiple challenging tasks:

  • AIME 2024: 83% (GPT-4o only 13%)
  • GPQA Diamond: 79% (leading Claude 3.5 Sonnet)
  • Codeforces programming competition: Elo rating 1891 (upper-intermediate human level)
  • International Mathematical Olympiad (IMO) select problems: approaching gold medal level

Additionally, o1 supports tool calling and multi-step planning, excelling in physics simulation and chemical reaction prediction. The leaked API interface shows that while model response times are long (complex problems require several minutes), accuracy improves 3-5x, greatly alleviating the 'hallucination' pain point.
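Tool calling with multi-step planning can be sketched as a plan of tool invocations in which later steps consume earlier results. The tool names and `"$i"` reference convention below are hypothetical, chosen only for illustration; they are not the leaked o1 API.

```python
from typing import Callable, Union

# Hypothetical tool registry (illustrative names, not a real API).
TOOLS: dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def run_plan(plan: list[tuple[str, Union[float, str], Union[float, str]]]) -> float:
    """Execute tool calls in order; a '$i' argument refers to step i's result."""
    results: list[float] = []

    def resolve(x):
        return results[int(x[1:])] if isinstance(x, str) else x

    for tool, a, b in plan:
        results.append(TOOLS[tool](resolve(a), resolve(b)))
    return results[-1]

# Multi-step plan: compute (3 * 4), then add 5 to that result.
print(run_plan([("mul", 3, 4), ("add", "$0", 5)]))  # 17
```

The design point is that the planner, not the user, decides the sequence of calls; chaining intermediate results this way is what lets a reasoning model break a hard problem into verifiable sub-steps.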

On the technical side, o1 employs a new reinforcement learning framework with reward functions emphasizing logical consistency and factual accuracy. OpenAI documentation mentions training data including millions of manually annotated reasoning trajectories, combined with self-supervised distillation to further compress computational costs.
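A reward function of the kind described, one that scores both logical consistency of the chain and factual accuracy of the answer, might look like the following toy sketch (the 50/50 weighting and the arithmetic step-checker are assumptions for illustration):

```python
def reward(steps: list[str], final_answer: str, correct_answer: str) -> float:
    """Toy reward combining logical consistency of the reasoning chain
    with factual accuracy of the final answer (illustrative weights)."""
    def step_is_consistent(step: str) -> bool:
        # Demo checker: verify simple "expr = value" arithmetic claims.
        try:
            lhs, rhs = step.split("=")
            return abs(eval(lhs) - float(rhs)) < 1e-9  # eval for demo only
        except Exception:
            return False

    consistency = sum(step_is_consistent(s) for s in steps) / max(len(steps), 1)
    accuracy = 1.0 if final_answer == correct_answer else 0.0
    return 0.5 * consistency + 0.5 * accuracy

trace = ["17 * 20 = 340", "17 * 4 = 68", "340 + 68 = 408"]
print(reward(trace, "408", "408"))  # 1.0
```

Rewarding the trace itself, not just the answer, discourages chains that reach a correct result through inconsistent steps, which is one plausible reading of how such training improves reliability.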

Various Perspectives: Heated Discussion and Expert Analysis

"o1 isn't a small upgrade, it's a paradigm shift. It proves the feasibility of reasoning-specific training, and future AI will think like scientists." - Andrej Karpathy, former OpenAI researcher and now independent AI entrepreneur, commenting on X.

On X platform, discussion remains heated. Developer @levelsio stated: "After testing o1, GPT-4 instantly became obsolete. Its logical chains when debugging code amazed me." Another AI researcher @yoheinakajima shared: "o1 approaches human experts on graduate-level problems, the Strawberry project succeeded."

Sam Altman emphasized in his response: "We spent considerable time ensuring safety and reliability. o1 will be gradually released." Competitor Anthropic CEO Dario Amodei posted congratulations but hinted that the Claude series is also catching up. In Chinese developer communities like Zhihu and Bilibili, post views exceeded millions, with many predicting o1 will reshape programming and research toolchains.

Critical voices have also emerged. Some experts worry about computational cost: o1 reportedly consumes roughly 10x GPT-4's resources per inference, potentially exacerbating the AI arms race. Meta AI researcher Soumith Chintala noted: "While reasoning models are powerful, their generalization to open-world scenarios still needs validation."

Impact Analysis: Dawn of the AI Reasoning Era

o1's emergence may reshape the AI ecosystem. First, for developers, it could replace GPT-4 as the default workhorse, especially in mathematical modeling, algorithm design, and scientific research. Companies like xAI and Google DeepMind have indicated they will follow suit with reasoning optimization.

On a broader scale, o1 marks the beginning of the 'reasoning era'. AI previously excelled at shallow pattern matching but is now transitioning toward deep logical reasoning, potentially accelerating the AGI timeline. Challenges remain, however: high energy consumption calls for greener computing, and safety alignment (such as preventing malicious reasoning chains) becomes crucial. Economically, API pricing is expected to exceed GPT-4o's, favoring high-end users, while lower-end applications may stay with existing models.

From a global perspective, Chinese AI companies like Baidu and Alibaba are accelerating similar model development, with o1 expected to stimulate domestic reasoning technology investment. In education, o1 can assist personalized teaching, making physics and chemistry problem-solving more reliable.

Conclusion: Anticipating the Official Debut

While the OpenAI o1 preview leak was unexpected, it offered an early glimpse of AI's future. With its exceptional reasoning capabilities, o1 not only addresses long-standing pain points but opens a new era. With Sam Altman confirming the imminent release, the industry is eagerly awaiting. Whether or not the leaked benchmark numbers hold up in practice, o1 has undoubtedly pushed AI toward greater intelligence and reliability. In the future, AI will not just generate text but truly 'think' about the world.