News Lead
On July 24 Beijing time, Meta officially released the Llama 3.1 model series, with the 405B-parameter version claiming the performance crown among open large language models on the strength of an 88.6% MMLU benchmark score. Beyond strong multilingual support and long-context processing, the model's weights are openly released under a license that permits commercial use, quickly igniting enthusiasm in the developer community. Interactions on related topics on the X platform have exceeded 150,000, and downloads are growing explosively.
Background
Since its debut in 2023, the Llama series has become a benchmark in the open-source AI field. Meta uses Llama to promote the democratization of large language models, aiming to break the hold a few tech giants have on high-end AI. Earlier versions such as Llama 2 and Llama 3 gradually approached the performance of closed-source models like GPT-4, but gaps remained in parameter scale and multilingual capability. The release of Llama 3.1 405B represents another major push in Meta's open-source strategy.
Meta CEO Mark Zuckerberg stated in the release blog: "Llama 3.1 is our most advanced model to date, and we hope it can provide cutting-edge AI capabilities to developers worldwide." This series of models was trained on over 15 trillion tokens, covering multiple languages and domain knowledge, demonstrating Meta's substantial accumulation in massive data and computational resources.
Core Content
The 405B-parameter model is the flagship of the Llama 3.1 series, featuring a long context window of up to 128K tokens and native multilingual support for eight languages, including English, German, French, and Spanish. In benchmark tests, its MMLU (Massive Multitask Language Understanding) score of 88.6% surpasses the previous open-source record holder and approaches the level of closed-source models such as Claude 3.5 Sonnet.
The model also excels at tool use, code generation, and reasoning. For example, Meta reports strong results on the GPQA (Graduate-Level Google-Proof Q&A) reasoning benchmark and 89.0% accuracy on HumanEval code generation. Meta emphasizes that the model has passed rigorous safety evaluations and supports enterprise-grade deployment.
The barrier to deployment is comparatively low: model weights and code are hosted on the Hugging Face platform, and users can run quantized versions of the smaller models on consumer-grade GPUs using inference frameworks such as vLLM or TensorRT-LLM. One developer on X shared: "You can run the 70B version on a single A100, and 405B only needs a multi-card cluster, with far better cost-performance than API calls." Meta also ships 8B and 70B versions to cover different scenarios.
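As a concrete illustration of the local-deployment path described above, here is a minimal sketch that builds a prompt in the Llama 3.1 instruct chat format and shows, in comments, how it would be handed to vLLM. The model ID and sampling settings are illustrative assumptions; in practice, the tokenizer's built-in chat template is the preferred way to produce this string.

```python
# Minimal sketch, assuming vLLM is installed and the Hugging Face weights
# have been downloaded. Model ID and sampling values are assumptions.

def format_llama3_prompt(messages):
    """Render a chat into the Llama 3.1 instruct prompt template."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Cue the model to continue as the assistant.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

if __name__ == "__main__":
    prompt = format_llama3_prompt([
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize Llama 3.1 in one sentence."},
    ])
    # With vLLM installed, generation would look roughly like:
    #   from vllm import LLM, SamplingParams
    #   llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
    #   out = llm.generate([prompt], SamplingParams(max_tokens=128))
    print(prompt[:40])
```

The same prompt string works with TensorRT-LLM or a plain `transformers` pipeline; only the serving layer changes.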
Various Perspectives
"Llama 3.1 405B is a milestone for open source, proving that community collaboration can match billion-parameter closed-source black boxes." — Hugging Face CEO Clément Delangue posted on X.
The developer community response has been enthusiastic. On X, @karpathy (a former OpenAI researcher) stated: "This model is already good enough for commercial use in multilingual and tool-calling scenarios; I've already seen pleasant surprises in testing." Another developer, @lmstudio, shared deployment tutorials that garnered tens of thousands of likes.
Industry experts have also responded positively. AI researcher Tim Dettmers commented: "The performance curve of 405B shows open source accelerating its catch-up, with training costs cut to a tenth of closed-source equivalents." There are cautious voices as well; an Anthropic researcher pointed out: "While strong, its safety alignment and hallucination issues still need community work."
Regarding competitors, a Google DeepMind engineer anonymously responded on X: "Open-source competition has intensified innovation, but we will continue to focus on reliability and multimodality."
Impact Analysis
The release of Llama 3.1 405B will profoundly reshape the AI ecosystem. First, it challenges the dominance of closed-source players such as OpenAI and Anthropic. A license permitting free commercial use means small and mid-sized enterprises can build chatbots, code assistants, and other applications without steep API fees, making AI more accessible.
Second, multilingual support opens global markets. Chinese developers report significant improvements in the model's Chinese comprehension, which could accelerate local AI application rollouts. Downloads broke records on the first day of release, and Hugging Face data shows cumulative Llama-series downloads exceeding 1 billion.
In the long term, open-source models promote transparency and safety review. The community can customize fine-tuning to reduce bias risks. But challenges remain: high-parameter models demand massive computational power, making full training difficult for non-top players; under regulatory pressure, the risk of open-source misuse needs attention.
The economic impact is significant. It is estimated that enterprises deploying Llama 3.1 can cut costs by 90%, catalyzing a new wave of AI entrepreneurship. Under the #Llama3.1 topic on X, startup teams are sharing RAG system and agent-building case studies, foreshadowing a wave of new applications.
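The RAG pattern those teams describe can be sketched in a few lines. This toy version ranks documents by naive word overlap and stuffs the winners into a prompt; a real system would use embedding-based retrieval and send the prompt to a locally hosted Llama 3.1 endpoint. Function names and the scoring scheme here are illustrative assumptions, not any team's actual pipeline.

```python
# Toy sketch of retrieval-augmented generation (RAG), assuming the prompt
# is ultimately sent to a self-hosted Llama 3.1 instance. The word-overlap
# scoring is deliberately naive; production systems use vector embeddings.

def retrieve(query, docs, k=2):
    """Rank docs by word overlap with the query and return the top k."""
    q = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query, docs, k=2):
    """Assemble retrieved snippets into a grounded prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The 128K-token context window matters here: far more retrieved material fits into a single prompt than with earlier 4K- or 8K-context open models.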
Conclusion
Llama 3.1 405B is not just a technological leap, but a victory for the open-source spirit. It reminds us that the future of AI should belong to all humanity, not just a few companies. Looking ahead, with community iterations, this model may become the new cornerstone of commercial AI. Developers, are you ready to deploy?
© 2026 Winzheng.com 赢政天下 | When reposting, please credit the source and link to the original article.