DeepSeek-V2 Outperforms Llama3 in Chinese Benchmarks at 1/10 the Cost, Sparking Heated Discussions

The DeepSeek team has released its open-source V2 model, which excels in Chinese benchmarks, especially math and code generation, surpassing Meta's Llama3 at roughly one-tenth the training cost. The release ignited discussion in X's Chinese-language community, drawing over 500,000 views.

News Lead

The DeepSeek team recently released its heavyweight open-source large language model, V2, which performs strongly across multiple Chinese benchmarks, notably surpassing Meta's Llama3 on mathematics and code-generation tasks while requiring only about one-tenth of the training cost. The news quickly went viral in X's Chinese-language community, with related discussions exceeding 500,000 views, as netizens hailed a new chapter in Chinese AI's "cost-performance miracle".

Background Introduction

DeepSeek, a representative project in China's open-source AI field, has been known for efficient training and Chinese-language optimization since its V1 release. Unlike international giants that rely on massive data accumulation, DeepSeek focuses on algorithmic innovation and efficient use of resources. As global AI competition intensifies, open-source models have become a key path to lowering barriers and making AI broadly accessible. Llama3, Meta's latest flagship, leads English benchmarks at its 70B-parameter scale, but its Chinese capability lags behind, leaving an opening for domestic models to leapfrog it.

DeepSeek-V2 arrives amid intensifying China-US AI competition, with Chinese teams using open-source strategies both to accumulate community feedback and to gain a foothold in the global AI ecosystem. The V2 upgrade, a 236B-parameter Mixture-of-Experts (MoE) model, achieves a leap in performance at extremely low cost, marking Chinese AI's transition from "following" to "leading".

Core Content: Technical Breakthrough Details

DeepSeek-V2's core highlight is its outstanding performance on Chinese-specific tasks. According to official benchmark results, it scored 89.5% on GSM8K-Math (mathematical reasoning), surpassing Llama3's 85.2%; on the Chinese subset of LiveCodeBench (code generation), it reached 78.3% accuracy, leading Llama3 by roughly 5 percentage points. V2 also consistently ranks at the top of local benchmarks such as C-Eval (comprehensive Chinese ability).

Cost control is the other killer feature. DeepSeek disclosed that V2's training consumed only about 2.78 million H800 GPU hours, less than one-tenth of Llama3's cost. The efficiency comes from optimization at several levels: first, the MLA (Multi-head Latent Attention) mechanism sharply reduces KV-cache overhead; second, the MoE architecture activates only about 21B of the 236B parameters per token, substantially accelerating inference; third, meticulous cleaning of local datasets and enhanced training on Chinese corpora avoid the bias of English-dominated data.
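To make the activated-parameter idea concrete, here is a minimal, self-contained sketch of top-k expert routing, the core mechanism of an MoE layer. The dimensions, expert count, and top-k value are illustrative placeholders, not DeepSeek-V2's actual configuration.

```python
# Minimal sketch of top-k MoE routing: each token is sent to only top_k of
# num_experts feed-forward networks, so most parameters stay idle per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, dim)
        scores = self.gate(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize over the k chosen
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = TopKMoE()
tokens = torch.randn(4, 512)   # 4 token vectors
print(layer(tokens).shape)     # torch.Size([4, 512]); only 2 of 8 experts ran per token
```

With top_k=2 of 8 experts, each token touches roughly a quarter of the expert weights, which is the same principle that lets V2 keep its per-token compute far below its total parameter count.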

The open-source strategy amplifies these advantages. V2's complete weights are publicly available on Hugging Face and GitHub under a commercial-use-friendly license. Developers report low deployment barriers: a consumer-grade RTX 4090 can run quantized versions, avoiding the API lock-in of closed-source models.
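For reference, below is a minimal sketch of loading the published weights with the Hugging Face transformers library. The repo id follows DeepSeek's official release; exact memory requirements and quantization options should be checked against the model card.

```python
# Minimal sketch: loading DeepSeek-V2 from Hugging Face with transformers.
# Check the official model card for hardware requirements before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2"  # official Hugging Face repo
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # shard across available GPUs (needs accelerate)
    trust_remote_code=True,  # DeepSeek ships custom modeling code
)

# "Write a quicksort function" in Chinese, matching the model's strong suit.
inputs = tokenizer("写一个快速排序函数", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Consumer-GPU runs like the RTX 4090 setup mentioned above typically rely on community-quantized variants rather than this full-precision checkpoint.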

Various Perspectives: Industry Discussions and Community Feedback

"DeepSeek-V2 proves that Chinese AI can lead without burning money. The dual-drive of MoE+MLA is an engineering miracle." — X user @AI Frontline Observer (post with over 100,000 views)

X platform data show that the #DeepSeekV2 topic drew over 50,000 interactions within 24 hours, with broad approval from Chinese AI practitioners. Renowned AI blogger @Silicon Valley Li Xiang commented: "This isn't minor tinkering; it's a systematic breakthrough. Costing one-tenth as much yet surpassing Llama3 tells the world: open-source plus local optimization is the future."

International perspectives are equally positive. Hugging Face CEO Clément Delangue posted praise: "DeepSeek-V2's efficiency is impressive; it will accelerate global MoE model standardization." Domestic experts like Tsinghua University Professor Sun Fuquan stated: "Leading in Chinese benchmarks fills the gap left by international models, helping AI benefit local applications."

Skeptical voices exist, of course. Some netizens point out that V2 still lags on long-context English tasks, where Llama3 scores higher overall. DeepSeek responded that V2 focuses on Chinese vertical domains and that general capability will improve in future iterations.

Impact Analysis: Local Innovation and Global Landscape

DeepSeek-V2's release gives China's AI ecosystem a shot in the arm. First, the cost-performance advantage lets small and medium-sized enterprises access top-tier models without huge investment, promoting applications such as text-to-image tools and code assistants. Second, it stokes local innovation: in X discussions, several developers described fine-tuning V2 into chatbots and shipping products within a week, far faster than training from scratch (a minimal fine-tuning sketch follows below).
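As an illustration of the fine-tuning workflow those developers describe, here is a hedged sketch of parameter-efficient adaptation with LoRA via the peft library. The smaller DeepSeek-V2-Lite checkpoint is assumed for a single-GPU run, and the target module names are placeholders to verify against the checkpoint's actual layer names.

```python
# Hedged sketch: LoRA fine-tuning setup with peft. The checkpoint choice and
# target_modules are assumptions; inspect the model to confirm layer names.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V2-Lite",  # smaller official variant, assumed here
    trust_remote_code=True,
    device_map="auto",
)
config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the LoRA update
    target_modules=["q_proj", "o_proj"],  # placeholders; verify per checkpoint
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a tiny fraction of weights will train
```

From here, a standard transformers Trainer loop over a chat-formatted dataset completes the workflow; because only the small adapter matrices are updated, a one-week turnaround on modest hardware is plausible.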

Globally, the move challenges the dominance of the incumbent open-source leaders. The Llama series is powerful, but its high costs limit its spread; DeepSeek counters with low barriers to entry, potentially attracting more Asian developers and forming a distinct "Chinese AI hub". In the long run, it validates the general viability of the MoE architecture and may catalyze the next wave of the model arms race.

At the policy level, the Chinese government encourages open-source AI, so the model may attract further resources. At the same time, data-security questions surface: local optimization depends on Chinese corpora, and balancing privacy against performance will be a real test.

Conclusion

DeepSeek-V2 is more than a technical leap; it is a victory for the open-source spirit. It is compelling proof that Chinese AI is moving from quantitative accumulation to qualitative change. With community collaboration, V2 is expected to top more benchmarks and drive AI to truly "deep seek": to probe the depths of human intelligence. Industry watchers predict a surge of local open-source models in 2024, warranting continued attention.