Alibaba's Open-Source Qwen2 Model Outperforms Llama3 on Multiple Benchmarks, Bilingual Capabilities Spark Community Buzz

News Lead

In June 2024, Alibaba Cloud officially released the Tongyi Qianwen Qwen2 series of open-source large models, with the flagship Qwen2-72B-Instruct surpassing Meta's Llama3-70B-Instruct on multiple authoritative benchmarks and scoring 84.2% on MMLU. The release demonstrates top-tier Chinese-English bilingual capability and has caused a sensation in the open-source community, with reposts in the Chinese-language community on X (formerly Twitter) quickly exceeding 30,000. The series spans parameter scales from 0.5B to 72B and supports free commercial use, further igniting global AI open-source competition.

Background

Tongyi Qianwen (Qwen) is Alibaba Cloud's self-developed multimodal large model series. Since its initial release in 2023, it has iterated to version Qwen1.5 and accumulated widespread influence in the open-source community. The Qwen series emphasizes efficient training and multilingual support, especially optimized for Chinese, with cumulative downloads exceeding ten million. This Qwen2 release comes shortly after Meta Llama3's emergence, which quickly topped multiple benchmark rankings with its 70B parameter scale and open-source strategy, becoming a benchmark for open-source AI.

The open-source large model wave has been a focal point of global AI competition since the rise of BLOOM and Stable Diffusion in 2022. Meta has consolidated its open-source leadership through the Llama series, while Chinese companies such as Alibaba, Baidu, and DeepSeek are targeting both domestic and international markets with high-performance, multilingual models. The launch of Qwen2 is Alibaba Cloud's latest effort on this track, aimed at challenging Llama3's dominance.

Core Content: Performance Dominance and Technical Highlights

The Qwen2 series includes five model sizes, from Qwen2-0.5B to Qwen2-72B (including a 57B-A14B mixture-of-experts variant), trained on over 7 trillion tokens with a 32K-token context length, extended further for the larger instruct variants. The flagship Qwen2-72B-Instruct leads Llama3-70B-Instruct on multiple metrics on the Hugging Face Open LLM Leaderboard.
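For readers experimenting with the instruct models, Qwen chat models consume prompts in a ChatML-style template. The sketch below is a simplified illustration; the authoritative template ships with the model's tokenizer configuration, so treat this function as an assumption-laden approximation rather than the official format.

```python
def to_chatml(messages):
    """Render a chat history in the ChatML-style format used by Qwen
    instruct models. Simplified sketch: the authoritative template is
    the chat template shipped with the model's tokenizer, not this."""
    rendered = []
    for msg in messages:
        rendered.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    rendered.append("<|im_start|>assistant\n")  # cue the model to reply
    return "".join(rendered)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain grouped query attention in one sentence."},
])
print(prompt)
```

In practice, Hugging Face's `tokenizer.apply_chat_template` does this rendering from the model's own template, which is the safer path for real deployments.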

Specifically, on MMLU (Massive Multitask Language Understanding), Qwen2-72B scores 84.2%, above Llama3-70B's 82.0%; on GPQA (graduate-level question answering), 59.2% versus 51.1%; and on LiveCodeBench (code generation), 30.5% versus 16.8%. Qwen2 stands out in Chinese-English bilingual tasks in particular: on the CMM multilingual mathematics benchmark it scores 92.7%, far ahead of Llama3, reflecting its deep optimization for Chinese.

Technical highlights include Grouped Query Attention (GQA) and sliding window attention (SWA) in the architecture to improve inference efficiency; post-training alignment to reduce hallucinations; and Apache 2.0 licensing for most model sizes, permitting free commercial use without additional authorization. The approach parallels Llama3's open-source strategy, but Qwen2 stands out for parameter efficiency: with quantization, even the 72B model can run on consumer-grade GPUs.
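The efficiency gain from GQA comes from sharing a small set of key/value heads across groups of query heads, which shrinks the KV cache that dominates inference memory. A minimal NumPy sketch of the idea (head counts and dimensions here are illustrative, not Qwen2's actual configuration):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), where
    n_q_heads is a multiple of n_kv_heads. Each KV head serves a
    contiguous group of query heads."""
    n_q, seq, d = q.shape
    group = n_q // k.shape[0]
    k = np.repeat(k, group, axis=0)  # broadcast KV heads to query heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))  # 8 query heads
k = rng.standard_normal((2, 4, 16))  # only 2 KV heads -> 4x smaller KV cache
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 4, 16)
```

With 8 query heads but only 2 KV heads, the cached keys and values are a quarter of the size of standard multi-head attention at the same query width, which is where the inference savings come from.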

Alibaba Cloud officially states that Qwen2, pre-trained on 7 trillion tokens, has enhanced long-context understanding and tool-calling capabilities. Hugging Face data shows record-breaking first-day downloads, and stars on the ModelScope platform exceed 20,000.

Various Perspectives

The open-source community has responded enthusiastically. X user @AI_Weekly reposted: "Qwen2 directly slaps Llama3 in the face, crushing it in Chinese-English bilingual capabilities, Alibaba's open-source is too fierce!" with reposts exceeding 15,000. Another AI practitioner @TechInsightCN stated: "MMLU 84.2% is not bragging, actual testing shows faster code generation speed and lower commercial barriers."

"The release of Qwen2 marks a new phase for China's open-source AI, with bilingual capabilities as the biggest highlight, accelerating model deployment in Southeast Asian markets."—Zhou Jingren, Head of Alibaba Cloud AI Lab (cited from official blog)

Meta has not yet responded directly, but observers in the open-source community note that Llama3 now faces greater competitive pressure. Hugging Face CEO Clem Delangue commented on X: "Competition drives progress; Qwen2's benchmark numbers are impressive, and I look forward to more innovation." Domestic experts such as Tsinghua University professor Sun Fuchun believe: "Qwen2 proves how quickly Chinese teams are catching up in foundation models, but we need to stay vigilant about data privacy and ethical challenges."

There are also critical voices, with some questioning the fairness of benchmark tests: "Llama3 still has advantages in English tasks, Qwen2's Chinese emphasis may sacrifice generalizability." However, overall sentiment is positive, with GitHub repository stars breaking 50,000 in one day.

Impact Analysis

Qwen2's strong debut has profound implications for the open-source AI ecosystem. First, parameter efficiency and free commercial use lower enterprise barriers, promoting AI democratization. Small and medium enterprises can quickly integrate Qwen2 to develop chatbots and code assistants, facilitating digital transformation.
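As a concrete illustration of that integration path, a team serving Qwen2 behind an OpenAI-compatible endpoint (as vLLM and similar inference servers provide) could assemble chat requests like the sketch below. The model name, system prompt, and sampling parameters are illustrative assumptions, not values from the article.

```python
import json

def build_chat_request(user_message, model="Qwen/Qwen2-72B-Instruct"):
    """Assemble an OpenAI-style chat-completion payload for a
    self-hosted Qwen2 endpoint. Hypothetical sketch: field values
    are placeholders to adapt to your own deployment."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
        "max_tokens": 512,
    }

payload = build_chat_request("Summarize this support ticket in two sentences.")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Because the payload follows the widely adopted chat-completions shape, existing OpenAI-client tooling can typically be pointed at a self-hosted Qwen2 server with only a base-URL change.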

Second, Qwen2 plays a key role in China's AI globalization strategy. Its top-tier Chinese-English bilingual capabilities suit Belt and Road countries, filling the Chinese-language gap left by English-centric models. Compared to Llama3's Euro-American orientation, Qwen2 has greater global adaptability and may reshape the open-source landscape.

For Meta, open-source pressure is mounting. It had hoped Llama3's release would consolidate its lead, but Qwen2's benchmark superiority pushes Meta to accelerate Llama4's iteration. Meanwhile, intensifying China-US competition in open-source AI may spawn more high-performance models, ultimately benefiting developers.

In the long run, Qwen2 strengthens China's voice in the AI supply chain. Alibaba Cloud has built a complete ecosystem through the ModelScope platform, with download users covering 200 countries worldwide. This not only enhances Alibaba's brand but also drives chip and computing power demand, stimulating domestic AI hardware development.

Conclusion

The release of Alibaba's Qwen2 is not simply a performance competition, but a signal flare for a new era of open-source AI. It proves that Chinese innovation forces are transitioning from followers to leaders. In the future, with more benchmark validations and actual deployments, Qwen2 may stand alongside Llama3, driving the industry toward more efficient and diversified directions. The enthusiasm of the open-source community foreshadows that AI competition will become more intense and more inclusive. Stay tuned for subsequent updates.