赢政天下

AI Model Time Zone Reasoning Comparison: Details Determine Success

Eight leading AI models split along a clear capability divide when tested on a seemingly simple time zone conversion question: five performed perfectly, while three made calculation errors.

YZ Index Model Comparison Time Zone Reasoning
404 03-20
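The kind of conversion task described above is mechanical to verify. A minimal Python sketch using the standard library (the specific timestamp and zones are illustrative, not the actual test question):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def convert(dt_str: str, src: str, dst: str) -> datetime:
    """Interpret a naive ISO timestamp as local time in zone `src`,
    then convert it to zone `dst`."""
    naive = datetime.fromisoformat(dt_str)
    return naive.replace(tzinfo=ZoneInfo(src)).astimezone(ZoneInfo(dst))

# In January, New York observes EST (UTC-5) and Tokyo JST (UTC+9),
# so 09:00 in New York is 23:00 the same day in Tokyo.
print(convert("2026-01-15T09:00", "America/New_York", "Asia/Tokyo").isoformat())
# → 2026-01-15T23:00:00+09:00
```

The details that trip models up are exactly the ones the code handles implicitly: DST status on the given date and day-boundary rollover.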

AI Models Show Clear Divide in Logical Reasoning: Half Fall into Reasoning Traps

In a seemingly simple logical reasoning test, 8 mainstream AI models turned in starkly different performances, with only a 50% overall success rate, exposing significant disparities in current AI's logical reasoning capabilities.

YZ Index Model Comparison Logical Reasoning
315 03-20

YZ Index Weekly Report: Collective Decline in Knowledge Work Capabilities, Claude Remains Stable Against the Trend

This week's YZ Index evaluation reveals a rare collective decline in knowledge work capabilities across AI models, with 6 out of 8 mainstream models showing performance degradation. Claude Sonnet 4.6 emerges as the only model with positive growth.

YZ Index Weekly Report AI Evaluation
245 03-20

GPT-o3 Knowledge Work Score Plummets 12 Points: Logical Reasoning Ability Suspected to Have Degraded

GPT-o3 suffered a rare, precipitous drop in the knowledge work dimension this week, plunging from 82.4 to 70.3 points, with marked deterioration in logical reasoning and translation tasks.

YZ Index AI Evaluation GPT-o3
291 03-20

GPT-o3 Performance Plummets: Technical Concerns Behind 12.1-Point Drop in Knowledge Work Capabilities

GPT-o3 experienced severe performance degradation in knowledge work this week, with scores plunging from 82.4 to 70.3 points, primarily affecting logical reasoning and language comprehension capabilities.

GPT-o3 Performance Collapse AI Evaluation
311 03-20

In-Depth Analysis: From DeepSeek to Gemini, How to Build an Impregnable Defense Against "Model Distillation"?

This article analyzes the DeepSeek model distillation incident and proposes a comprehensive multi-layered defense system against distillation attacks, including API-level controls, output watermarking, and architectural protections.

DeepSeek Model Distillation AI Security
1,423 02-14

KTransformers Accelerates SGLang's Heterogeneous Inference

KTransformers, developed by Tsinghua University's MadSys and Approaching.AI, optimizes CPU/GPU collaborative inference for sparse MoE models through AMX-optimized kernels, efficient device coordination, and expert deferral mechanisms, now integrated into SGLang for enhanced performance.

LMSYS AI Technology Hybrid Inference
1,050 02-04

SGLang-Diffusion: Two Months of Progress

SGLang-Diffusion has achieved 2.5x performance improvements since its launch in November 2025, with support for new models, LoRA, parallel processing, and ComfyUI integration.

LMSYS AI Technology Deep Learning
822 02-04

SGLang Pipeline Parallelism: Million-Token Context Extension and Performance Breakthroughs

SGLang launches a highly optimized Pipeline Parallelism implementation designed for ultra-long context inference challenges. Through integrated optimizations and a clean design, it achieves a 3.31x speedup in prefill throughput for DeepSeek V3 on multi-node H20 clusters, demonstrating strong scalability for trillion-parameter models.

LMSYS SGLang Pipeline Parallelism
778 02-04

FP4 Mixed-Precision Inference Optimization on AMD GPUs

We developed Petit, a collection of FP16/BF16 × FP4 mixed-precision GPU kernels for AMD GPUs, enabling 1.74× faster Llama 3.3 70B inference on existing MI250/MI300 hardware without upgrades.

LMSYS AMD GPU FP4 Quantization
796 02-04

SGLang Achieves Deterministic Inference and Reproducible RL Training

SGLang implements fully deterministic inference with only 34.35% performance overhead and enables 100% reproducible RL training in collaboration with slime, providing reliable solutions for rigorous scientific experiments.

LMSYS SGLang Deterministic Inference
796 02-04
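SGLang's determinism work rests on batch-invariant kernels, not mere seeding, but the contract it offers (same inputs, bit-identical outputs across runs) can be illustrated with a toy sampler. A hypothetical sketch; `sample_tokens` and its vocabulary are invented for illustration:

```python
import random

def sample_tokens(seed: int, vocab=("the", "cat", "sat", "mat"), n: int = 6):
    """Draw n tokens from a dedicated, seeded RNG so the result never
    depends on global state or on what else ran in the same process."""
    rng = random.Random(seed)
    return [rng.choice(vocab) for _ in range(n)]

# Repeated runs with the same seed yield identical sequences; this is the
# property that makes an RL training run exactly reproducible end to end.
assert sample_tokens(42) == sample_tokens(42)
```

In a real serving engine the hard part is making this hold under continuous batching, where floating-point reduction order can otherwise vary with batch composition.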

Optimizing DeepSeek on GB200 NVL72 Deployments (Part 2): 3.8x Prefill and 4.8x Decode Throughput

The SGLang team shares their optimization progress on DeepSeek V3/R1 inference performance using GB200 NVL72, achieving 26,156 input tokens/s for prefill and 13,386 output tokens/s for decode per NVIDIA Blackwell GPU through techniques like FP8 attention, NVFP4 MoE, and large-scale expert parallelism.

LMSYS SGLang DeepSeek
849 02-04

Partnering with SGLang: Best Practices for Efficiently Deploying DeepSeek-R1 on H20-96G

This article presents comprehensive optimization strategies for deploying DeepSeek-R1 on H20 GPUs, achieving state-of-the-art performance of 16.5k input tokens/s and 5.7k output tokens/s per node through hardware-aware parallelization, kernel optimizations, and advanced scheduling techniques.

LMSYS DeepSeek-R1 H20 GPU
798 02-04

PD-Multiplexing: A New Paradigm for High-Goodput LLM Serving Driven by GreenContext

This article introduces PD-Multiplexing, a new serving paradigm in SGLang that leverages NVIDIA's GreenContext technology to achieve higher goodput for LLM services through efficient intra-GPU resource sharing between prefill and decode phases.

LMSYS PD-Multiplexing GreenContext
715 02-04

SGLang Supports DeepSeek V3.2 Sparse Attention Mechanism from Day 0

SGLang announces Day 0 support for DeepSeek-V3.2, implementing DeepSeek Sparse Attention (DSA) mechanism that significantly improves training and inference efficiency, especially in long-context scenarios.

LMSYS SGLang DeepSeek-V3.2
771 02-04

NVIDIA DGX Spark In-Depth Review: A New Benchmark for Local AI Inference

We conducted an in-depth review of NVIDIA DGX Spark, a compact all-in-one system that brings supercomputing-level performance to desktop workstation form factor. While its unified memory design enables running ultra-large models, performance is constrained by memory bandwidth, making it ideal for prototyping and experimentation rather than production deployment.

LMSYS NVIDIA DGX Spark AI Inference
1,913 02-04

SGLang and NVIDIA Partner to Accelerate InferenceMAX Benchmark and GB200 Performance

SGLang collaborates with NVIDIA to leverage Blackwell architecture innovations, achieving breakthrough performance on DeepSeek models with up to 4x improvements, and is selected as the default inference engine for NVIDIA and AMD hardware in the InferenceMAX benchmark.

LMSYS SGLang NVIDIA Blackwell
840 02-04

SGLang-Jax: An Open-Source Tool for Native TPU Inference

We introduce SGLang-Jax, a state-of-the-art open-source inference engine built entirely on Jax and XLA, achieving fast native TPU inference with advanced features like continuous batching, prefix caching, and speculative decoding.

LMSYS SGLang-Jax TPU Inference
708 02-04

Optimizing GPT-OSS on NVIDIA DGX Spark: Unleashing Spark's Maximum Potential

We successfully optimized GPT-OSS 20B and 120B models on NVIDIA DGX Spark using SGLang, achieving state-of-the-art performance of ~70 tokens/s and ~50 tokens/s respectively, enabling fully local AI applications including coding agents.

LMSYS NVIDIA DGX Spark GPT-OSS
907 02-04

No Free Lunch: MiniMax M2 Deconstructs Efficient Attention Mechanisms

SGLang announces first-day support for MiniMax M2, a flagship MoE model that returns to full attention after empirical findings show efficient attention methods face significant production deployment challenges.

LMSYS MiniMax M2 Efficient Attention
756 02-04

© 1998-2026 赢政天下 All rights reserved.

Founded in 1998, relaunched in 2025. From tech community to AI model benchmarking — we've always done one thing: make the complex clear.


This evaluation is independently operated and accepts no sponsorship from AI model vendors. Every point in the YZ Index is produced by our automated evaluation system.

Citation format: YZ Index (2026). AI Model Comprehensive Rankings. https://www.winzheng.com/yz-index/

Data license: CC BY-NC 4.0