Ckan Croissant

Feb 10, 2026 - Source: MLC

Tags: MLC, MLCommons, CKAN, Croissant, AI Benchmark, LMSYS

Introduction

MLCommons and LMSYS Org have jointly launched the CKAN Croissant benchmark for AI model evaluation. The benchmark uses the Croissant v1.0 format to standardize and share model metadata.

Core Technology and Innovation

Croissant Integration: All participating models are packaged in the Croissant format, which supports automatic parsing of input/output specifications, tokenizer configuration, and similar metadata.
CKAN Framework: CKAN (Containerized Knowledge Annotation Network) provides containerized deployment so that benchmark runs are reproducible.
Evaluation Protocol: Combines the Elo Rating (blind user preference) from Chatbot Arena with high-throughput inference from SGLang, covering text generation and multimodal tasks.
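
The Elo component of the evaluation protocol can be illustrated with a minimal sketch. It assumes the textbook Elo update (logistic curve on a 400-point scale) applied to blind pairwise votes. The K-factor of 32, the starting rating of 1000, and all function names are illustrative choices, not values taken from the benchmark, whose actual rating procedure may differ.

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, outcome_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Return updated (rating_a, rating_b) after one blind comparison.

    outcome_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k is an assumed K-factor, not a value from the benchmark.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rate_from_votes(votes: Iterable[Tuple[str, str, float]],
                    initial: float = 1000.0,
                    k: float = 32.0) -> Dict[str, float]:
    """Process (model_a, model_b, outcome_a) votes in order and return ratings."""
    ratings: Dict[str, float] = {}
    for model_a, model_b, outcome_a in votes:
        ra = ratings.setdefault(model_a, initial)
        rb = ratings.setdefault(model_b, initial)
        ratings[model_a], ratings[model_b] = update_elo(ra, rb, outcome_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical votes between two placeholder models.
    votes = [("model-x", "model-y", 1.0), ("model-x", "model-y", 0.5)]
    print(rate_from_votes(votes))
```

Each vote moves the two ratings by equal and opposite amounts, so the total rating mass is conserved, and an upset (a win by the lower-rated model) moves ratings further than an expected result.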

Performance Leaderboard Highlights

In the initial tests, GPT-4o topped the leaderboard with an Elo Rating of 1325, followed closely by Claude 3.5 Sonnet (1310). Among open-source models, Llama 3.1 405B achieved 1280, surpassing most closed-source competitors.

Model	Elo Rating	Category
GPT-4o	1325	Closed-source
Claude 3.5 Sonnet	1310	Closed-source
Llama 3.1 405B	1280	Open-source
Gemini 1.5 Pro	1275	Closed-source
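
As a rough guide to reading the gaps between these scores, the standard Elo formula converts a rating difference into an expected head-to-head preference rate. The sketch below assumes that textbook formula applies to the reported ratings, which the article does not confirm. The ratings are the ones quoted in this section, and the function name is illustrative.

```python
# Ratings as reported in this article.
RATINGS = {
    "GPT-4o": 1325,
    "Claude 3.5 Sonnet": 1310,
    "Llama 3.1 405B": 1280,
    "Gemini 1.5 Pro": 1275,
}


def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected rate at which A is preferred over B (textbook Elo, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    leader = "GPT-4o"
    for name, rating in RATINGS.items():
        if name != leader:
            p = win_probability(RATINGS[leader], rating)
            print(f"{leader} vs {name}: {p:.3f}")
```

Under this assumption, the 45-point gap between GPT-4o and Llama 3.1 405B corresponds to GPT-4o being preferred in roughly 56% of head-to-head comparisons, so the top of the leaderboard is closer than the raw numbers may suggest.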

Deployment and Future Outlook

CKAN Croissant supports one-click Docker deployment, allowing developers to quickly participate via ckan-croissant eval --model mymodel. Future plans include incorporating edge device benchmarks and real-time multilingual evaluation to promote sustainable development of the AI ecosystem.


This article is from the MLC blog, translated in full by Winzheng (winzheng.com). When republishing the translation, please credit the source.
