Skip to main content
YZ Index

API Documentation

Access YZ Index leaderboard and change data via API

Overview

YZ Index provides a RESTful JSON API. All endpoints accept GET requests, require no authentication, and support cross-origin access (CORS). Response data is encoded in UTF-8.

Leaderboard Data

GET /yz-index/api/rankings

Retrieve model leaderboard data for a given dimension. By default, returns the latest published evaluation (full run) overall ranking.

Parameter Type Required Description
dimension string Optional Sort dimension. Accepted values:execution_raw grounding_raw core_overall_display value stability. Default core_overall_display
Legacy values coding/knowledge/longctx/overall still work, deprecated after 2026-06-30
run_id int Optional Evaluation run ID. If omitted, the latest published run is used.
{ "ok": true, "run_id": 16, "dimension": "core_overall_display", "run": { "id": 16, "run_type": "full", "status": "done", "started_at": "2026-03-16 00:30:00", "finished_at": "2026-03-16 03:45:00", "formula_version": "v3", "judge_set_version": "v5", "benchmark_version": "v6" // ... more run fields }, "rankings": [ { "model_slug": "claude-opus", "model_name": "Claude Opus 4", "model_version": "claude-opus-4-6-20250619", "provider": "anthropic", "execution_raw": 89.5, "grounding_raw": 85.2, "core_overall_display": 82.7, "integrity_label": "pass", "value": 62.3, "stability": 91.0, "availability": 100.0, // deprecated fields (sunset 2026-06-30) "coding": 89.5, "knowledge": 85.2, "longctx": 78.0, "overall": 82.7 } // ... more models ] }

This Week's Changes

GET /yz-index/api/changes

Retrieve model ranking change data for a given week. Returns three groups of models (rising, falling, stable) with change magnitudes.

Parameter Type Required Description
week string Optional Week tag, format 2026-W12. If omitted, returns the latest week.
{ "ok": true, "week": "2026-W12", "weeks": ["2026-W12", "2026-W11", "2026-W10"], "up": [ { "model_slug": "gpt-4o", "model_name": "GPT-4o", "direction": "up", "delta": 3.2, "current_score": 80.5, "previous_score": 77.3, "provider": "openai" // ... more fields } ], "down": [ // ... declining models ], "stable": [ // ... stable models ], "total": 11, "run": { "id": 16, "run_type": "full", "started_at": "2026-03-16 00:30:00", "model_count": 11 // ... more run fields } }

Specific Dimension and Run

GET /yz-index/api/rankings?dimension=execution_raw&run_id=16

By combining dimension and run_id parameters, retrieve the leaderboard for a specific evaluation run and dimension. Ideal for historical data comparison and in-depth dimension analysis.

Parameter Type Required Description
dimension string Required Sort dimension; in this example coding, results are sorted by Code Execution score in descending order
run_id int Required Evaluation run ID; in this example 16
{ "ok": true, "run_id": 16, "dimension": "coding", "run": { "id": 16, "run_type": "full", "status": "done" // ... more run fields }, "rankings": [ { "model_slug": "claude-opus", "model_name": "Claude Opus 4", "coding": 89.5, "knowledge": 85.2, "longctx": 78.0, "value": 62.3, "stability": 91.0, "availability": 100.0, "overall": 82.7 // sorted by coding DESC } // ... more models ] }

Error Handling

When a server-side error occurs, the HTTP status code is 500 and the response has the following structure:

{ "ok": false, "error": "error description" }

When the requested dimension parameter is not in the allowed list, it automatically falls back to overall; when no evaluation data is available, an empty rankings array is returned instead of an error.

API v1(Recommended)

A new public read-only API. No API Key required, CORS enabled, rate-limited to 60 requests per IP per minute. All responses include an attribution field and 1-hour cache.

Base URL:https://www.winzheng.com/yz-index/api/v1/

v1: Leaderboard

GET /yz-index/api/v1/leaderboard

Retrieve the overall leaderboard with ranking changes. Sorted by core_overall_display by default.

ParameterTypeRequiredDescription
dimension stringOptional Sort dimension:core_overall_display execution_raw grounding_raw. Default core_overall_display。
Legacy values overall/coding/knowledge/longctx still work, deprecated after 2026-06-30
limit intOptional Number of models to return, 1-50. Default 11 (all).
{ "status": "ok", "data": [ { "rank": 1, "model_name": "Claude Opus 4.6", "model_slug": "claude-opus-4.6", "score": 82.7, "change": 1.2 } ], "run_id": 37, "run_date": "2026-03-22 06:26:12", "attribution": "Powered by Winzheng Index (winzheng.com)" }

v1: Changes and Incidents

GET /yz-index/api/v1/changes

Retrieve the latest changes and incident data. Filter by model slug.

ParameterTypeRequiredDescription
model stringOptional Model slug, e.g. deepseek-v3. If omitted, returns all models.
{ "status": "ok", "data": { "changes": [ { "model_slug": "gpt-4o", "model_name": "GPT-4o", "dimension": "core_overall_display", "delta": 3.2, "direction": "up", "summary": "execution & grounding both improved" } ], "incidents": [], "run_id": 37, "run_date": "2026-03-22 06:26:12", "engine_version": "v6" }, "attribution": "Powered by Winzheng Index (winzheng.com)" }

v1: Model Profile

GET /yz-index/api/v1/models/{slug}

Retrieve a model's detailed profile: scores, dimensions, pricing, and last 5 evaluation history entries. Does not return raw questions and answers.

ParameterTypeRequiredDescription
{slug} stringRequired Model slug, e.g. claude-opus-4.6 or deepseek-v3
{ "status": "ok", "data": { "name": "Claude Opus 4.6", "slug": "claude-opus-4.6", "provider": "anthropic", "scores": { "execution_raw": 89.5, "grounding_raw": 85.2, "core_overall_display": 82.7, "integrity_label": "pass" }, "dimensions": { "execution_raw": 89.5, "grounding_raw": 85.2, "judgment_raw": 76.8, "communication_raw": 81.3, "value": 62.3, "stability": 91.0, "availability": 100.0 }, "pricing": { "input_cost": 15.0, "output_cost": 75.0 }, "history": [ { "run_id": 37, "run_date": "2026-03-22", "core_overall_display": 82.7 } ] }, "attribution": "Powered by Winzheng Index (winzheng.com)" }

v1 General Specification

v6 Scoring Field Reference

v6 introduces an entirely new scoring dimension system. Below are the new fields and their meanings.

FieldTypeDescription
execution_raw number Code Execution raw score (0-100)
grounding_raw number Grounding raw score (0-100)
judgment_raw number Engineering Judgment raw score (0-100, side-panel AI-assisted evaluation)
communication_raw number Task Communication raw score (0-100, side-panel AI-assisted evaluation)
integrity_raw number Integrity Rating raw score (0-100)
integrity_label string Integrity label (pass/warn/fail)
recommendation_status string Recommendation status (recommended/neutral/not_recommended)
core_overall_raw number Overall raw score = 0.55 x execution + 0.45 x grounding
core_overall_display number Overall display score (capped at 74 on integrity fail)
v5 Compatibility Fields (sunset after 2026-06-30)
FieldStatusDescription
coding deprecated · sunset 2026-06-30 Coding score (legacy), migrate to execution_raw
knowledge deprecated · sunset 2026-06-30 Knowledge score (v5), v6 split into multiple dimensions
longctx deprecated · sunset 2026-06-30 Long context score (v5), migrate to grounding_raw
overall deprecated · sunset 2026-06-30 Overall score (legacy), migrate to core_overall_display
official_coding deprecated · sunset 2026-06-30 Official coding score (legacy), migrate to execution_raw
official_knowledge deprecated · sunset 2026-06-30 Official knowledge score (v5), v6 split into multiple dimensions
official_longctx deprecated · sunset 2026-06-30 Official grounding score (v5), migrate to grounding_raw
official_overall deprecated · sunset 2026-06-30 Official overall score (legacy), migrate to core_overall_display
shadow_* deprecated · sunset 2026-06-30 All shadow_ prefixed fields are deprecated

integrity_label vs integrity_raw: integrity_label is a tier label (pass/warn/fail), integrity_raw is the 0-100 raw score. Use label for business logic, raw for trend analysis.

core_overall_display vs core_overall_raw: display is the frontend score (capped at 74 on integrity fail), raw is the uncapped weighted score. Sort by display.

Widget Embed Components

Embed YZ Index on your website with a single line of code. Supports leaderboard, model badge, and change bulletin widgets in dark/light themes.

Widget: Leaderboard Card

Displays Top N model rankings, scores, and ranking changes.

<script src="https://www.winzheng.com/yz-index/widget.js" data-type="leaderboard" data-limit="5" data-theme="dark"></script>
AttributeDescriptionDefault Value
data-typeleaderboard
data-limitDisplay model count5
data-themedark or lightdark

Widget: Model Badge

A compact badge widget showing the model name, overall score, and ranking. Similar to a GitHub stars badge.

<script src="https://www.winzheng.com/yz-index/widget.js" data-type="badge" data-model="deepseek-v3"></script>
AttributeDescription
data-typebadge
data-modelModel slug (required), e.g. deepseek-v3, claude-opus-4.6

Widget: Change Bulletin

Displays the models with the largest gains and drops this period, plus incident count.

<script src="https://www.winzheng.com/yz-index/widget.js" data-type="changes"></script>

Available Model Slugs

Model NameslugProvider
Claude Opus 4.6 claude-opus-4.6 claude
Claude Sonnet 4.6 claude-sonnet-4.6 claude
GPT-4o gpt-4o gpt
GPT-o3 gpt-o3 gpt
Grok 3 grok-3 grok
Gemini 2.5 Pro gemini-2.5-pro gemini
DeepSeek V3 deepseek-v3 deepseek
DeepSeek R1 deepseek-r1 deepseek
Qwen Max qwen-max qwen
豆包 Pro doubao-pro doubao
文心一言 4.0 ernie-4 ernie