API Documentation
Access YZ Index leaderboard and change data via API
Overview
YZ Index provides a RESTful JSON API. All endpoints accept GET requests, require no authentication, and support cross-origin access (CORS). Response data is encoded in UTF-8.
- Base URL: https://www.winzheng.com/yz-index/api/
- Response Format: application/json; charset=utf-8
- All endpoints return "ok": true on success. On failure, they return "ok": false along with an "error" field.
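The success/failure envelope described above can be checked in one place before any other field is read. The following is a minimal sketch, not part of the API itself: the names `YZIndexError` and `unwrap` are hypothetical, and the payload is assumed to be the already-decoded JSON object.

```python
class YZIndexError(Exception):
    """Raised when a response reports "ok": false (hypothetical helper type)."""


def unwrap(payload: dict) -> dict:
    """Return the payload when it signals success, otherwise raise with the server's message."""
    if payload.get("ok") is True:
        return payload
    # On failure the documentation promises an "error" field; fall back to a
    # generic message in case it is missing.
    raise YZIndexError(payload.get("error", "unknown error"))
```

Calling `unwrap` on every decoded response keeps the `"ok"` check out of the rest of the client code.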
Leaderboard Data
Retrieve model leaderboard data for a given dimension. By default, the endpoint returns the overall ranking from the latest published evaluation (full run).
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| dimension | string | Optional | Sort dimension. Accepted values: execution_raw, grounding_raw, core_overall_display, value, stability. Default: core_overall_display. The legacy values coding, knowledge, longctx, and overall still work but are deprecated after 2026-06-30. |
| run_id | int | Optional | Evaluation run ID. If omitted, the latest published run is used. |
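As an illustration of the two parameters in the table, the sketch below builds only the query string. The endpoint path is not stated in this section, so it is deliberately left out. The function name `leaderboard_query` is hypothetical, and rejecting unknown dimensions on the client side is a design choice of this sketch rather than documented server behavior.

```python
import warnings
from urllib.parse import urlencode

CURRENT_DIMENSIONS = {"execution_raw", "grounding_raw", "core_overall_display", "value", "stability"}
LEGACY_DIMENSIONS = {"coding", "knowledge", "longctx", "overall"}


def leaderboard_query(dimension=None, run_id=None) -> str:
    """Build the query string for a leaderboard request; omitted parameters use server defaults."""
    params = {}
    if dimension is not None:
        if dimension in LEGACY_DIMENSIONS:
            # Legacy values still work but are deprecated after 2026-06-30.
            warnings.warn(f"dimension {dimension!r} is deprecated", DeprecationWarning, stacklevel=2)
        elif dimension not in CURRENT_DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        params["dimension"] = dimension
    if run_id is not None:
        params["run_id"] = int(run_id)
    return urlencode(params)
```

For example, `leaderboard_query("execution_raw", 16)` yields `dimension=execution_raw&run_id=16`, while calling it with no arguments yields an empty string and leaves both defaults to the server.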
Response Example
This Week's Changes
Retrieve model ranking change data for a given week. Returns three groups of models (rising, falling, stable) with change magnitudes.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| week | string | Optional | Week tag, format 2026-W12. If omitted, returns the latest week. |
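A week tag such as 2026-W12 can be derived from a calendar date. The sketch below assumes the tags follow ISO 8601 week numbering with a zero-padded two-digit week. The documentation only shows the format, so treat both points as assumptions. The helper names are hypothetical.

```python
import re
from datetime import date

# Four-digit year, literal "-W", week number 01..53.
WEEK_TAG = re.compile(r"\d{4}-W(0[1-9]|[1-4]\d|5[0-3])")


def week_tag(day: date) -> str:
    """Format a date as an ISO-style week tag, e.g. 2026-W12 (assumed convention)."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def is_week_tag(value: str) -> bool:
    """Check that a string has the shape of a week tag before sending it."""
    return WEEK_TAG.fullmatch(value) is not None
```

Under ISO numbering, Monday 16 March 2026 falls in week 12, so `week_tag(date(2026, 3, 16))` produces the tag shown in the table.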
Response Example
Specific Dimension and Run
Combine the dimension and run_id parameters to retrieve the leaderboard for a specific evaluation run and dimension. This is useful for historical comparison and in-depth analysis of a single dimension.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| dimension | string | Required | Sort dimension. In this example it is coding, so results are sorted by Code Execution score in descending order. |
| run_id | int | Required | Evaluation run ID; in this example 16 |
Response Example
Error Handling
When a server-side error occurs, the HTTP status code is 500 and the response body carries "ok": false together with an "error" field describing the failure.
When the requested dimension parameter is not in the allowed list, it automatically falls back to overall; when no evaluation data is available, an empty rankings array is returned instead of an error.
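Because an empty result is not reported as an error, a client has three outcomes to tell apart: a server error, a valid response with no data, and a valid response with rankings. The sketch below is one way to separate them. The function name `classify` is hypothetical, and it assumes the leaderboard payload exposes the array under a `rankings` key, as the paragraph above suggests.

```python
def classify(status_code: int, payload: dict) -> str:
    """Return 'error', 'empty', or 'data' for a decoded leaderboard response."""
    if status_code == 500 or payload.get("ok") is False:
        return "error"
    # No evaluation data is signalled by an empty rankings array, not by an error.
    if not payload.get("rankings"):
        return "empty"
    return "data"
```

Since an unrecognized dimension falls back silently, the response alone may not reveal a typo in the parameter, so validating the dimension value before sending the request is a reasonable precaution.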
API v1 (Recommended)
A new public read-only API. No API key is required, CORS is enabled, and requests are rate-limited to 60 per IP per minute. All responses include an attribution field and are cacheable for 1 hour.
Base URL: https://www.winzheng.com/yz-index/api/v1/
v1: Leaderboard
Retrieve the overall leaderboard with ranking changes. Sorted by core_overall_display by default.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| dimension | string | Optional | Sort dimension: core_overall_display, execution_raw, grounding_raw. Default: core_overall_display. The legacy values overall, coding, knowledge, and longctx still work but are deprecated after 2026-06-30. |
| limit | int | Optional | Number of models to return, 1-50. Default 11 (all). |
Response Example
v1: Changes and Incidents
Retrieve the latest changes and incident data. Filter by model slug.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Optional | Model slug, e.g. deepseek-v3. If omitted, returns all models. |
Response Example
v1: Model Profile
Retrieve a model's detailed profile: scores, dimensions, pricing, and the last 5 evaluation history entries. Raw questions and answers are not returned.
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| {slug} | string | Required | Model slug, e.g. claude-opus-4.6 or deepseek-v3 |
Response Example
v1 General Specification
- Rate Limit: 60 requests per IP per minute; exceeding the limit returns 429 Too Many Requests
- CORS: Access-Control-Allow-Origin: *
- Cache: Cache-Control: public, max-age=3600 (1 hour)
- No API key required; send GET requests directly
- All responses include an attribution field. Please retain the source when citing data.
- Error Response Format: {"status":"error","error":"..."}
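To stay under the limit listed above, a client can throttle itself before sending requests. The sketch below is a simple sliding-window limiter. The class name `MinuteRateLimiter` is hypothetical, and whether the server counts requests in a sliding or a fixed window is not documented, so this is a conservative client-side approximation rather than a mirror of the server's logic.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Allow at most `limit` calls within any `window` seconds (client-side only)."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window, else return False."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each GET and wait when it returns False. A 429 response should still be handled, since other clients behind the same IP share the quota.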
v6 Scoring Field Reference
v6 introduces an entirely new scoring dimension system. Below are the new fields and their meanings.
New Fields (v6)
| Field | Type | Description |
|---|---|---|
| execution_raw | number | Code Execution raw score (0-100) |
| grounding_raw | number | Grounding raw score (0-100) |
| judgment_raw | number | Engineering Judgment raw score (0-100, side-panel AI-assisted evaluation) |
| communication_raw | number | Task Communication raw score (0-100, side-panel AI-assisted evaluation) |
| integrity_raw | number | Integrity Rating raw score (0-100) |
| integrity_label | string | Integrity label (pass/warn/fail) |
| recommendation_status | string | Recommendation status (recommended/neutral/not_recommended) |
| core_overall_raw | number | Overall raw score = 0.55 x execution + 0.45 x grounding |
| core_overall_display | number | Overall display score (capped at 74 on integrity fail) |
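The relationship between the last two rows of the table can be worked through in code. This is a sketch under stated assumptions: "integrity fail" is read as `integrity_label == "fail"`, the cap is applied as a simple minimum, and no rounding is applied. The function names mirror the field names but are otherwise hypothetical.

```python
def core_overall_raw(execution_raw: float, grounding_raw: float) -> float:
    """Weighted overall score: 0.55 x execution + 0.45 x grounding."""
    return 0.55 * execution_raw + 0.45 * grounding_raw


def core_overall_display(execution_raw: float, grounding_raw: float, integrity_label: str) -> float:
    """Display score: the raw score, capped at 74 when the integrity label is 'fail' (assumed reading)."""
    raw = core_overall_raw(execution_raw, grounding_raw)
    if integrity_label == "fail":
        return min(raw, 74.0)
    return raw
```

With an execution score of 90 and a grounding score of 80, the raw score is 0.55 x 90 + 0.45 x 80 = 85.5. Under these assumptions the display score would be 85.5 on an integrity pass and 74 on an integrity fail.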
v5 Compatibility Fields (sunset after 2026-06-30)
| Field | Status | Description |
|---|---|---|
| coding | deprecated · sunset 2026-06-30 | Coding score (legacy), migrate to execution_raw |
| knowledge | deprecated · sunset 2026-06-30 | Knowledge score (v5), v6 split into multiple dimensions |
| longctx | deprecated · sunset 2026-06-30 | Long context score (v5), migrate to grounding_raw |
| overall | deprecated · sunset 2026-06-30 | Overall score (legacy), migrate to core_overall_display |
| official_coding | deprecated · sunset 2026-06-30 | Official coding score (legacy), migrate to execution_raw |
| official_knowledge | deprecated · sunset 2026-06-30 | Official knowledge score (v5), v6 split into multiple dimensions |
| official_longctx | deprecated · sunset 2026-06-30 | Official long context score (v5), migrate to grounding_raw |
| official_overall | deprecated · sunset 2026-06-30 | Official overall score (legacy), migrate to core_overall_display |
| shadow_* | deprecated · sunset 2026-06-30 | All shadow_ prefixed fields are deprecated |
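The migration targets in the table can be encoded as a lookup so that client code is updated in one place. The sketch below is illustrative: the names `FIELD_MIGRATION` and `migrate_field` are hypothetical, and fields without a single replacement (the knowledge scores and all shadow_ fields) return None so that callers must handle them explicitly.

```python
# Legacy field -> v6 field, taken from the compatibility table.
FIELD_MIGRATION = {
    "coding": "execution_raw",
    "longctx": "grounding_raw",
    "overall": "core_overall_display",
    "official_coding": "execution_raw",
    "official_longctx": "grounding_raw",
    "official_overall": "core_overall_display",
}
# Split into multiple v6 dimensions, so there is no one-to-one target.
NO_DIRECT_REPLACEMENT = {"knowledge", "official_knowledge"}


def migrate_field(name: str):
    """Return the v6 field to read instead of `name`, or None if there is no direct replacement."""
    if name in FIELD_MIGRATION:
        return FIELD_MIGRATION[name]
    if name in NO_DIRECT_REPLACEMENT or name.startswith("shadow_"):
        return None
    return name  # already a current field name
```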
Field Disambiguation
integrity_label vs integrity_raw: integrity_label is a tier label (pass/warn/fail), integrity_raw is the 0-100 raw score. Use label for business logic, raw for trend analysis.
core_overall_display vs core_overall_raw: display is the frontend score (capped at 74 on integrity fail), raw is the uncapped weighted score. Sort by display.
Widget Embed Components
Embed YZ Index on your website with a single line of code. Supports leaderboard, model badge, and change bulletin widgets in dark/light themes.
Widget: Leaderboard Card
Displays Top N model rankings, scores, and ranking changes.
Embed Code
Configuration Attributes
| Attribute | Description | Default Value |
|---|---|---|
| data-type | leaderboard | — |
| data-limit | Display model count | 5 |
| data-theme | dark or light | dark |
Widget: Model Badge
A compact badge widget showing the model name, overall score, and ranking. Similar to a GitHub stars badge.
Embed Code
Configuration Attributes
| Attribute | Description |
|---|---|
| data-type | badge |
| data-model | Model slug (required), e.g. deepseek-v3, claude-opus-4.6 |
Widget: Change Bulletin
Displays the models with the largest gains and drops this period, plus incident count.
Embed Code
WDCD Endpoints (Experimental)
Endpoints related to WDCD (Winzheng Dynamic Contextual Decay). These endpoints are experimental, and their interfaces may change.
GET /yz-index/api/v1/dcd
WDCD leaderboard with 3-round scores and main leaderboard comparison.
| Parameter | Description |
|---|---|
| run_id | Optional. Evaluation run ID; defaults to latest published run |
| format | Optional. json (default) or csv |
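A request URL for this endpoint can be assembled from the v1 base URL and the two optional parameters. The sketch below is illustrative: the function name `dcd_leaderboard_url` is hypothetical, and omitting `format` when it equals the default is a choice made here, not a requirement of the API.

```python
from urllib.parse import urlencode

V1_BASE = "https://www.winzheng.com/yz-index/api/v1/"


def dcd_leaderboard_url(run_id=None, fmt="json") -> str:
    """Build the URL for GET /yz-index/api/v1/dcd with optional run_id and format."""
    if fmt not in ("json", "csv"):
        raise ValueError("format must be 'json' or 'csv'")
    params = {}
    if run_id is not None:
        params["run_id"] = int(run_id)
    if fmt != "json":
        params["format"] = fmt  # json is the default, so it is left implicit
    url = V1_BASE + "dcd"
    return f"{url}?{urlencode(params)}" if params else url
```

For run 16 in CSV form this produces `https://www.winzheng.com/yz-index/api/v1/dcd?run_id=16&format=csv`.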
GET /yz-index/api/v1/dcd/runs
WDCD evaluation history (last 50 runs), including participating models and average scores.
GET /yz-index/api/v1/dcd/cases
Constraint violation case collection with dialogue summaries and scoring details.
| Parameter | Description |
|---|---|
| subtype | Optional. Filter by scenario: data_boundary / resource_limit / business_rule / security / engineering |
| model | Optional. Filter by model slug |
| limit | Optional. 1-100, default 20 |
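The three filters can be validated on the client before a request is sent. The sketch below checks the scenario name and the limit range from the table and returns the path with its query string. The function name `dcd_cases_path` is hypothetical, and how the server treats out-of-range values is not documented, so raising locally is an assumption of this sketch.

```python
from urllib.parse import urlencode

SUBTYPES = {"data_boundary", "resource_limit", "business_rule", "security", "engineering"}


def dcd_cases_path(subtype=None, model=None, limit=20) -> str:
    """Build the path and query for GET /yz-index/api/v1/dcd/cases."""
    if subtype is not None and subtype not in SUBTYPES:
        raise ValueError(f"unknown subtype: {subtype!r}")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {}
    if subtype is not None:
        params["subtype"] = subtype
    if model is not None:
        params["model"] = model
    params["limit"] = limit
    return "/yz-index/api/v1/dcd/cases?" + urlencode(params)
```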
GET /yz-index/api/v1/dcd/decay
3-round constraint retention curve: pass rates and decay coefficients from R1→R2→R3 per model.
GET /yz-index/api/v1/dcd/matrix
5-category constraint scenario score matrix with average scores and hardest/easiest scenarios.
GET /yz-index/api/v1/dcd/models/{slug}
Single model WDCD details: current scores, scenario performance, historical trends.
| Parameter | Description |
|---|---|
| slug | Required. Model slug (path parameter or ?slug= query parameter) |
Available Model Slugs
| Model Name | slug | Provider |
|---|---|---|
| Claude Opus 4.7 | claude-opus-4.7 | claude |
| Claude Sonnet 4.6 | claude-sonnet-4.6 | claude |
| GPT-5.5 | gpt-5.5 | gpt |
| GPT-o3 | gpt-o3 | gpt |
| Grok 4 | grok-4 | grok |
| Gemini 3.1 Pro | gemini-3.1-pro | gemini |
| Gemini 2.5 Pro | gemini-2.5-pro | gemini |
| DeepSeek V4 Pro | deepseek-v4-pro | deepseek |
| Qwen3 Max | qwen3-max | qwen |
| 豆包 Pro | doubao-pro | doubao |
| 文心一言 4.5 | ernie-4.5 | ernie |