API Documentation
Access YZ Index leaderboard and change data via API
Overview
YZ Index provides a RESTful JSON API. All endpoints accept GET requests, require no authentication, and support cross-origin access (CORS). Response data is encoded in UTF-8.
- Base URL: https://www.winzheng.com/yz-index/api/
- Response Format: application/json; charset=utf-8
- All endpoints return "ok": true on success. On failure, they return "ok": false along with an "error" field.
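The success and failure envelope described in the last bullet can be checked in one place. The following is a minimal sketch in Python, assuming the response body has already been decoded from JSON into a dict; `unwrap` and `YZIndexError` are hypothetical names chosen for illustration and are not part of the API.

```python
from typing import Any


class YZIndexError(Exception):
    """Raised when a response carries "ok": false (hypothetical helper type)."""


def unwrap(payload: dict[str, Any]) -> dict[str, Any]:
    """Return the payload when "ok" is true; otherwise raise with the "error" text."""
    if payload.get("ok") is True:
        return payload
    # "error" is the documented failure field; the fallback text is our own.
    raise YZIndexError(payload.get("error", "unknown error"))
```

Calling `unwrap` on every decoded response keeps the `"ok"` check out of the rest of the client code.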
Leaderboard Data
Retrieve model leaderboard data for a given dimension. By default, the endpoint returns the overall ranking from the latest published evaluation (full run).
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| dimension | string | Optional | Sort dimension. Accepted values: execution_raw, grounding_raw, core_overall_display, value, stability. Default: core_overall_display. The legacy values coding, knowledge, longctx, and overall still work but are deprecated after 2026-06-30. |
| run_id | int | Optional | Evaluation run ID. If omitted, the latest published run is used. |
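The two parameters in the table can be assembled into a query string as sketched below. This is an illustrative helper, not an official client: the function name `leaderboard_query` is hypothetical, and the endpoint path that the query string is appended to is not shown in this document, so only the query part is built here.

```python
from typing import Optional
from urllib.parse import urlencode

# Values copied from the parameter table above.
CURRENT_DIMENSIONS = {"execution_raw", "grounding_raw", "core_overall_display",
                      "value", "stability"}
LEGACY_DIMENSIONS = {"coding", "knowledge", "longctx", "overall"}


def leaderboard_query(dimension: Optional[str] = None,
                      run_id: Optional[int] = None) -> str:
    """Build the query string; omitted parameters are left to the server defaults."""
    params = {}
    if dimension is not None:
        if dimension not in CURRENT_DIMENSIONS | LEGACY_DIMENSIONS:
            # Client-side strictness is a design choice of this sketch.
            raise ValueError(f"unknown dimension: {dimension}")
        params["dimension"] = dimension
    if run_id is not None:
        params["run_id"] = int(run_id)
    return urlencode(params)
```

For example, `leaderboard_query("execution_raw", 16)` yields `dimension=execution_raw&run_id=16`, and calling it with no arguments yields an empty string.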
Response Example
This Week's Changes
Retrieve model ranking change data for a given week. Returns three groups of models (rising, falling, stable) with change magnitudes.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| week | string | Optional | Week tag, format 2026-W12. If omitted, returns the latest week. |
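A week tag in the 2026-W12 style can be produced from a calendar date as sketched below. Two assumptions are made that the table does not confirm: that the tag follows ISO 8601 week numbering, and that single-digit weeks are zero-padded. The names `week_tag` and `is_week_tag` are hypothetical.

```python
import re
from datetime import date

# Four-digit year, "-W", then a week number from 01 to 53.
WEEK_TAG = re.compile(r"^\d{4}-W(0[1-9]|[1-4]\d|5[0-3])$")


def week_tag(day: date) -> str:
    """Format a date as YYYY-Www using ISO 8601 week numbering (an assumption)."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def is_week_tag(value: str) -> bool:
    """Check that a string has the YYYY-Www shape before sending it."""
    return WEEK_TAG.match(value) is not None
```

Under ISO numbering, `week_tag(date(2026, 3, 16))` gives `2026-W12`.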
Response Example
Specific Dimension and Run
Combine the dimension and run_id parameters to retrieve the leaderboard for a specific evaluation run and dimension. This is useful for historical comparison and in-depth analysis of a single dimension.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| dimension | string | Required | Sort dimension. In this example it is coding, so results are sorted by Code Execution score in descending order. |
| run_id | int | Required | Evaluation run ID; in this example 16 |
Response Example
Error Handling
When a server-side error occurs, the HTTP status code is 500 and the response body contains "ok": false along with an "error" field.
When the requested dimension parameter is not in the allowed list, the request automatically falls back to overall. When no evaluation data is available, an empty rankings array is returned instead of an error.
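The distinction between a real failure and an empty result can be sketched as follows. This is a hedged example: it assumes the `rankings` array sits at the top level of the decoded JSON body, which this document does not state explicitly, and `read_rankings` is a hypothetical helper name.

```python
from typing import Any


def read_rankings(status_code: int, payload: dict[str, Any]) -> list[Any]:
    """Return the rankings list, raising only for genuine server-side failures."""
    if status_code == 500 or payload.get("ok") is False:
        raise RuntimeError(payload.get("error", f"HTTP {status_code}"))
    # An empty list means no evaluation data is available; it is not an error.
    return payload.get("rankings", [])
```

Because an unknown dimension falls back silently, a caller that cares about the exact dimension may want to validate the value before sending the request.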
API v1 (Recommended)
A new public read-only API. No API key is required, CORS is enabled, and requests are rate-limited to 60 per IP per minute. All responses include an attribution field and are cacheable for 1 hour.
Base URL: https://www.winzheng.com/yz-index/api/v1/
v1: Leaderboard
Retrieve the overall leaderboard with ranking changes. Sorted by core_overall_display by default.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| dimension | string | Optional | Sort dimension: core_overall_display, execution_raw, grounding_raw. Default: core_overall_display. The legacy values overall, coding, knowledge, and longctx still work but are deprecated after 2026-06-30. |
| limit | int | Optional | Number of models to return, 1-50. Default 11 (all). |
Response Example
v1: Changes and Incidents
Retrieve the latest changes and incident data. Results can optionally be filtered by model slug.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Optional | Model slug, e.g. deepseek-v3. If omitted, returns all models. |
Response Example
v1: Model Profile
Retrieve a model's detailed profile: scores, dimensions, pricing, and the last 5 evaluation history entries. Raw questions and answers are not returned.
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| {slug} | string | Required | Model slug, e.g. claude-opus-4.6 or deepseek-v3 |
Response Example
v1 General Specification
- Rate Limit: 60 requests per IP per minute; exceeding the limit returns 429 Too Many Requests
- CORS: Access-Control-Allow-Origin: *
- Cache: Cache-Control: public, max-age=3600 (1 hour)
- No API key required; send GET requests directly
- All responses include an attribution field. Please retain the source when citing data.
- Error Response Format: {"status":"error","error":"..."}
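A client can stay under the 60-requests-per-minute limit by pacing itself rather than waiting for a 429. The sketch below is one possible client-side approach, not a description of how the server counts requests: whether the server uses a fixed or sliding window is not documented here, so a sliding window is assumed as the conservative choice. `MinuteRateLimiter` is a hypothetical class name.

```python
import time
from collections import deque
from typing import Callable


class MinuteRateLimiter:
    """Client-side sliding window: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.calls.append(self.clock())
```

A typical loop would call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after sending it. Given the 1-hour cache header, caching responses locally is likely to reduce request volume further.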
v6 Scoring Field Reference
v6 introduces an entirely new scoring dimension system. Below are the new fields and their meanings.
New Fields (v6)
| Field | Type | Description |
|---|---|---|
| execution_raw | number | Code Execution raw score (0-100) |
| grounding_raw | number | Grounding raw score (0-100) |
| judgment_raw | number | Engineering Judgment raw score (0-100, side-panel AI-assisted evaluation) |
| communication_raw | number | Task Communication raw score (0-100, side-panel AI-assisted evaluation) |
| integrity_raw | number | Integrity Rating raw score (0-100) |
| integrity_label | string | Integrity label (pass/warn/fail) |
| recommendation_status | string | Recommendation status (recommended/neutral/not_recommended) |
| core_overall_raw | number | Overall raw score = 0.55 × execution_raw + 0.45 × grounding_raw |
| core_overall_display | number | Overall display score (capped at 74 on integrity fail) |
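The relationship between the two overall fields in the table can be sketched as below. This is an illustration of the stated formula and cap only: it assumes the weights apply to execution_raw and grounding_raw, applies the cap only when integrity_label is "fail", and does no rounding, since any rounding the service performs is not documented here. The function names are hypothetical.

```python
DISPLAY_CAP_ON_FAIL = 74.0  # cap stated in the field table


def core_overall_raw(execution_raw: float, grounding_raw: float) -> float:
    """Weighted overall score: 0.55 x execution + 0.45 x grounding."""
    return 0.55 * execution_raw + 0.45 * grounding_raw


def core_overall_display(execution_raw: float, grounding_raw: float,
                         integrity_label: str) -> float:
    """Raw score, limited to the cap when the integrity label is "fail"."""
    raw = core_overall_raw(execution_raw, grounding_raw)
    if integrity_label == "fail":
        return min(raw, DISPLAY_CAP_ON_FAIL)
    return raw
```

With execution 80 and grounding 90 the raw score is about 84.5; with an integrity label of fail, the display score becomes 74.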
v5 Compatibility Fields (sunset after 2026-06-30)
| Field | Status | Description |
|---|---|---|
| coding | deprecated · sunset 2026-06-30 | Coding score (legacy), migrate to execution_raw |
| knowledge | deprecated · sunset 2026-06-30 | Knowledge score (v5), v6 split into multiple dimensions |
| longctx | deprecated · sunset 2026-06-30 | Long context score (v5), migrate to grounding_raw |
| overall | deprecated · sunset 2026-06-30 | Overall score (legacy), migrate to core_overall_display |
| official_coding | deprecated · sunset 2026-06-30 | Official coding score (legacy), migrate to execution_raw |
| official_knowledge | deprecated · sunset 2026-06-30 | Official knowledge score (v5), v6 split into multiple dimensions |
| official_longctx | deprecated · sunset 2026-06-30 | Official long context score (v5), migrate to grounding_raw |
| official_overall | deprecated · sunset 2026-06-30 | Official overall score (legacy), migrate to core_overall_display |
| shadow_* | deprecated · sunset 2026-06-30 | All shadow_ prefixed fields are deprecated |
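The migration targets in the table can be captured in a small lookup, sketched below. The mapping is copied from the table; `migrate_field` is a hypothetical helper, and returning `None` for fields without a single replacement is a choice made for this sketch.

```python
from typing import Optional

# Legacy field -> v6 field, as listed in the compatibility table.
LEGACY_TO_V6 = {
    "coding": "execution_raw",
    "longctx": "grounding_raw",
    "overall": "core_overall_display",
    "official_coding": "execution_raw",
    "official_longctx": "grounding_raw",
    "official_overall": "core_overall_display",
}
# The table says these were split into multiple v6 dimensions.
NO_DIRECT_REPLACEMENT = {"knowledge", "official_knowledge"}


def migrate_field(name: str) -> Optional[str]:
    """Return the v6 name, None if there is no single replacement, or the name unchanged."""
    if name in NO_DIRECT_REPLACEMENT or name.startswith("shadow_"):
        return None
    return LEGACY_TO_V6.get(name, name)
```

Running existing field names through such a lookup before 2026-06-30 is one way to find the places in a client that still depend on v5 fields.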
Field Disambiguation
integrity_label vs integrity_raw: integrity_label is a tier label (pass/warn/fail); integrity_raw is the 0-100 raw score. Use the label for business logic and the raw score for trend analysis.
core_overall_display vs core_overall_raw: display is the frontend score (capped at 74 on integrity fail); raw is the uncapped weighted score. Sort by display.
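The guidance above (sort by display, branch on the label) can be sketched as follows. This assumes each model is a dict carrying the `core_overall_display` and `integrity_label` fields from the field reference; the surrounding response shape is not shown in this document, and the function names are hypothetical.

```python
from typing import Any


def rank_models(models: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Order models by core_overall_display, highest first."""
    return sorted(models, key=lambda m: m["core_overall_display"], reverse=True)


def with_integrity(models: list[dict[str, Any]], label: str) -> list[dict[str, Any]]:
    """Keep only models whose integrity_label equals the given tier."""
    return [m for m in models if m["integrity_label"] == label]
```

Filtering on the label rather than on a threshold of integrity_raw avoids guessing where the tier boundaries lie.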
Widget Embed Components
Embed YZ Index on your website with a single line of code. Supports leaderboard, model badge, and change bulletin widgets in dark/light themes.
Widget: Leaderboard Card
Displays Top N model rankings, scores, and ranking changes.
Embed Code
Configuration Attributes
| Attribute | Description | Default Value |
|---|---|---|
| data-type | leaderboard | — |
| data-limit | Number of models to display | 5 |
| data-theme | dark or light | dark |
Widget: Model Badge
A compact badge widget showing the model name, overall score, and ranking. Similar to a GitHub stars badge.
Embed Code
Configuration Attributes
| Attribute | Description |
|---|---|
| data-type | badge |
| data-model | Model slug (required), e.g. deepseek-v3, claude-opus-4.6 |
Widget: Change Bulletin
Displays the models with the largest gains and drops this period, plus incident count.
Embed Code
Available Model Slugs
| Model Name | slug | Provider |
|---|---|---|
| Claude Opus 4.6 | claude-opus-4.6 | claude |
| Claude Sonnet 4.6 | claude-sonnet-4.6 | claude |
| GPT-4o | gpt-4o | gpt |
| GPT-o3 | gpt-o3 | gpt |
| Grok 3 | grok-3 | grok |
| Gemini 2.5 Pro | gemini-2.5-pro | gemini |
| DeepSeek V3 | deepseek-v3 | deepseek |
| DeepSeek R1 | deepseek-r1 | deepseek |
| Qwen Max | qwen-max | qwen |
| 豆包 Pro | doubao-pro | doubao |
| 文心一言 4.0 | ernie-4 | ernie |
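Because several endpoints and the badge widget take a model slug, it can help to validate slugs locally before use. The sketch below copies the slug and provider columns from the table as they appear in this document; the live list may change, and `provider_for` is a hypothetical helper name.

```python
# slug -> provider, copied from the table above (a snapshot, not a live list).
MODEL_PROVIDERS = {
    "claude-opus-4.6": "claude",
    "claude-sonnet-4.6": "claude",
    "gpt-4o": "gpt",
    "gpt-o3": "gpt",
    "grok-3": "grok",
    "gemini-2.5-pro": "gemini",
    "deepseek-v3": "deepseek",
    "deepseek-r1": "deepseek",
    "qwen-max": "qwen",
    "doubao-pro": "doubao",
    "ernie-4": "ernie",
}


def provider_for(slug: str) -> str:
    """Return the provider for a known slug, or raise ValueError for an unknown one."""
    try:
        return MODEL_PROVIDERS[slug]
    except KeyError:
        raise ValueError(f"unknown model slug: {slug}") from None
```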