Chess Multiverse Error & Evaluation Dataset (CMEED v1.0)

Open-source chess research dataset containing 994,269 engine-annotated inaccuracies, mistakes, and blunders from 140,662 broadcast games (Jan–May 2026). www.chessmultiverse.org

🚀 Live Interactive Explorer

Explore CMEED through the Chess Multiverse Error Explorer:

https://www.chessmultiverse.org/p/chess-multiverse-error-explorer.html

The explorer enables interactive investigation of:

  • Error distributions
  • Opening risk profiles
  • Player error fingerprints
  • Tournament pressure effects
  • Position-level error records
  • Large-scale chess analytics

The CMEED dataset powers research tools developed through Chess Multiverse.


📊 Dataset Overview

The Chess Multiverse Error & Evaluation Dataset (CMEED) is a large-scale open-source chess research dataset containing structured human decision-making errors extracted from official chess broadcasts.

Unlike traditional chess databases that primarily focus on games, openings, or engine evaluations, CMEED focuses specifically on player mistakes and decision quality.

Each record captures a single:

  • Inaccuracy
  • Mistake
  • Blunder

along with:

  • Board position before the move
  • Board position after the move
  • Engine evaluation changes
  • Remaining clock time
  • Player ratings
  • Player titles
  • Opening metadata
  • Tournament metadata
  • Full FEN reconstruction

Version 1.0 contains data extracted from official broadcast events spanning January 2026 through May 2026.


📚 Source Data & Provenance

CMEED is derived from the official Lichess Broadcast Database.

Source broadcasts are distributed under the Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0) license.

The source PGNs contain:

  • Engine evaluations
  • Clock information
  • Opening metadata
  • Tournament metadata
  • Player metadata

CMEED transforms these raw broadcast PGNs into a structured research dataset focused on human error behavior.

Transformation pipeline:

Broadcast PGN
    ↓
Evaluation Parsing
    ↓
Error Detection
    ↓
Position Reconstruction
    ↓
Metadata Enrichment
    ↓
CMEED Dataset
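
The Evaluation Parsing step can be illustrated with a minimal sketch. It assumes that move comments in the source PGNs carry Lichess-style `[%eval ...]` and `[%clk ...]` tags; the function name `parse_comment` and the sample comments are hypothetical and are not taken from the CMEED pipeline.

```python
import re

# Assumed tag shapes: [%eval 0.17], [%eval #-3] (forced mate), [%clk 1:29:58]
EVAL_RE = re.compile(r"\[%eval\s+(#?-?\d+(?:\.\d+)?)\]")
CLK_RE = re.compile(r"\[%clk\s+(\d+):(\d{2}):(\d{2})(?:\.\d+)?\]")


def parse_comment(comment):
    """Return (eval_pawns, mate_in, clock_seconds); parts not found are None."""
    eval_pawns = mate_in = clock_seconds = None

    match = EVAL_RE.search(comment)
    if match:
        token = match.group(1)
        if token.startswith("#"):
            mate_in = int(token[1:])      # "#-3" -> -3 (side to be mated is negative)
        else:
            eval_pawns = float(token)     # evaluation in pawns

    match = CLK_RE.search(comment)
    if match:
        hours, minutes, seconds = (int(part) for part in match.groups())
        clock_seconds = hours * 3600 + minutes * 60 + seconds

    return eval_pawns, mate_in, clock_seconds


print(parse_comment("[%eval 0.17] [%clk 1:29:58]"))
print(parse_comment("[%eval #-3] [%clk 0:00:42]"))
```

Keeping mate scores separate from pawn evaluations avoids mixing two different scales in a single numeric column.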

📈 Dataset Statistics

| Metric | Value |
| --- | --- |
| Dataset Version | CMEED v1.0 |
| Coverage Period | January 2026 – May 2026 |
| Games | 106,911 |
| Total Error Records | 994,269 |
| Unique Players | 32,203 |
| Opening Families | 489 |
| Inaccuracies | 566,830 |
| Mistakes | 195,775 |
| Blunders | 231,664 |
| Storage Size | ~1.33 GB |
| Format | JSON |
| License | CC BY-SA 4.0 |

📦 Monthly Dataset Breakdown

| Month | Error Records |
| --- | --- |
| January 2026 | 186,544 |
| February 2026 | 130,945 |
| March 2026 | 205,046 |
| April 2026 | 201,910 |
| May 2026 | 269,824 |
| Total | 994,269 |
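
The published counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures listed in this README; the variable names are illustrative.

```python
# Figures copied from the Dataset Statistics and Monthly Dataset Breakdown tables.
TOTAL = 994_269

monthly = {
    "2026-01": 186_544,
    "2026-02": 130_945,
    "2026-03": 205_046,
    "2026-04": 201_910,
    "2026-05": 269_824,
}
by_type = {
    "Inaccuracy": 566_830,
    "Mistake": 195_775,
    "Blunder": 231_664,
}

# Both breakdowns should add up to the same total number of error records.
assert sum(monthly.values()) == TOTAL
assert sum(by_type.values()) == TOTAL

# Share of each error type, in percent, rounded to one decimal place.
shares = {name: round(100 * count / TOTAL, 1) for name, count in by_type.items()}
print(shares)
```

The same check can be repeated against the loaded files to confirm that a download is complete.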

📁 Repository Structure

Chess-Multiverse-Error-Evaluation-Dataset-CMEED/

├── data/
│   ├── cmeed_2026-01.json
│   ├── cmeed_2026-02.json
│   ├── cmeed_2026-03.json
│   ├── cmeed_2026-04.json
│   └── cmeed_2026-05.json
│
├── cmeed_v1.parquet
├── README.md
├── CITATION.cff
├── LICENSE
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── .gitattributes

Dataset Formats

CMEED is distributed in two formats:

| Format | Description |
| --- | --- |
| JSON | Monthly source datasets |
| Parquet | Consolidated research dataset |

The Parquet release contains all 994,269 error records in a compressed columnar format optimized for analytics, machine learning, and large-scale research workflows.

Researchers are encouraged to use the Parquet release for maximum performance.


🗄️ Data Schema (Data Dictionary)

Each JSON object represents a single detected player error.

| Field | Type | Description |
| --- | --- | --- |
| error_id | String | Unique error identifier |
| game_id | String | Unique game identifier |
| source_file | String | Source dataset file |
| event | String | Tournament name |
| broadcast_name | String | Broadcast title |
| game_url | String | Original game URL |
| eco | String | ECO code |
| opening | String | Opening name |
| date | String | Game date |
| year | Integer | Year |
| round | String | Tournament round |
| board | String | Board number |
| white | String | White player |
| black | String | Black player |
| player | String | Player committing the error |
| white_elo | Integer | White rating |
| black_elo | Integer | Black rating |
| player_elo | Integer | Player rating |
| white_title | String | White title |
| black_title | String | Black title |
| player_fide_id | String | FIDE identifier |
| result | String | Game result |
| time_control | String | Time control |
| move_number | Integer | Move number |
| error_ply | Integer | Ply number |
| side | String | White or Black |
| opening_phase | String | Opening, Middlegame, Endgame |
| error_type | String | Inaccuracy, Mistake, Blunder |
| played_move | String | Move played |
| best_move | String | Engine recommendation |
| eval_before | Float | Evaluation before move |
| eval_after | Float | Evaluation after move |
| eval_change | Float | Evaluation swing |
| clock_seconds | Integer | Remaining clock time |
| fen_before | String | Position before move |
| fen_after | String | Position after move |
🧪 Methodology

1. Data Collection

Games were collected from official broadcast PGN archives.

2. Error Extraction

Custom CMEED extraction software identifies:

  • Inaccuracies
  • Mistakes
  • Blunders

from engine annotations embedded within broadcast PGNs.

3. Position Reconstruction

Every game is replayed using python-chess to reconstruct:

  • FEN before move
  • FEN after move

for each detected error.
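
The replay step can be sketched with python-chess as follows. The function name `fens_around_move`, the use of SAN move lists as input, and the assumption that `error_ply` is 1-based are illustrative choices, not details confirmed by the CMEED pipeline.

```python
import chess


def fens_around_move(san_moves, error_ply):
    """Replay SAN moves from the start position and return (fen_before, fen_after)
    for the move at the given ply (assumed to be 1-based)."""
    board = chess.Board()
    for ply, san in enumerate(san_moves, start=1):
        if ply == error_ply:
            fen_before = board.fen()   # position the player faced
            board.push_san(san)        # play the recorded move
            return fen_before, board.fen()
        board.push_san(san)
    raise ValueError("error_ply is beyond the end of the game")


before, after = fens_around_move(["e4", "e5", "Nf3"], error_ply=2)
print(before)
print(after)
```

Games that start from a non-standard position would need the starting FEN passed to `chess.Board` instead of the default.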

4. Evaluation Tracking

For every error record:

  • eval_before
  • eval_after
  • eval_change

are extracted and reconstructed from engine evaluations embedded in PGN annotations.
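
One way to turn a pair of evaluations into an error label is to measure the loss from the mover's point of view. The sketch below assumes evaluations are in pawns from White's perspective. The logistic constant and the 0.1 / 0.2 / 0.3 cutoffs are illustrative assumptions modeled on win-probability heuristics used by online analysis tools; this README does not state which thresholds or sign convention CMEED applies.

```python
import math


def win_chance(eval_pawns):
    """Map a pawn evaluation to a win chance in [-1, 1] using a logistic curve.
    The steepness constant is an illustrative assumption."""
    return 2 / (1 + math.exp(-0.368 * eval_pawns)) - 1


def mover_loss(eval_before, eval_after, side):
    """Win-chance loss for the side that moved; positive means the move made things worse."""
    sign = 1 if side == "White" else -1
    return sign * (win_chance(eval_before) - win_chance(eval_after))


def classify(loss, thresholds=((0.3, "Blunder"), (0.2, "Mistake"), (0.1, "Inaccuracy"))):
    """Return the first label whose cutoff the loss reaches, or None if below all cutoffs."""
    for cutoff, label in thresholds:
        if loss >= cutoff:
            return label
    return None


# White drops from +0.5 to -2.0; Black drops from -0.5 to +2.0.
print(classify(mover_loss(0.5, -2.0, "White")))
print(classify(mover_loss(-0.5, 2.0, "Black")))
```

Working in win-chance space rather than raw pawns means a swing from +8 to +6 counts for far less than a swing from +1 to -1, which matches how such errors affect the result.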

5. Metadata Enrichment

Each record is enriched with:

  • Tournament metadata
  • Player metadata
  • Rating information
  • Opening information
  • Clock information
  • Game identifiers

🔁 Reproducibility

CMEED was generated using the Chess Multiverse Error Extraction Pipeline.

Core technologies:

  • Python
  • python-chess
  • PGN parsing
  • JSON serialization

Pipeline:

PGN Import
    ↓
Evaluation Parsing
    ↓
Error Detection
    ↓
FEN Reconstruction
    ↓
Metadata Enrichment
    ↓
JSON Export

💻 Example Usage

Load Dataset

import json
import pandas as pd

with open("data/cmeed_2026-05.json", "r", encoding="utf-8") as f:
    data = json.load(f)

df = pd.DataFrame(data)

print(df.head())

Player Error Analysis

carlsen = df[df["player"] == "Carlsen, Magnus"]

print(carlsen["error_type"].value_counts())

Largest Blunders

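blunders = df[df["error_type"] == "Blunder"]

# Rank by magnitude so the result does not depend on the sign convention of eval_change
largest = blunders.sort_values(
    by="eval_change",
    key=lambda s: s.abs(),
    ascending=False
)

print(largest.head())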

Opening Error Analysis

opening_errors = (
    df.groupby("opening")
      .size()
      .sort_values(ascending=False)
)

print(opening_errors.head(20))
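
Time Pressure Analysis

A breakdown by remaining clock time can be sketched without pandas, working directly on the list of record dicts (the `data` variable from the Load Dataset example). The bucket boundaries and the names `clock_bucket` and `errors_by_clock` are illustrative assumptions, and the sample records are made up.

```python
from collections import Counter


def clock_bucket(seconds):
    """Assign a remaining-clock value (in seconds) to a coarse time-pressure bucket."""
    if seconds is None:
        return "unknown"
    if seconds < 60:
        return "<1 min"
    if seconds < 300:
        return "1-5 min"
    if seconds < 1800:
        return "5-30 min"
    return ">=30 min"


def errors_by_clock(records):
    """Count error records per (clock bucket, error type) pair."""
    return Counter(
        (clock_bucket(r.get("clock_seconds")), r.get("error_type")) for r in records
    )


# Made-up records for illustration; replace with the loaded `data` list.
records = [
    {"clock_seconds": 45, "error_type": "Blunder"},
    {"clock_seconds": 50, "error_type": "Blunder"},
    {"clock_seconds": 240, "error_type": "Mistake"},
    {"clock_seconds": 4000, "error_type": "Inaccuracy"},
    {"clock_seconds": None, "error_type": "Inaccuracy"},
]
for (bucket, error_type), count in errors_by_clock(records).most_common():
    print(bucket, error_type, count)
```

Because CMEED contains only error records, these counts describe how errors are distributed across clock buckets, not error rates per move played.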

🔬 Research Applications

CMEED enables research in:

  • Human Error Modeling
  • Chess Performance Analytics
  • Opening Risk Assessment
  • Time Pressure Studies
  • Decision-Making Research
  • Elo-Based Error Prediction
  • Endgame Error Analysis
  • Tournament-Level Statistical Research
  • Cognitive Science
  • Sports Analytics
  • Artificial Intelligence
  • Machine Learning
  • Reinforcement Learning
  • Explainable AI
  • Human-Computer Interaction
  • Chess Education

⚠️ Limitations

  • Coverage is limited to official broadcast events available through the source archive.
  • Error detection depends on engine evaluations embedded within source PGNs.
  • CMEED focuses on inaccuracies, mistakes, and blunders rather than every move played.
  • Version 1.0 covers January 2026 through May 2026 only.
  • Additional tournaments and historical years may be added in future releases.

🤝 Contributing

Contributions are welcome.

Please read:

  • CONTRIBUTING.md
  • CODE_OF_CONDUCT.md

before submitting issues or pull requests.

Potential areas include:

  • Additional years
  • Additional tournaments
  • Validation tooling
  • Data quality improvements
  • Research notebooks
  • Visualization tools
  • Position classification systems

📝 Citation

If you use CMEED in research, publications, software, educational projects, or derivative datasets, please cite the dataset.

Varshney, Sparsh. (2026). Chess Multiverse Error & Evaluation Dataset (CMEED v1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20625716

DOI: https://doi.org/10.5281/zenodo.20625716


👨‍🔬 Lead Researcher & Principal Developer

Sparsh Varshney

Founder, Chess Multiverse

Research Interests:

  • Chess Analytics
  • Human Error Modeling
  • Open Data Science
  • Artificial Intelligence
  • Medical Research
  • Open Science

Projects

Chess Multiverse

https://www.chessmultiverse.org

Amidha Ayurveda

https://www.amidhaayurveda.com

Profiles

GitHub:

https://github.com/sciencewithsaucee-sudo

ORCID:

https://orcid.org/0009-0004-7835-0673


⚖️ License

This dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

You are free to:

  • Share
  • Adapt
  • Build upon the dataset

for any purpose, including commercial use, provided appropriate attribution is given and derivative works are distributed under the same license.

See the LICENSE file for full details.