
DVDAGames/kn1ght

Research on building a model that only understands Chess notation.

kn1ght

A work in progress...

Tokenizer Differences

kn1ght's tokenizer is optimized for the Portable Game Notation (PGN) format used to record chess games.

[Figure: tokenizer comparison]

Note: kn1ght's tokenizer does not currently account for PGN metadata (Event, Site, Date, etc.), PGN comments ({...}), clock-time annotations ({[%clk ...]}), or other miscellaneous PGN data. It focuses only on the moves actually played in the game.
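
To make the scope described in the note concrete, here is a minimal sketch of reducing a PGN string to its moves before tokenization. It is not taken from the kn1ght repository: the helper name extract_movetext and the regular expressions are assumptions chosen for illustration, and the sketch handles only tag-pair header lines and {...} comments (which covers {[%clk ...]} annotations), not variations or other PGN features.

```python
import re

# Hypothetical helper, not part of kn1ght: keeps only the movetext of a PGN game.
TAG_PAIR = re.compile(r"^\[[^\]]*\]\s*$", re.MULTILINE)  # header lines such as [Event "..."]
COMMENT = re.compile(r"\{[^}]*\}")                       # {...}, including {[%clk ...]}
WHITESPACE = re.compile(r"\s+")


def extract_movetext(pgn: str) -> str:
    """Strip tag pairs and brace comments, then collapse whitespace."""
    # Remove comments first, because clock annotations contain square brackets.
    text = COMMENT.sub(" ", pgn)
    text = TAG_PAIR.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


sample = '''[Event "Example"]
[Site "?"]

1. e4 {[%clk 0:05:00]} e5 {a classical reply} 2. Nf3 Nc6 1-0
'''

print(extract_movetext(sample))  # prints: 1. e4 e5 2. Nf3 Nc6 1-0
```

The move numbers and the game result are left in place here; whether they should be kept or dropped depends on the tokenizer's vocabulary, which this sketch does not assume.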

It has been trained on a small dataset of 3.5M chess games from ChessDB, cleaned up by Kaggle user milesh1.

Languages

Jupyter Notebook 99.5%, Python 0.5%

Created January 16, 2025
Updated January 28, 2025