arv-anshul/hockey

Scrape hockey data using scrapy with pydantic validation. Data available on Kaggle too.

Scrape Hockey Data

This project scrapes hockey data from altiusrt.com. I mainly focused on
hockeyindia.altiusrt.com because I am interested in the Hockey India League (HIL).

For now, the scraper can collect the following data:

  1. Competitions: Details about previous, upcoming and in-progress competitions. A competition is a
    tournament (e.g. Hockey India League).
  2. Competition Teams: Details about the teams that participated in the competition.
  3. Competition Matches: Details about the matches of a specified competition.
  4. Competition Players: Details about the players taking part in the competition.
  5. Competition Matches (detailed): Full match details such as umpires, goal scorers, quarter-wise
    data and more.
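
As a rough illustration of how scraped records like the ones above can be validated with pydantic, here is a minimal sketch. The `Competition` model, its field names, the `validate_competition` helper and the sample values are all assumptions made for this example; the project's real models may look different.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class Competition(BaseModel):
    # Hypothetical fields for illustration; not the project's actual schema.
    competition_id: int
    name: str
    status: Literal["previous", "upcoming", "in-progress"]
    location: Optional[str] = None


def validate_competition(raw: dict) -> Optional[Competition]:
    """Return a validated Competition, or None if the scraped record is malformed."""
    try:
        return Competition(**raw)
    except ValidationError:
        # A real scraper would probably log the error instead of dropping it silently.
        return None


# Made-up sample records: scraped values usually arrive as strings.
good = validate_competition(
    {"competition_id": "180", "name": "Hockey India League", "status": "upcoming"}
)
bad = validate_competition({"competition_id": "not-a-number", "name": "Broken"})
```

In this sketch the numeric string is coerced to an integer for the valid record, while the record with a non-numeric id and a missing status is rejected before it can reach the exported files.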

You can use altiusrt/main.py to scrape the data for a specific competition (e.g.
HIL) and export it in the JSON and JSONL (aka JSON Lines) formats:

uv run python -m src.altiusrt.main 180

In the above command, 180 is the competition_id of the Hockey India League competition/tournament.
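
To show how the two export formats differ, here is a small standard-library sketch. The helper names (`export_json`, `export_jsonl`, `read_jsonl`), the file names and the sample records are assumptions for illustration only and are not taken from the project's code.

```python
import json
import tempfile
from pathlib import Path


def export_json(records, path):
    """Write all records as a single JSON array."""
    path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")


def export_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Made-up sample records for competition_id 180.
records = [
    {"competition_id": 180, "team": "Team A"},
    {"competition_id": 180, "team": "Team B"},
]

out_dir = Path(tempfile.mkdtemp())
export_json(records, out_dir / "teams.json")
export_jsonl(records, out_dir / "teams.jsonl")
```

The JSON file has to be parsed as a whole, whereas the JSONL file can be read or appended to one record at a time, which tends to suit large scrapes.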

Dataset on Kaggle

I have scraped the data for HIL 2025 and uploaded
it to Kaggle. You can use it to build an awesome
dashboard.

Acknowledgment

Languages

Python 100.0%

MIT License
Created January 22, 2025
Updated November 25, 2025