arv-anshul/hockey
Scrape hockey data using scrapy with pydantic validation. Data available on Kaggle too.
Scrape Hockey Data
This project scrapes hockey data from altiusrt.com. I mainly focused on
hockeyindia.altiusrt.com because I am interested in the Hockey India League (HIL).
For now, the scraper can scrape the following data:
- Competitions: Details about previous, upcoming and in-progress competitions. A competition is a
tournament (e.g. Hockey India League).
- Competition Teams: Details about the teams that participated in the competition.
- Competition Matches: Details about the matches of a specified competition.
- Competition Players: Details about the players who will be playing in the competition.
- Competition Matches (detailed): Full match details such as umpires, goal scorers, quarter-wise
data and more.
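As a rough illustration of how pydantic validation might apply to one of these record types, here is a minimal sketch. The `Competition` model, its field names, the status values and the `validate_competition` helper are all assumptions made for this example; they are not taken from the project's actual models.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class Competition(BaseModel):
    """Hypothetical shape of a scraped competition record."""

    id: int
    name: str
    # Assumed status values, mirroring the three states listed above.
    status: Literal["previous", "upcoming", "inprogress"]


def validate_competition(raw: dict) -> Optional[Competition]:
    """Return a validated Competition, or None if the raw data is invalid."""
    try:
        return Competition(**raw)
    except ValidationError:
        # A real scraper would probably log or drop the item here.
        return None
```

Validating each scraped item this way means malformed rows (a missing name, an unexpected status) are caught before they reach the exported files.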
You can use altiusrt/main.py to scrape the data for a specific competition (e.g.
HIL) and export it in the json and jsonl (aka jsonlines) formats.
uv run python -m src.altiusrt.main 180
In the above command, 180 is the competition_id for the Hockey India League competition/tournament.
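To make the difference between the two export formats concrete, the sketch below serializes the same records as JSON (one array) and as JSON Lines (one object per line) using only the standard library. The function names and the sample records are hypothetical and do not reflect how the project's own exporter is written.

```python
import json
from pathlib import Path


def to_json(records: list) -> str:
    """Serialize all records as a single JSON array."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_jsonl(records: list) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def export(records: list, stem: str) -> None:
    """Write <stem>.json and <stem>.jsonl (hypothetical file naming)."""
    Path(f"{stem}.json").write_text(to_json(records), encoding="utf-8")
    Path(f"{stem}.jsonl").write_text(to_jsonl(records), encoding="utf-8")


# Made-up sample records, for illustration only.
sample = [{"id": 1, "name": "Team A"}, {"id": 2, "name": "Team B"}]
```

JSON Lines is convenient for large scrapes because each line can be appended and parsed independently, while a single JSON array is easier to load in one call.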
Dataset on Kaggle
I have scraped the data for HIL 2025 and uploaded it
to Kaggle; you can use it to build an awesome
dashboard.
Acknowledgment
- Took help from the
@Martijn-van-Kekem-Development/hockey-match-calendar
repo for scraping code such as CSS selectors, URL formation and more.