
ericblam/competition-scraper

Program to scrape and collect data from ballroom competitions.

Ballroom Competition Scraper

This system gathers data about ballroom competitions from websites such as O2CM so that results and placements can be analyzed.

Make sure that you have the following system packages installed:

  • libpq-dev
  • postgresql
  • python3-tidylib

Make sure that you have the following Python packages installed:

  • urllib3
  • pytidylib
  • bs4
  • psycopg2
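
As a quick check that the Python packages above are available, a small script along these lines can report which ones are missing. This is only a sketch and is not part of the repository; the function name `missing_packages` is made up for illustration. Note that pytidylib is imported under the module name `tidylib`.

```python
import importlib.util

# Maps each package listed above to the module name used to import it.
# pytidylib is imported as "tidylib"; the others import under their own names.
REQUIRED_MODULES = {
    "urllib3": "urllib3",
    "pytidylib": "tidylib",
    "bs4": "bs4",
    "psycopg2": "psycopg2",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing Python packages: " + ", ".join(missing))
    else:
        print("All required Python packages are importable.")
```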

Additionally, you will need a running database that matches the configuration in src/config.json, with the SQL files in src/db loaded into it.
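
The layout of src/config.json is not described here, so the following is only a sketch of how such a file might be read and turned into a psycopg2 connection. The top-level "db" object and the key names (host, port, dbname, user, password) are assumptions for illustration; check the actual file before relying on them.

```python
import json

# Hypothetical key names: the real layout of src/config.json may differ.
DB_KEYS = ("host", "port", "dbname", "user", "password")


def load_db_settings(config_path):
    """Read a JSON config file and return only the database settings."""
    with open(config_path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    # Assumption: the settings live under a top-level "db" object.
    db_section = config["db"]
    return {key: db_section[key] for key in DB_KEYS if key in db_section}


def connect(settings):
    """Open a PostgreSQL connection from the settings dictionary."""
    # Imported here so load_db_settings works even without psycopg2 installed.
    import psycopg2

    return psycopg2.connect(**settings)
```

With settings shaped this way, `connect(load_db_settings("src/config.json"))` would return an open connection, provided the database is running and the credentials are valid.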

The repository is organized into several packages:

  • .
    • crawler.py
      • Script that creates threads that run the webparsing framework
  • db
    • Accessors and objects for competition database objects
  • test
    • unittest package for testing code in the repository
  • util
  • webparser
    • Single-page parsers and other webparsing framework code
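
To illustrate what a single-page parser does, here is a minimal sketch that extracts the links from one HTML page. It is not the repository's code: the names `LinkParser` and `parse_links` are hypothetical, and it uses the standard library's `html.parser` rather than bs4 so that it runs without extra dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect absolute URLs from every <a href="..."> tag on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the page itself.
            self.links.append(urljoin(self.base_url, href))


def parse_links(base_url, html):
    """Parse one page's HTML and return the absolute URLs it links to."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links
```

A crawler can feed the returned links back into its work queue, which is one common way to connect single-page parsers to a multi-threaded framework.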

To run the crawler, run:

$ python3 crawler.py --numWorkers <number-of-threads> --configFile <config-file-path> http://results.o2cm.com
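
The command above takes a worker count, a config file path, and a starting URL. The sketch below shows one way a script could accept those arguments and share work among threads through a queue. It is an assumption-laden illustration, not the actual contents of crawler.py: the flag names come from the command above, while `parse_args`, `run_workers`, `handle_url`, the `startUrl` argument name, and the default of one worker are hypothetical.

```python
import argparse
import queue
import threading


def parse_args(argv=None):
    """Parse the flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Crawl competition results.")
    # Assumption: default of 1 worker; the real default is not documented here.
    parser.add_argument("--numWorkers", type=int, default=1)
    parser.add_argument("--configFile")
    parser.add_argument("startUrl")
    return parser.parse_args(argv)


def run_workers(num_workers, start_urls, handle_url):
    """Process URLs with a pool of threads until no work remains.

    handle_url(url) is a caller-supplied function that processes one page
    and returns an iterable of further URLs to visit.
    """
    tasks = queue.Queue()
    for url in start_urls:
        tasks.put(url)

    def worker():
        while True:
            url = tasks.get()
            try:
                if url is None:  # sentinel: no more work, stop this thread
                    return
                for new_url in handle_url(url):
                    tasks.put(new_url)
            except Exception as error:
                # Keep the worker alive if one page fails.
                print(f"Failed to process {url}: {error}")
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    tasks.join()  # returns once every queued URL has been handled
    for _ in threads:
        tasks.put(None)
    for thread in threads:
        thread.join()
```

New URLs are queued before the current task is marked done, so `tasks.join()` cannot return while discovered pages are still waiting to be processed.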

Languages

Python 94.4%, PLpgSQL 4.5%, Shell 1.1%

MIT License
Created June 30, 2017
Updated August 4, 2025