GitHunt
GA

gaspa93/tripadvisor-scraper

Tripadvisor reviews scraper

TripAdvisor Reviews Scraper

Scraper of Tripadvisor reviews, parametric by date and language.
The script allows to scrape:

  1. urls of TA points of interests based on string query
  2. POIs metadata
  3. POIs reviews up to a certain minimum date and with a specified language

Installation

Follow these steps to use the scraper:

  • Download Chromedrive from here.

  • Install Python packages from requirements file, either using pip, conda or virtualenv:

      `conda create --name scraping python=3.6 --file requirements.txt`
    

Note: Python >= 3.6 is required.

Usage

The scraper has 5 parameters:

  • --i: input file, containing a list of Tripadvisor urls that point to first page of reviews.
  • --lang: language code to filter reviews.
    Note: only "select all languages" click is implemented.
  • --N: number of reviews to scrape.
  • --q: string query to scrape url places.
  • --place: boolean value to scrape place metadata instead of reviews.

Some examples:

  • python scraper.py --q amsterdam: generates the urls.txt file with the top-30 POIs of amsterdam
  • python scraper.py --place 1: generates a csv file containing metadata of places present in urls.txt
  • python scraper.py: generates a csv file containing reviews of places present in urls.txt

The config.json file allows to set the directory to store output csv, as well as their filenames.

License

GNU GPLv3

Languages

Python100.0%

Contributors

Created February 28, 2019
Updated November 17, 2024