prankshaw/Beware-web-scraper

Web scraping project including a C projects scraper for GitHub, an ICC rankings scraper, a YouTube Trending scraper, a LinkedIn profile scraper, and a Wikipedia image scraper.

Visit the project here (contributions welcome):

https://prankshaw.github.io/Beware-web-scraper/

Scrapers available

    C-project-scraper

    Scrapes the top 'C' language projects from GitHub. It can be extended to fetch projects in any language available on GitHub.

    ICC Rankings-Scraper

    Lists the top 100 ranked batsmen worldwide in all three formats: Test cricket, One Day International, and T20 International.

    YouTube Trending-Scraper

    Scrapes the information from the trending section of YouTube, including each video's name, its description (where available), and its link.

    LinkedIn-Scraper

    Automatically logs in to the profile and scrapes the relevant information from it, including name, location, title, connections, and more.

    Wikipedia Image-Scraper

    Scrapes the links of all images on the given Wikipedia page and prints them.
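
As a rough illustration of what the Wikipedia Image-Scraper does, here is a minimal sketch that collects image links from a page's HTML. It is not the project's code: the project drives a browser through Selenium, whereas this sketch uses only the Python standard library on a hypothetical HTML string so that it runs without a browser. The names `ImageLinkParser` and `extract_image_links`, and the `example.org` URLs, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the absolute URL of every <img> tag that has a src."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # urljoin resolves relative and protocol-relative paths
            # against the page URL.
            self.links.append(urljoin(self.base_url, src))


def extract_image_links(html, base_url):
    """Return the image URLs found in `html`, in document order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical sample markup, not taken from a real page.
sample_html = """
<p>Intro</p>
<img src="//upload.example.org/a.png" alt="a">
<img src="/static/b.jpg">
<img alt="no source">
"""

for link in extract_image_links(sample_html, "https://en.example.org/wiki/Page"):
    print(link)
```

With Selenium, the same idea would apply to the page source returned by the browser, or to the `img` elements located through the driver.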


These projects use the Selenium WebDriver.

To use project

Just fork the project and install the prerequisites.

If a scraper is provided as a Jupyter notebook, simply run it; otherwise, follow the steps below.

Python (I am using Python 3.x): after installing Python, use pip to install the requirements (if any).

Selenium WebDriver for Google Chrome (ChromeDriver): download it and place it anywhere on your machine.

pip install selenium

pip install pandas
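
To confirm that both installs succeeded before running a scraper, a small check like the following can help. This is only a sketch; the helper name `missing_packages` is hypothetical and not part of the project.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["selenium", "pandas"])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All prerequisites are installed.")
```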


Replace the 'chromedriver' path in the code with the path on your own machine.
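
One way to avoid editing the path in every script is to read it from one place. The sketch below assumes Selenium 4, where the driver path is passed through a `Service` object; the project's own scripts may pass the path differently. The environment variable `CHROMEDRIVER_PATH` and the helper names are assumptions made for illustration.

```python
import os


def resolve_chromedriver_path(env=None, default="chromedriver"):
    """Pick the ChromeDriver path from CHROMEDRIVER_PATH, else a default.

    CHROMEDRIVER_PATH is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    return env.get("CHROMEDRIVER_PATH") or default


def make_driver(driver_path):
    """Start Chrome through Selenium 4 with an explicit ChromeDriver path."""
    # Imported lazily so this file loads even before Selenium is installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

A script would then call `driver = make_driver(resolve_chromedriver_path())`, use `driver.get(url)` to open a page, and finish with `driver.quit()`.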

Then run the script in IDLE and check the output.

License

Licensed under the MIT License:
https://prankshaw.mit-license.org/