
PySpark tutorial

To simulate a Big Data workflow, I installed a VM on my local computer, installed Spark, and configured PySpark to work with Jupyter Notebook. For different setup scenarios, check the course Spark and Python for Big Data with PySpark.

The notebook includes:

  • DataFrame basics
  • DataFrame operations
  • DataFrame aggregation
  • Missing data
  • Dates and timestamps

Dataset

The datasets used (people.json, appl_stock.csv, sales_info.csv, and ContainsNull.csv) can be downloaded from the repository.

Python version

Python 2

Written during participation in the course Spark and Python for Big Data with PySpark.

Languages

Jupyter Notebook 100.0%

Created August 2, 2019
Updated August 2, 2019