behnamyazdan/PythonForDataEngineeringCourse
This course teaches the fundamental Python skills needed for data engineering. It is aimed at anyone interested in the topic and introduces the Python features most relevant to data engineering work.
Python for Data Engineering Course
01: Python Basics
- Introduction
- First Python program
- Basic data types (int, float, str, bool)
- Variables, constants, and operators
- Control flow (if statements, loops)
- Data structures (lists, tuples, dictionaries, sets)
- String manipulation
- Modular Programming (introduction)
- Error handling
- Assignments: Python basics assignments
02: Data Manipulation, Analysis and Visualization
- Introduction to Data Analysis
- Setting Up Jupyter Notebook and Pandas
- Data Manipulation with Pandas
- Introduction to Pandas
- Basic statistics in data analysis (descriptive statistics, shape of distribution, inferential statistics)
- Pandas Series
- Pandas DataFrame
- Reading and writing data with Pandas
- Accessing and selecting data
- Data manipulation (filtering, sorting, grouping)
- Data Cleaning
- Merging, joining, and concatenating
- Numerical computations with NumPy
- Data visualization basics
- Assignment 1: Analyzing E-commerce Orders using Pandas
- Assignment 2: Exploratory Data Analysis (EDA) using Pandas, NumPy, and Matplotlib
04: Web Scraping and APIs
- Introduction to Web Scraping and APIs
- Basics of HTML, CSS, and JavaScript
- Getting Started with Web Scraping Using Python
- Dealing with Pagination and Infinite Scrolling
- Working with APIs
- Assignment 1: Scrape the Jobinja website, fetch last year's job listings, and save them to a CSV file.
05: Working with Data Sources, Storages, and Serialization
- Introduction to I/O (Input/Output)
- Importance of File I/O (Input/Output)
- Basic Concepts of File I/O
- Text Files
- CSV Files
- JSON (JavaScript Object Notation) Files
- XML Files
- Excel Files
- Binary Files
- Database File Formats (SQLite)
- Parquet and PyArrow
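As a taste of the serialization topics in this module, here is a minimal sketch that parses CSV text and round-trips it through JSON using only the standard library. The sample data and the helper names `csv_text_to_records` and `records_to_json` are hypothetical and are not taken from the course materials.

```python
import csv
import io
import json


def csv_text_to_records(csv_text):
    """Parse CSV text into a list of dicts, one per data row (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def records_to_json(records):
    """Serialize a list of dicts to a JSON string (hypothetical helper)."""
    return json.dumps(records, indent=2)


# Illustrative sample data, not a course dataset.
sample_csv = "order_id,amount\n1,19.99\n2,5.00\n"

records = csv_text_to_records(sample_csv)   # CSV values are read as strings
json_text = records_to_json(records)        # serialize to JSON text
restored = json.loads(json_text)            # deserialize back to Python objects
```

Note that `csv.DictReader` yields every value as a string, so numeric columns need explicit conversion before analysis.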
06: Object-Oriented Programming
- Introduction to Object-Oriented Paradigm
- Object-Oriented Analysis (OOA)
- Object-Oriented Design (OOD)
- Introduction to Object-Oriented Programming (OOP)
- Object-Oriented Design Principles
07: SQL & NoSQL Databases with Python
Working with SQL Databases
- Introduction to SQL and relational databases
- SQL basics (SELECT, FROM, WHERE, JOIN)
- Creating and managing databases, tables, and indexes
- CRUD operations (Create, Read, Update, Delete)
- Connecting to databases
- Executing SQL queries
- Fetching and manipulating data with SQL
- Using SQLAlchemy for database interaction
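The CRUD operations listed above can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not course material: the `users` table, its columns, and the sample rows are all hypothetical, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

# In-memory database; the table and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Create: insert rows using parameter placeholders to avoid SQL injection.
conn.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [("Sara", "Tehran"), ("Ali", "Shiraz")],
)

# Read: select the names of users in a given city.
rows = conn.execute(
    "SELECT name FROM users WHERE city = ? ORDER BY name", ("Tehran",)
).fetchall()

# Update: change one user's city.
conn.execute("UPDATE users SET city = ? WHERE name = ?", ("Tabriz", "Ali"))

# Delete: remove one user.
conn.execute("DELETE FROM users WHERE name = ?", ("Sara",))
conn.commit()

remaining = conn.execute("SELECT name, city FROM users").fetchall()
conn.close()
```

The same statements work against other relational databases through their own drivers or through SQLAlchemy, although placeholder syntax can differ between drivers.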
Working with NoSQL Databases
- Understanding NoSQL databases (e.g., MongoDB, Redis)
- Connecting to NoSQL databases
- Querying and manipulating data in NoSQL databases
- Handling document-based and key-value data models
Assignment: Ass7
08: Building Data Pipelines
Data Pipelines
- ETL (Extract, Transform, Load)
- Understanding data pipelines and their components
- Designing and architecting data pipelines
- Implementing data ingestion, transformation, and loading (ETL)
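The extract, transform, and load stages listed above can be sketched as three small functions chained together. This is a minimal sketch using only the standard library; the sample CSV, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are assumptions made for illustration, not the course's own pipeline.

```python
import csv
import io
import sqlite3

# Illustrative raw input; the second row has a missing amount.
RAW_CSV = "order_id,amount\n1,19.99\n2,\n3,5.01\n"


def extract(csv_text):
    """Extract: read raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert column types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), float(row["amount"])))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(csv_text, conn):
    """Run the three stages in order."""
    load(transform(extract(csv_text)), conn)


conn = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, conn)
loaded = conn.execute(
    "SELECT order_id, amount FROM orders ORDER BY order_id"
).fetchall()
```

Keeping each stage as a separate function makes it possible to test and replace stages independently, for example swapping the CSV source for an API call without touching the load step.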
Assignment: Ass8
Capstone Project
Project Development
- Apply all the concepts learned in a real-world data engineering project
- Work with various data sources including web data and APIs
- Implement ETL pipelines, data processing, and analysis using Python libraries and tools