analyst-amitbisht/ydata-profiling
This repository showcases my learning process of automating EDA using 'ydata-profiling'
Automating EDA with ydata-profiling
This repository demonstrates how to automate Exploratory Data Analysis (EDA) using the ydata-profiling library (formerly known as pandas-profiling). It simplifies the process of generating a comprehensive EDA report, saving time and ensuring a thorough analysis.
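As a minimal sketch of what this automation looks like in practice, the snippet below builds a small made-up DataFrame and writes an HTML report for it. The sample data, the function names `build_sample_frame` and `write_profile`, and the output path are assumptions for illustration, not files or code from this repository. It assumes `pandas` and `ydata-profiling` are installed.

```python
from pathlib import Path

import pandas as pd


def build_sample_frame() -> pd.DataFrame:
    """Return a tiny, hypothetical dataset with mixed column types."""
    return pd.DataFrame(
        {
            "age": [23, 35, 41, 29, 35, 52],
            # One missing value so the report has something to flag.
            "income": [32000.0, 58000.0, None, 44000.0, 58000.0, 91000.0],
            "city": ["Delhi", "Mumbai", "Pune", "Delhi", "Mumbai", "Pune"],
            "signup_date": pd.to_datetime(
                [
                    "2023-01-05",
                    "2023-02-11",
                    "2023-02-28",
                    "2023-03-14",
                    "2023-04-02",
                    "2023-04-19",
                ]
            ),
        }
    )


def write_profile(
    df: pd.DataFrame,
    out_path: str = "output/sample_report.html",  # hypothetical location
    title: str = "Sample EDA Report",
) -> Path:
    """Generate a profiling report for df and save it as HTML."""
    # Imported here so the helper above works without the library installed.
    # Install with: pip install ydata-profiling
    from ydata_profiling import ProfileReport

    target = Path(out_path)
    target.parent.mkdir(parents=True, exist_ok=True)

    report = ProfileReport(df, title=title)
    report.to_file(target)  # output format is inferred from the file suffix
    return target
```

Calling `write_profile(build_sample_frame())` should produce a single HTML file that can be opened in a browser. For large datasets, passing `minimal=True` to `ProfileReport` turns off the more expensive computations.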
Features of ydata-profiling
The tool provides the following capabilities:
- Type Inference: Automatically detects data types (Categorical, Numerical, Date, etc.).
- Warnings: Identifies data challenges like missing values, inaccuracies, skewness, and more.
- Univariate Analysis: Generates descriptive statistics (mean, median, mode, etc.) and visualizations like histograms.
- Multivariate Analysis: Includes correlation analysis, missing data summaries, duplicate rows detection, and pairwise variable interactions.
- Time-Series Analysis: Provides insights such as auto-correlation, seasonality, and ACF/PACF plots.
- Text Analysis: Identifies the most common categories, Unicode scripts, and blocks (e.g., Latin, ASCII).
- File & Image Analysis: Reviews file sizes, creation dates, dimensions, and EXIF metadata.
- Dataset Comparison: Quickly compares datasets in one line of code.
- Flexible Output Formats: Reports can be exported as:
  - HTML: Easily shareable interactive reports
  - JSON: Suitable for automation systems
  - Jupyter Notebook Widgets
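The output formats and the dataset comparison listed above can be sketched as follows. The helper names (`report_paths`, `export_report`, `compare_frames`), the file names, and the report titles are hypothetical. The calls to `ProfileReport`, `to_file`, `compare`, and `to_widgets` come from the ydata-profiling API, but treat this as a starting point and check it against the version you have installed.

```python
from pathlib import Path

import pandas as pd


def report_paths(out_dir: str, stem: str) -> dict:
    """Build the HTML and JSON target paths for one report."""
    base = Path(out_dir)
    return {"html": base / f"{stem}.html", "json": base / f"{stem}.json"}


def export_report(
    df: pd.DataFrame,
    out_dir: str = "output",  # hypothetical folder name
    stem: str = "report",
    title: str = "EDA Report",
) -> dict:
    """Write the same report as both HTML and JSON."""
    from ydata_profiling import ProfileReport  # pip install ydata-profiling

    paths = report_paths(out_dir, stem)
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    report = ProfileReport(df, title=title)
    for path in paths.values():
        report.to_file(path)  # .html or .json suffix selects the format
    # Inside a Jupyter notebook, report.to_widgets() renders the widget view.
    return paths


def compare_frames(
    df_a: pd.DataFrame,
    df_b: pd.DataFrame,
    out_path: str = "output/comparison.html",  # hypothetical location
) -> Path:
    """Profile two datasets and save a side-by-side comparison report."""
    from ydata_profiling import ProfileReport

    report_a = ProfileReport(df_a, title="Dataset A")
    report_b = ProfileReport(df_b, title="Dataset B")
    comparison = report_a.compare(report_b)  # the one-line comparison

    target = Path(out_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    comparison.to_file(target)
    return target
```

Comparison works best when both DataFrames share the same columns, for example a training and a test split of the same dataset.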
Project Structure
- data/: Contains sample datasets used for demonstration.
- notebooks/: Jupyter Notebooks showcasing how to use ydata-profiling.
- output/: Stores generated EDA reports.
Getting Started
For prerequisites and instructions on running the code, refer to the official repository: https://github.com/ydataai/ydata-profiling
Sample Output
The output/ folder contains example reports generated with ydata-profiling.
Reports include:
- Data summary (missing values, duplicates, etc.)
- Visualizations (correlations, distributions, etc.)
- Detailed variable analysis
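To sanity-check the data summary in a report, the same headline figures can be computed directly with pandas. This is a rough sketch using a hypothetical helper named `quick_summary`. It is not how ydata-profiling computes its statistics internally.

```python
import pandas as pd


def quick_summary(df: pd.DataFrame) -> dict:
    """Compute a few headline figures comparable to a report's overview."""
    missing = df.isna().sum()
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_cells": int(missing.sum()),
        "missing_by_column": {col: int(n) for col, n in missing.items()},
        # Counts rows that repeat an earlier row exactly.
        "duplicate_rows": int(df.duplicated().sum()),
    }
```

If these numbers disagree with the report's overview section, the usual cause is a difference in how the input was loaded (for example, placeholder strings such as "NA" that were not parsed as missing values).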
Credits: Big thanks to CodeWithHarry (https://www.youtube.com/@CodeWithHarry) for his excellent tutorial on pandas-profiling (https://www.youtube.com/watch?v=sGQfiyXOvF0&t=1136s), which inspired this project.
Contributing: Contributions are welcome! If you have suggestions, feel free to open an issue or submit a pull request.
License: This project is licensed under the MIT License.
Feedback: If you find this project helpful or have any questions, feel free to reach out!