485 results for “topic:data-cleaning-and-preprocessing”
This exploratory data analysis (EDA) project focuses on examining sugarcane production data. Through this analysis, we seek to gain valuable insights into factors influencing sugarcane production, develop predictive models for future yields, and ultimately support efforts to optimize production efficiency and sustainability.
Black Friday Sales Analysis explores customer demographics, purchasing behaviors, and product trends to uncover insights and patterns driving sales during Black Friday events.
A robust food delivery prediction model developed with regression algorithms, leveraging advanced data cleaning techniques and feature engineering.
Welcome to my data science repository! Here you will find a collection of resources and examples for exploring, analyzing, and manipulating data using Python. The repository includes code templates, case studies, and exercises to help you learn and practice data science concepts and techniques. The topics covered include data exploration, data visu
An AI model for predicting laptop prices, trained on about 1,200 records.
A repository where I keep all of my data cleaning samples/portfolio items.
📊 An analysis of voting patterns in São Paulo's 2024 elections, focusing on voter behavior, absenteeism, and geographic trends.
A project analyzing the Indian startup ecosystem between 2018 and 2021.
Developed a 3-page Power BI dashboard (global and Asian overview) using Python scripts to load and clean World Bank data (1960–2020), reducing data processing time by 25%. Containerized the database in Docker for scalable access and visualized trends (e.g., 3% annual GDP growth in Asia) to enhance stakeholder insights.
The Ethiopian Medical Business Data Warehouse & Analytics Platform is a comprehensive data solution tailored to enhance the efficiency and efficacy of Ethiopia's healthcare and medical sectors.
Global Superstore BI Dashboard
CleanEasy is a powerful, user-friendly Python library designed to simplify data cleaning and preprocessing for data scientists and analysts
Power Outage Data Analysis in USA
EDA and Prediction of F1 Race Winners
Data Analyst | Python | SQL | Power BI | R | Excel | PySpark | EDA | ETL | Data Visualization | Statistical Analysis | Data Wrangling | Data Modeling | MongoDB | Machine Learning | Deployment | GitHub | AWS
This project involves analyzing real-world medical appointment data through Time Series Analysis. The tasks include dataset cleaning, comprehensive analysis, and extracting insights using Python and MySQL.
Designed and implemented machine learning and deep learning models to diagnose gearbox faults. Preprocessed sensor data, engineered features, and trained models using techniques such as SVM, random forests, LSTM, and naive Bayes. Evaluated model performance and optimized hyperparameters to achieve high diagnostic accuracy.
This project analyses transportation data from the Bureau of Transportation Statistics (BTS) to uncover insights into cross-border freight's efficiency, safety and environmental impacts across road, rail, air and water modes.
Scripts for cleaning, converting, and managing image datasets for ML training. (Zsh/Python)
This project aims to forecast potato prices in India using LSTM, KNN, and Random Forest regression, integrating historical data on prices, regional statistics, and rainfall patterns. It targets agricultural stakeholders to support informed decision-making.
"Predicting a Greener Future 🌾📊 Delve into the world of agriculture and data science with our Yield Prediction project. We harness machine learning and weather data to forecast crop yields accurately. Join us in cultivating smarter farming practices for a sustainable tomorrow."
A Python library and CLI for converting GrabCraft builds to schematic files (specifically Litematica schematics).
Smart Student Performance Prediction App using ML and Django. A web platform that predicts student outcomes using academic and behavioral data. It features data cleaning, EDA, feature engineering, and a Random Forest model. Includes dashboards for students, teachers, and admins with personalized stats, alerts, and PDF reports.
Tableau dashboard developed from T-SQL EDA scripts to help clients understand the context of UK road accidents in 2021 and 2022, with the goal of minimizing loss of life.
A data analysis project that explores Zomato’s restaurant dataset using Python to uncover trends in ratings, pricing, and service features such as online ordering and table booking.
This repository contains a beginner-friendly introduction to Machine Learning, covering essential concepts such as data preprocessing, feature engineering, data visualization, and ML fundamentals.
Comprehensive object detection using YOLOv5, trained from scratch. Includes data preparation, YOLOv5 training on 20 labels, and testing on images/videos. Utilizes Google Colab's V100 GPU for robust detection.
A comprehensive data analysis project using SQL for data cleaning and preprocessing and Tableau for visualization, focusing on key HR KPIs. Features interactive dashboards and detailed insights.
Tokenizer for Indonesian language data cleaning.
This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.