GitHunt
ZO

Zohaib-Azeem/Data-Cleaning-Project

This project cleans and prepares a layoffs dataset using SQL by removing duplicates, standardizing text, handling null/blank values, and dropping unnecessary columns. The result is a consistent, high-quality dataset ready for reporting, analysis, and visualization.

Data-Cleaning-Project

This project cleans and prepares a layoffs dataset using SQL by removing duplicates, standardizing text, handling null/blank values, and dropping unnecessary columns. The result is a consistent, high-quality dataset ready for reporting, analysis, and visualization.
README Data Cleaning.md

๐Ÿ—ƒ๏ธ Data Cleaning Project (SQL)

๐Ÿ“Œ Project Overview

This project demonstrates data cleaning techniques using SQL on a layoffs dataset.
The goal is to transform raw, messy data into a clean and usable form for reporting and analysis.

Key objectives of this project:

  • Remove duplicate records.
  • Standardize text data (e.g., fixing inconsistent company/industry names).
  • Handle null and blank values.
  • Convert data types (especially dates).
  • Drop unnecessary columns.

By the end of this process, we have a cleaned table (layoffs_staging2) ready for further analysis and reporting.


๐Ÿ—‚๏ธ Files in this Repository

  • Data Cleaning project.sql โ†’ The SQL script containing all steps of the data cleaning process.

โš™๏ธ Tools & Technologies

  • Microsoft SQL Server (T-SQL)
  • Database: layoffs

๐Ÿ”‘ Steps in Data Cleaning

1. Remove Duplicates

  • Used CTE (Common Table Expression) with ROW_NUMBER() to identify duplicate rows based on company, industry, location, date, etc.
  • Deleted rows where row_num > 1.

2. Standardize Data

  • Trimmed extra spaces in company names.
  • Corrected inconsistent industry names (e.g., "crypto currency" โ†’ "crypto").
  • Fixed country names with trailing spaces or punctuation (e.g., "United States." โ†’ "United States").

3. Handle Null & Blank Values

  • Removed rows with invalid or blank values (e.g., when industry contained dates instead of industry names).
  • Ensured fields like date are in valid formats.

4. Convert Dates

  • Used TRY_CONVERT(DATE, [date], 101) to change date strings into proper DATE format.

5. Drop Unnecessary Columns

  • Removed helper columns like row_num after data cleaning was complete.

๐ŸŽฏ Key Learnings

  • How to use CTEs and window functions for duplicate removal.
  • Practical text standardization techniques with TRIM() and UPDATE.
  • Best practices for handling missing or invalid data.
  • Converting and validating date fields in SQL.
  • Structuring a data cleaning workflow in SQL.