GitHunt
RU

rutvikbarbhai/Data-Cleaning-Using-Ms-Excel

Hands-On Data Cleaning, Standardization & Statistical Analysis in Microsoft Excel Using a Real-World Internship Eligibility Dataset

Data Cleaning & Analysis Using MS Excel Functions

Hands-On No-Code Data Cleaning, Standardization & Statistical Analysis Using Microsoft Excel on a Real-World Internship Eligibility Dataset

Project Overview

This repository demonstrates end-to-end data cleaning and analysis using MS Excel on a structured student internship eligibility dataset.
The focus areas include:

  • Cleaning inconsistent department names
  • Standardizing categorical values
  • Handling percentages and scores
  • Computing weighted scores
  • Determining eligibility status
  • Generating statistical insights

This project reflects industry-style Excel usage commonly required in operations, analytics, and product roles.

Dataset File

Excel Workbook (Cleaned & Processed)

Click on the button below to download the cleaned dataset

Dataset Download

๐Ÿ“Š Student Internship Eligibility Dataset (Sample Preview)

Student ID Name Department Std. Dept T1 T2 T3 Attendance Application Date Internship Pref Assignment % Weighted Score Status
S001 Aadhya Jain CSE Computer Science 75 59 65 77% 2025-08-24 Company C 90% 64.40 Eligible
S002 Myra Chatterjee Comp Sci Computer Science 66 75 70 86% 2025-06-08 Company B 82% 71.15 Eligible
S003 Diya Gupta CS Computer Science 98 67 82 91% 2025-08-24 Company D 79% 79.15 Eligible
S004 Sai Agarwal Computer Science Computer Science 96 69 76 99% 2025-07-27 Company B 84% 76.55 Eligible
S005 Anika Mukherjee ECE Electronics 78 91 96 81% 2025-09-15 Company A 73% 91.55 Eligible

Data Cleaning & Standardization

Application_Raw Spreadsheet

๐Ÿ”น Department Standardization

To handle inconsistent department naming (CSE, CS, Comp Sci, etc.), a lookup-driven standardization was applied.

Formula used:

=VLOOKUP(C2, Dept_Lookup!$A$2:$B$17, 2, FALSE)
  • Ensures consistent department naming
  • Prevents analytical duplication
  • Enables accurate aggregation & reporting

Eligibility Status Logic

Eligibility was calculated using attendance + assignment thresholds, dynamically driven by company-specific criteria.
Formula used:

=IF(
  AND(
    H3 >= (VLOOKUP(J3, Interview_Slots!$A$2:$D$5, 4, FALSE) / 100),
    L3 >= VLOOKUP(J3, Interview_Slots!$A$2:$D$5, 3, FALSE)
  ),
  "Eligible",
  "Not Eligible"
)
  • Company-wise rules
  • No hardcoded values
  • Scalable to additional companies

Allocation Spreadsheet

๐Ÿ”น Rank Calculation (Within Department)

Students were ranked department-wise based on weighted score.
Formula used:

=IFERROR(
  RANK(E3, FILTER($E$2:$E$1000, $D$2:$D$1000 = D3), 0),
  98
)
  • Fair intra-department ranking
  • Dynamic filtering
  • Error-handled for edge cases

๐Ÿ”นAllocation Status

Final allocation is determined based on rank vs available interview slots.
Formula used:

=IF(
  G5 <= VLOOKUP(D5, Interview_Slots!$A$2:$B$5, 2, FALSE),
  "Allocated",
  "Not Allocated"
)
  • Slot-aware allocation
  • Realistic hiring workflow simulation

Data Visualization & Insight Dashboard

  • This section highlights the visual analytics layer of the project, designed to transform cleaned operational data into actionable insights for decision-making teams.

๐Ÿ“Š Chart 1: Best Performing Departments by Average Score

Data Flow:

application_raw โ†’ Pivot Table โ†’ Bar Chart
image
  • Compares average scores across standardized departments
  • Highlights top-performing academic departments
  • Includes a Grand Total benchmark for overall comparison

Chart 2: Company Preference Distribution

Data Flow:

application_raw โ†’ Pivot Table โ†’ Pie Chart
  • Visualizes student internship preferences across companies
  • Identifies the most preferred recruiters
  • Helps operations teams plan interview slots & capacity
image

๐Ÿ“Š Key Insights from the Dashboard

  • Computer Science consistently shows strong academic performance
  • Company B receives the highest preference share among applicants
  • Allocation logic ensures fair, rank-based selection
  • Dataset and dashboard are fully scalable for larger cohorts

Skills Demonstrated

  • Advanced MS Excel formulas
  • Lookup-driven standardization
  • Conditional logic & ranking systems
  • Operations-style allocation frameworks
  • Dashboard creation & data storytelling
  • Data cleaning & analytics best practices