rutvikbarbhai/Data-Cleaning-Using-Ms-Excel
Hands-On Data Cleaning, Standardization & Statistical Analysis in Microsoft Excel Using a Real-World Internship Eligibility Dataset
Data Cleaning & Analysis Using MS Excel Functions
Hands-On No-Code Data Cleaning, Standardization & Statistical Analysis Using Microsoft Excel on a Real-World Internship Eligibility Dataset
Project Overview
This repository demonstrates end-to-end data cleaning and analysis using MS Excel on a structured student internship eligibility dataset.
The focus areas include:
- Cleaning inconsistent department names
- Standardizing categorical values
- Handling percentages and scores
- Computing weighted scores
- Determining eligibility status
- Generating statistical insights
This project reflects industry-style Excel usage commonly required in operations, analytics, and product roles.
Dataset File
Excel Workbook (Cleaned & Processed)
Click on the button below to download the cleaned dataset
๐ Student Internship Eligibility Dataset (Sample Preview)
| Student ID | Name | Department | Std. Dept | T1 | T2 | T3 | Attendance | Application Date | Internship Pref | Assignment % | Weighted Score | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S001 | Aadhya Jain | CSE | Computer Science | 75 | 59 | 65 | 77% | 2025-08-24 | Company C | 90% | 64.40 | Eligible |
| S002 | Myra Chatterjee | Comp Sci | Computer Science | 66 | 75 | 70 | 86% | 2025-06-08 | Company B | 82% | 71.15 | Eligible |
| S003 | Diya Gupta | CS | Computer Science | 98 | 67 | 82 | 91% | 2025-08-24 | Company D | 79% | 79.15 | Eligible |
| S004 | Sai Agarwal | Computer Science | Computer Science | 96 | 69 | 76 | 99% | 2025-07-27 | Company B | 84% | 76.55 | Eligible |
| S005 | Anika Mukherjee | ECE | Electronics | 78 | 91 | 96 | 81% | 2025-09-15 | Company A | 73% | 91.55 | Eligible |
Data Cleaning & Standardization
Application_Raw Spreadsheet
๐น Department Standardization
To handle inconsistent department naming (CSE, CS, Comp Sci, etc.), a lookup-driven standardization was applied.
Formula used:
=VLOOKUP(C2, Dept_Lookup!$A$2:$B$17, 2, FALSE)
- Ensures consistent department naming
- Prevents analytical duplication
- Enables accurate aggregation & reporting
Eligibility Status Logic
Eligibility was calculated using attendance + assignment thresholds, dynamically driven by company-specific criteria.
Formula used:
=IF(
AND(
H3 >= (VLOOKUP(J3, Interview_Slots!$A$2:$D$5, 4, FALSE) / 100),
L3 >= VLOOKUP(J3, Interview_Slots!$A$2:$D$5, 3, FALSE)
),
"Eligible",
"Not Eligible"
)
- Company-wise rules
- No hardcoded values
- Scalable to additional companies
Allocation Spreadsheet
๐น Rank Calculation (Within Department)
Students were ranked department-wise based on weighted score.
Formula used:
=IFERROR(
RANK(E3, FILTER($E$2:$E$1000, $D$2:$D$1000 = D3), 0),
98
)
- Fair intra-department ranking
- Dynamic filtering
- Error-handled for edge cases
๐นAllocation Status
Final allocation is determined based on rank vs available interview slots.
Formula used:
=IF(
G5 <= VLOOKUP(D5, Interview_Slots!$A$2:$B$5, 2, FALSE),
"Allocated",
"Not Allocated"
)
- Slot-aware allocation
- Realistic hiring workflow simulation
Data Visualization & Insight Dashboard
- This section highlights the visual analytics layer of the project, designed to transform cleaned operational data into actionable insights for decision-making teams.
๐ Chart 1: Best Performing Departments by Average Score
application_raw โ Pivot Table โ Bar Chart
- Compares average scores across standardized departments
- Highlights top-performing academic departments
- Includes a Grand Total benchmark for overall comparison
Chart 2: Company Preference Distribution
application_raw โ Pivot Table โ Pie Chart
- Visualizes student internship preferences across companies
- Identifies the most preferred recruiters
- Helps operations teams plan interview slots & capacity
๐ Key Insights from the Dashboard
- Computer Science consistently shows strong academic performance
- Company B receives the highest preference share among applicants
- Allocation logic ensures fair, rank-based selection
- Dataset and dashboard are fully scalable for larger cohorts
Skills Demonstrated
- Advanced MS Excel formulas
- Lookup-driven standardization
- Conditional logic & ranking systems
- Operations-style allocation frameworks
- Dashboard creation & data storytelling
- Data cleaning & analytics best practices