GitHunt
TR

trucoshan/shanthedataartist

Every empire stands on grains of dust. This is a personal collection of my projects. Do let me know if you find it useful.

shanthedataartist

Every empire stands on grains of dust.

This is A personal Collection of my projects.


Restaurant Analytics (Level 1 and 2) [Cognifyz Internship]

Task – Restaurant Data Analysis & Business Intelligence

For this task, I analyzed a global restaurant dataset to uncover cuisine trends, customer preferences, and chain performance. Using Python (Pandas for wrangling, Matplotlib/Seaborn for visualization, and Folium/Plotly for mapping), I turned raw data into actionable business insights.

πŸ”Ή Steps Taken

β€’ Data Wrangling: Cleaned and structured dataset with data dictionary and handling of chains.
β€’ Feature Engineering: Created Weighted Rating Numerators and applied Bayesian Weighted Rating logic to filter out small-sample bias.
β€’ Cuisine Analysis: Identified most common and popular cuisine combinations by frequency and demand.
β€’ Chain Analysis: Evaluated 734 restaurant chains across rating tiers (top-rated, average, low-rated).
β€’ Geographic Mapping: Visualized restaurant density and hotspots across countries using interactive maps.
β€’ BI Framing: Converted statistical insights into business intelligence recommendations (expansion, pricing, delivery strategy, and brand benchmarking).

πŸ“Š Key Insights Identified

βœ” Most ratings fall between 3.5–4.5, with ~202 votes on average β€” social proof drives trust.
βœ” North Indian + Chinese is the most frequent combo, while Fast Food + Healthy/American combos dominate popularity.
βœ” Premium cuisines (Sushi, Burgers, Continental-Italian) consistently achieve weighted ratings β‰₯4.
βœ” Big QSR chains (McDonald’s, Subway, KFC) cluster in the β€œaverage-rated” range despite high reach.
βœ” Barbeque Nation and Absolute Barbecues emerge as high-rated, high-volume leaders.
βœ” Domino’s shows strong reach but low satisfaction (avg. rating 3.4) β†’ brand risk.

πŸ”‘ Takeaways

β€’ Expansion β†’ Focus on Delhi NCR & Indian metros; test suburban growth in US East Coast.
β€’ Cuisine Mix β†’ Strengthen Indo-Chinese, Mughlai, Fast Food; scale premium Sushi/Continental niches.
β€’ Pricing β†’ Stay affordable (Price Range 1 & 2 dominate).
β€’ Delivery β†’ Huge opportunity but fix quality gaps (packaging, timeliness).
β€’ Chains β†’ QSR giants need menu/service innovation; premium chains should scale aggressively.

✨ Learning Outcome

β€’ Enhanced my ability to apply Bayesian weighted logic in real-world datasets.
β€’ Learned to combine statistical rigor with business intelligence framing.
β€’ Gained insights into how data analytics can guide menu design, expansion strategy, and customer experience.


πŸ“¦ FedEx Exploratory Data Analysis (EDA) Project

This project analyzes 10,324 FedEx shipment records to uncover patterns in delivery delays, shipment costs, fulfillment methods, and vendor contracts.
The aim is to demonstrate how structured data cleaning, transformation, and visualization can translate raw logistics data into meaningful business insights.


πŸš€ Project Objectives

  • Identify the factors driving on-time vs delayed deliveries.
  • Explore the relationship between freight costs, shipment weights, and modes of transport.
  • Compare fulfillment methods (Direct Drop vs RDC) and INCO terms.
  • Provide actionable business takeaways for cost optimization and reliability improvement.

πŸ› οΈ Tech Stack & Tools

  • Python Libraries: Pandas, NumPy, Matplotlib, Seaborn, Plotly, Folium
  • Data Cleaning: Regex, datetime parsing, missing-value handling, outlier treatment
  • Visualization: Static plots (Matplotlib/Seaborn), interactive charts (Plotly), geospatial maps (Folium/Choropleth)
  • Data Size: 10,324 rows Γ— 33 columns

πŸ“‚ Project Workflow

  1. Data Import & Setup

    • Loaded CSV dataset into Pandas DataFrame.
    • Inspected shape, column types, null values, and duplicates.
  2. Data Wrangling

    • Standardized column names.
    • Converted date fields to datetime.
    • Resolved messy text fields (Weight, Freight Cost) into numeric features.
    • Engineered new columns:
      • Delivery_Delay (days early/late vs schedule)
      • Delivery_Status (Early, On-Time, Late)
  3. Exploratory Data Analysis

    • Univariate: Histograms, boxplots, violin plots to study delay distributions and outliers.
    • Bivariate/Multivariate: Scatterplots (weight vs cost, cost vs delay), barplots (fulfillment vs delay), pie charts (management concentration).
    • Geospatial: Choropleth map showing % of late deliveries by country.
  4. Insights & Business Takeaways

    • Direct Drop = most reliable; RDC adds variability.
    • Air = lowest delay % (9%), Trucks = highest (16%).
    • Ocean = cheapest for heavy loads, but delays common.
    • Freight cost does not guarantee on-time delivery.
    • Outliers (very high weight/cost) distort analysis and require special handling.

πŸ“Š Key Findings

  • Air transport dominates (61% shipments) and is most reliable.
  • Truck shipments suffer the highest variability in delays.
  • Ocean shipments are cost-efficient but risk-prone.
  • EXW Incoterm emerges as the most reliable contract type.
  • RDC operations show high unpredictability, needing optimization.

πŸ”Ž Sample Visuals

Late Deliveries Map


πŸ“Œ Business Recommendations

  • Use Air for time-critical shipments despite higher costs.
  • Optimize RDC workflows to reduce both early and late deliveries.
  • Reserve Air Charter for emergencies (too costly otherwise).
  • Employ Ocean for bulk, non-urgent cargo.
  • Track outlier shipments (extreme cost/weight) as risk factors.

πŸ“– Learning Outcomes

This project reflects my Month 3 journey into Data Science, showcasing skills in:

  • Data Wrangling with Pandas & Regex
  • Feature Engineering for new insights
  • EDA & Visualization (Matplotlib, Seaborn, Plotly, Folium)
  • Business Storytelling – turning raw data into actionable strategy

πŸ“ Acknowledgements

This dataset and project form part of my AlmaBetter Data Science program.
Grateful to AlmaBetter, Amity Online, and Woolf University for structured learning and support.


trucoshan/shanthedataartist | GitHunt