kunwargupta/US-Customer-Insights-Analysis
Statistical analysis of US customer spending behavior and demographic segmentation. Tests real business assumptions using hypothesis testing, EDA, and data-driven insights.
US Customer Insights Analysis
Project Overview
This project focuses on analyzing a customer dataset to understand spending behavior, engagement patterns, and the influence of demographic factors on business outcomes. The objective was to answer real business questions using statistical methods rather than relying on assumptions.
The dataset contains information such as gender, age, education level, marital status, state, monthly spending, number of pets, and days since last interaction.
Motivation
Businesses often segment customers based on demographic characteristics, assuming that factors like age, gender, or location strongly influence purchasing behavior. This project was conducted to test whether those assumptions actually hold true when analyzed statistically.
Workflow & What Was Done
1. Data Exploration
- Examined the structure of the dataset
- Identified data types and key variables
- Checked for missing values and inconsistencies
- Ensured the data was suitable for statistical analysis
At this stage, the main challenge was understanding how each variable could be used to answer meaningful business questions.
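The exploration checks above can be sketched with pandas. This is a minimal illustration, not the project's own notebook code. The column names (`gender`, `monthly_spend`, `num_pets`) and the toy values are hypothetical stand-ins for the real dataset.

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing values and distinct values for every column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean().mul(100).round(1),
        "n_unique": df.nunique(dropna=True),
    })

# Hypothetical column names and made-up values, for illustration only
toy = pd.DataFrame({
    "gender": ["Male", "Female", "Female", None],
    "monthly_spend": [120.5, 98.0, None, 87.25],
    "num_pets": [0, 2, 1, 1],
})

report = data_quality_report(toy)
print(report)
print("duplicate rows:", toy.duplicated().sum())
```

On the real data, the `n_unique` column is also a quick way to spot inconsistent category labels (for example the same state spelled two ways).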
2. Exploratory Data Analysis (EDA)
- Visualized distributions of numerical variables
- Explored relationships between demographic factors and spending
- Used plots to detect patterns, trends, and possible anomalies
- Created scatter plots to investigate relationships between variables such as age and inactivity
EDA helped build intuition about the data before applying formal statistical tests.
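A distribution plot and an age-versus-inactivity scatter plot like the ones described above might look as follows. This sketch uses plain Matplotlib (Seaborn's `histplot` and `scatterplot` would work equally well), and the column names and randomly generated demo data are hypothetical, not the project's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def eda_figure(df, spend_col, age_col, inactivity_col):
    """Histogram of spending next to a scatter plot of age vs. inactivity."""
    fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
    ax_hist.hist(df[spend_col].dropna(), bins=30, edgecolor="black")
    ax_hist.set(title="Distribution of monthly spending", xlabel=spend_col, ylabel="Count")
    ax_scatter.scatter(df[age_col], df[inactivity_col], alpha=0.4)
    ax_scatter.set(title="Age vs. days since last interaction",
                   xlabel=age_col, ylabel=inactivity_col)
    fig.tight_layout()
    return fig

# Synthetic demo data with hypothetical column names
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "monthly_spend": rng.gamma(2.0, 50.0, 500),
    "age": rng.integers(18, 80, 500),
    "days_since_last_interaction": rng.integers(0, 365, 500),
})
fig = eda_figure(demo, "monthly_spend", "age", "days_since_last_interaction")
fig.savefig("eda_overview.png")
```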
3. Formulating Business Questions
Based on the dataset, several practical questions were defined:
- Do males and females spend differently?
- Does education level affect spending?
- Is marital status related to pet ownership?
- Are older customers less active?
- Do customers from different states spend differently?
Each question was translated into statistical hypotheses.
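For reference, the five questions can be written as formal null ($H_0$) and alternative ($H_1$) hypotheses. In this sketch, $\mu$ denotes mean monthly spending, $k$ the number of education levels, $s$ the number of states, and $\rho$ the population Pearson correlation. All alternatives are shown as two-sided because the README does not say whether one-sided tests were used.

```latex
\begin{aligned}
&\text{Gender:} && H_0:\ \mu_{\text{male}} = \mu_{\text{female}} && H_1:\ \mu_{\text{male}} \neq \mu_{\text{female}} \\
&\text{Education:} && H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k && H_1:\ \text{at least one } \mu_j \text{ differs} \\
&\text{Marital status \& pets:} && H_0:\ \text{the two variables are independent} && H_1:\ \text{they are associated} \\
&\text{Age \& inactivity:} && H_0:\ \rho = 0 && H_1:\ \rho \neq 0 \\
&\text{State:} && H_0:\ \mu_1 = \mu_2 = \cdots = \mu_s && H_1:\ \text{at least one } \mu_j \text{ differs}
\end{aligned}
```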
4. Hypothesis Testing
Appropriate tests were selected based on variable types:
- Independent t-test → Comparing two groups (gender)
- One-way ANOVA → Comparing multiple groups (education, state)
- Chi-square test → Relationship between categorical variables (marital status & pets)
- Pearson correlation → Relationship between two numerical variables (age & inactivity)
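The four tests listed above map directly onto SciPy functions. The sketch below runs each one and collects the p-values. It is an illustration under assumptions, not the project's code: the column names are hypothetical, and the demo frame is random synthetic data, so its p-values say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

def run_hypothesis_tests(df: pd.DataFrame) -> dict:
    """Return the p-value of each test; column names are hypothetical."""
    p = {}
    # Independent t-test: monthly spending, male vs. female
    male = df.loc[df["gender"] == "Male", "monthly_spend"].dropna()
    female = df.loc[df["gender"] == "Female", "monthly_spend"].dropna()
    p["gender_ttest"] = stats.ttest_ind(male, female).pvalue

    # One-way ANOVA: monthly spending across education levels, then across states
    for factor in ("education", "state"):
        groups = [g.dropna().to_numpy() for _, g in df.groupby(factor)["monthly_spend"]]
        p[f"{factor}_anova"] = stats.f_oneway(*groups).pvalue

    # Chi-square test of independence on the marital status x pet count table
    table = pd.crosstab(df["marital_status"], df["num_pets"])
    chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
    p["marital_pets_chi2"] = p_chi2

    # Pearson correlation: age vs. days since last interaction
    sub = df[["age", "days_since_last_interaction"]].dropna()
    r, p_r = stats.pearsonr(sub["age"], sub["days_since_last_interaction"])
    p["age_inactivity_pearson"] = p_r
    return p

# Synthetic demo data (no real effects built in)
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "gender": rng.choice(["Male", "Female"], size=n),
    "education": rng.choice(["High School", "Bachelor", "Master"], size=n),
    "state": rng.choice(["CA", "TX", "NY", "FL"], size=n),
    "marital_status": rng.choice(["Single", "Married"], size=n),
    "num_pets": rng.integers(0, 3, size=n),
    "age": rng.integers(18, 80, size=n),
    "days_since_last_interaction": rng.integers(0, 365, size=n),
    "monthly_spend": rng.gamma(2.0, 50.0, size=n),
})

results = run_hypothesis_tests(demo)
for name, pval in results.items():
    print(f"{name}: p = {pval:.3f}")
```

The chi-square test is only reliable when the expected cell counts are not too small (a common rule of thumb is at least 5), which the returned `expected` array lets you check.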
Before running parametric tests, assumptions were checked:
- Normality using Shapiro-Wilk test
- Homogeneity of variance using Levene’s test
- Independence based on dataset structure
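One way to wire the Shapiro-Wilk and Levene checks into a two-group comparison is sketched below. Two parts of it are assumptions rather than steps confirmed by the README: falling back to Welch's t-test when Levene's test rejects equal variances, and falling back to the rank-based Mann-Whitney U test for small, non-normal samples. The `large_n=30` cutoff is a common rule of thumb, not a strict threshold.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05, large_n=30):
    """Pick a two-sample test after checking normality and equal variances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shapiro-Wilk on each group: H0 = the data come from a normal distribution
    looks_normal = all(stats.shapiro(x).pvalue > alpha for x in (a, b))
    # Levene: H0 = the groups have equal variances
    equal_var = stats.levene(a, b).pvalue > alpha
    if looks_normal or min(len(a), len(b)) >= large_n:
        # equal_var=False runs Welch's t-test, which drops the equal-variance assumption
        res = stats.ttest_ind(a, b, equal_var=equal_var)
        name = "Student t" if equal_var else "Welch t"
    else:
        # Rank-based fallback (assumed here) for small, clearly non-normal samples
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        name = "Mann-Whitney U"
    return name, res.pvalue

rng = np.random.default_rng(1)
print(compare_two_groups(rng.normal(0, 1, 200), rng.normal(0, 5, 200)))
print(compare_two_groups([1] * 9 + [50], [2] * 9 + [60]))
```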
Challenges Faced & How They Were Solved
Choosing the Correct Statistical Tests
Initially, it was not obvious which test should be used for each question. This required understanding the difference between comparing means, testing relationships, and analyzing categorical associations.
This challenge was addressed by studying the type of variables involved (categorical vs numerical) and selecting tests accordingly.
Understanding and Checking Assumptions
Another difficulty was determining when assumption tests were necessary and how to interpret them.
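- Normality tests sometimes indicated non-normality even when distributions appeared reasonable. This is expected with large samples: the Shapiro-Wilk test becomes sensitive enough to flag small departures from normality that have little practical impact.
- Large samples can still justify parametric tests because, by the Central Limit Theorem, the sampling distribution of the mean is approximately normal even when the raw data are not.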
This step showed how real statistical analysis balances theory with practical considerations.
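The Central Limit Theorem point above can be checked with a short simulation. The exponential distribution here is an assumed stand-in for right-skewed spending data, not the project's actual data: Shapiro-Wilk clearly rejects normality for the raw values, while the means of repeated samples of 50 are far less skewed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Strongly right-skewed raw values (exponential, theoretical skewness = 2)
raw = rng.exponential(scale=100.0, size=5000)

# Means of 2000 independent samples, each of size 50, from the same distribution
sample_means = rng.exponential(scale=100.0, size=(2000, 50)).mean(axis=1)

raw_p = stats.shapiro(raw).pvalue
raw_skew = stats.skew(raw)
means_skew = stats.skew(sample_means)

print(f"Shapiro-Wilk p-value, raw data: {raw_p:.2e}")
print(f"skewness of raw data:     {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

Because t-tests and ANOVA make inferences about means, it is the near-normality of the sampling distribution of the mean, not of the raw values, that matters for them.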
Interpreting Non-Significant Results
Most tests produced non-significant results, which initially seemed like a lack of useful findings.
However, this highlighted an important insight:
Not finding a difference is still a valuable result, even though a non-significant test does not prove that no difference exists.
It suggests that demographic variables alone may not explain customer behavior.
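A confidence interval makes a non-significant result more informative than a p-value alone: it shows the range of mean differences that remain plausible, and an interval that is narrow and centred near zero supports "no practically meaningful difference". The sketch below computes a Welch (unequal-variance) interval with Welch-Satterthwaite degrees of freedom. It is an illustrative addition, not a step the README reports, and the demo samples are synthetic.

```python
import numpy as np
from scipy import stats

def welch_mean_diff_ci(a, b, confidence=0.95):
    """Difference in means (a - b) and its Welch confidence interval."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)  # squared standard error of mean(a)
    vb = b.var(ddof=1) / len(b)  # squared standard error of mean(b)
    diff = a.mean() - b.mean()
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    t_crit = stats.t.ppf(0.5 + confidence / 2, dof)
    return diff, (diff - t_crit * se, diff + t_crit * se)

# Synthetic spending samples for two hypothetical groups
rng = np.random.default_rng(7)
group_a = rng.normal(100, 20, 80)
group_b = rng.normal(104, 25, 90)
diff, (lo, hi) = welch_mean_diff_ci(group_a, group_b)
p = stats.ttest_ind(group_a, group_b, equal_var=False).pvalue
print(f"difference = {diff:.2f}, 95% CI = ({lo:.2f}, {hi:.2f}), Welch p = {p:.3f}")
```

The 95% interval excludes zero exactly when the two-sided Welch t-test gives p < 0.05, so the interval carries the same decision plus information about effect size.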
Translating Statistics into Business Insights
Converting statistical outcomes into meaningful business recommendations was one of the most challenging parts.
This required moving beyond numbers and considering what the results imply for marketing, segmentation, and decision-making.
Key Findings
- Gender, education level, age, and state showed no statistically significant relationship with monthly spending.
- Age showed little to no relationship with customer inactivity.
- A significant association was found between marital status and the number of pets owned.
- Spending patterns appear relatively consistent across demographic groups.
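A significant chi-square result says the association between marital status and pet count is unlikely under independence, but not how strong it is; with a large sample, even a weak association can be significant. Cramér's V (from 0 for no association to 1 for perfect association) is a common way to quantify strength. The README does not report an effect size, so this is a hedged sketch with hypothetical toy inputs; recent SciPy versions also offer `scipy.stats.contingency.association` for the same purpose.

```python
import numpy as np
import pandas as pd
from scipy import stats

def cramers_v(x, y):
    """Cramér's V and chi-square p-value for two categorical variables."""
    table = pd.crosstab(x, y)
    # No Yates continuity correction, so V is based on the plain chi-square statistic
    chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    r, k = table.shape
    return np.sqrt(chi2 / (n * (min(r, k) - 1))), p

# Toy inputs: perfectly associated vs. perfectly independent
v_assoc, _ = cramers_v(["Single"] * 50 + ["Married"] * 50, [0] * 50 + [2] * 50)
v_indep, _ = cramers_v(["Single", "Married"] * 50, [0, 0, 1, 1] * 25)
print(f"associated: V = {v_assoc:.2f}, independent: V = {v_indep:.2f}")
```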
Business Implications
The results suggest that:
- Demographic segmentation alone may not be sufficient.
- Behavioral data (purchase history, engagement patterns) is likely more informative.
- Nationwide strategies may be effective, since no significant differences in spending were found between states.
- Targeted campaigns based on lifestyle factors may outperform demographic targeting.
Skills Used and Developed
Technical Skills
- Data exploration and cleaning
- Exploratory data analysis (EDA)
- Statistical hypothesis testing
- Assumption testing for parametric methods
- Data visualization
- Interpreting statistical outputs
Analytical Skills
- Translating business questions into statistical problems
- Choosing appropriate analytical methods
- Drawing evidence-based conclusions
- Communicating findings clearly
Tools & Libraries
- Python
- Pandas
- NumPy
- Matplotlib / Seaborn
- SciPy
Key Learnings
This project reinforced several important lessons:
- Data-driven decisions should be based on evidence, not assumptions
- Non-significant results can still provide valuable insights
- Correct test selection is crucial for valid conclusions
- Statistical analysis is only useful when translated into business context
Conclusion
This analysis demonstrates how statistical techniques can be used to evaluate real-world business assumptions about customers. While demographic variables showed little statistically significant association with spending behavior, the findings emphasize the importance of behavioral data for customer segmentation and strategy development.
Overall, the project highlights the role of analytics in supporting informed, evidence-based decision-making.