US Customer Insights Analysis

Project Overview

This project focuses on analyzing a customer dataset to understand spending behavior, engagement patterns, and the influence of demographic factors on business outcomes. The objective was to answer real business questions using statistical methods rather than relying on assumptions.

The dataset contains information such as gender, age, education level, marital status, state, monthly spending, number of pets, and days since last interaction.

Motivation

Businesses often segment customers based on demographic characteristics, assuming that factors like age, gender, or location strongly influence purchasing behavior. This project was conducted to test whether those assumptions actually hold true when analyzed statistically.

Workflow & What Was Done

1. Data Exploration

Examined the structure of the dataset
Identified data types and key variables
Checked for missing values and inconsistencies
Ensured the data was suitable for statistical analysis

At this stage, the main challenge was understanding how each variable could be used to answer meaningful business questions.
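The exploration checks listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the DataFrame is synthetic, and every column name (monthly_spend, days_since_last_interaction, and so on) is an assumption standing in for the real dataset's schema.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset; all column names are assumptions.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "gender": rng.choice(["Male", "Female"], size=n),
    "age": rng.integers(18, 80, size=n),
    "education": rng.choice(["High School", "Bachelor", "Master"], size=n),
    "marital_status": rng.choice(["Single", "Married"], size=n),
    "state": rng.choice(["CA", "NY", "TX"], size=n),
    "monthly_spend": rng.normal(500, 100, size=n).round(2),
    "pets": rng.integers(0, 4, size=n),
    "days_since_last_interaction": rng.integers(0, 365, size=n),
})
# Inject a few gaps so the missing-value check has something to report.
df.loc[[3, 17, 42], "monthly_spend"] = np.nan


def summarize(frame):
    """Return one row per column: dtype, missing count, distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(),
    })


summary = summarize(df)
n_duplicates = int(df.duplicated().sum())

print(summary)
print("duplicate rows:", n_duplicates)
```

A summary like this makes it easy to see which columns are categorical (few distinct values) and which are numerical, which in turn determines the tests that can be applied later.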

2. Exploratory Data Analysis (EDA)

Visualized distributions of numerical variables
Explored relationships between demographic factors and spending
Used plots to detect patterns, trends, and possible anomalies
Created scatter plots to investigate relationships between variables such as age and inactivity

EDA helped in forming intuition about the data before applying formal statistical tests.
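A rough sketch of the kinds of plots described above, using Matplotlib only. The data is synthetic and the column names are assumptions; the real notebook may have used Seaborn and different variables.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are assumptions.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "gender": rng.choice(["Male", "Female"], size=n),
    "age": rng.integers(18, 80, size=n),
    "monthly_spend": rng.normal(500, 100, size=n),
    "days_since_last_interaction": rng.integers(0, 365, size=n),
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1. Distribution of a numerical variable.
axes[0].hist(df["monthly_spend"], bins=30)
axes[0].set(title="Monthly spend", xlabel="Spend", ylabel="Customers")

# 2. Spending split by a demographic factor.
groups = sorted(df["gender"].unique())
spend_by_group = [
    df.loc[df["gender"] == g, "monthly_spend"].to_numpy() for g in groups
]
axes[1].boxplot(spend_by_group)
axes[1].set_xticks(range(1, len(groups) + 1))
axes[1].set_xticklabels(groups)
axes[1].set(title="Spend by gender", ylabel="Spend")

# 3. Scatter plot of age against inactivity.
axes[2].scatter(df["age"], df["days_since_last_interaction"], alpha=0.5)
axes[2].set(title="Age vs inactivity", xlabel="Age",
            ylabel="Days since last interaction")

fig.tight_layout()
```

Calling fig.savefig with a file name of your choice writes the figure to disk.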

3. Formulating Business Questions

Based on the dataset, several practical questions were defined:

Do males and females spend differently?
Does education level affect spending?
Is marital status related to pet ownership?
Are older customers less active?
Do customers from different states spend differently?

Each question was translated into statistical hypotheses.

4. Hypothesis Testing

Appropriate tests were selected based on variable types:

Independent t-test → Comparing two groups (gender)
One-way ANOVA → Comparing multiple groups (education, state)
Chi-square test → Relationship between categorical variables (marital status & pets)
Pearson correlation → Linear relationship between two numerical variables (age & inactivity, measured as days since last interaction)
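The four tests above map directly onto functions in scipy.stats. The sketch below runs each one on synthetic data; the column names, group labels, and the 0.05 significance level are assumptions for illustration, and the printed values say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data; column names and categories are assumptions.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "gender": rng.choice(["Male", "Female"], size=n),
    "education": rng.choice(["High School", "Bachelor", "Master"], size=n),
    "marital_status": rng.choice(["Single", "Married", "Divorced"], size=n),
    "pets": rng.integers(0, 4, size=n),
    "age": rng.integers(18, 80, size=n),
    "monthly_spend": rng.normal(500, 100, size=n),
    "days_since_last_interaction": rng.integers(0, 365, size=n),
})
alpha = 0.05  # assumed significance level

# Independent t-test: spending for two gender groups.
male = df.loc[df["gender"] == "Male", "monthly_spend"]
female = df.loc[df["gender"] == "Female", "monthly_spend"]
t_stat, p_gender = stats.ttest_ind(male, female)

# One-way ANOVA: spending across education levels.
edu_groups = [g["monthly_spend"].to_numpy() for _, g in df.groupby("education")]
f_stat, p_edu = stats.f_oneway(*edu_groups)

# Chi-square test of independence: marital status vs number of pets,
# with the pet count treated as a category.
table = pd.crosstab(df["marital_status"], df["pets"])
chi2, p_pets, dof, expected = stats.chi2_contingency(table)

# Pearson correlation: age vs days since last interaction.
r, p_age = stats.pearsonr(df["age"], df["days_since_last_interaction"])

for name, p in [("gender", p_gender), ("education", p_edu),
                ("marital status x pets", p_pets), ("age x inactivity", p_age)]:
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.3f} -> {verdict}")
```

If the variances of the two groups look unequal, passing equal_var=False to ttest_ind runs Welch's t-test instead.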

Before running parametric tests, assumptions were checked:

Normality, using the Shapiro-Wilk test
Homogeneity of variance, using Levene’s test
Independence of observations, judged from the dataset structure
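A minimal sketch of the first two checks with scipy.stats. The helper name check_assumptions, the sample data, and the 0.05 threshold are assumptions for illustration. Independence cannot be tested this way and has to be judged from how the data was collected.

```python
import numpy as np
from scipy import stats

# Synthetic samples standing in for spending in different groups.
rng = np.random.default_rng(3)
group_a = rng.normal(500, 100, size=150)
group_b = rng.normal(510, 100, size=150)
skewed = rng.exponential(scale=500, size=150)  # deliberately non-normal


def check_assumptions(*groups, alpha=0.05):
    """Run Shapiro-Wilk on each group and Levene's test across all groups."""
    normality_p = []
    for g in groups:
        _, p = stats.shapiro(g)
        normality_p.append(float(p))
    _, levene_p = stats.levene(*groups)
    return {
        "normality_p": normality_p,
        "levene_p": float(levene_p),
        # True when no group shows a significant departure from normality.
        "normal": bool(all(p > alpha for p in normality_p)),
        # True when the variances do not differ significantly.
        "equal_variance": bool(levene_p > alpha),
    }


result = check_assumptions(group_a, group_b)
skew_result = check_assumptions(group_a, skewed)
print(result)
print(skew_result)
```

A p-value above the threshold means the test found no evidence against the assumption. It does not prove the assumption holds.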

Challenges Faced & How They Were Solved

Choosing the Correct Statistical Tests

Initially, it was not obvious which test should be used for each question. This required understanding the difference between comparing means, testing relationships, and analyzing categorical associations.

This challenge was addressed by studying the type of variables involved (categorical vs numerical) and selecting tests accordingly.
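The selection logic described above can be written down as a small lookup. The function choose_test is hypothetical and only covers the four cases used in this project; it is a memory aid, not a general rule for picking statistical tests.

```python
def choose_test(predictor, outcome, n_groups=0):
    """Suggest a test from variable types.

    predictor, outcome: "categorical" or "numerical".
    n_groups: number of categories in the predictor (used only when the
    predictor is categorical and the outcome is numerical).
    """
    if predictor == "categorical" and outcome == "numerical":
        if n_groups == 2:
            return "independent t-test"
        if n_groups > 2:
            return "one-way ANOVA"
        raise ValueError("need at least two groups to compare means")
    if predictor == "categorical" and outcome == "categorical":
        return "chi-square test of independence"
    if predictor == "numerical" and outcome == "numerical":
        return "Pearson correlation"
    raise ValueError(f"no rule for {predictor!r} -> {outcome!r}")


# The five business questions, expressed as variable types.
print(choose_test("categorical", "numerical", n_groups=2))  # gender vs spending
print(choose_test("categorical", "numerical", n_groups=3))  # education vs spending
print(choose_test("categorical", "categorical"))            # marital status vs pets
print(choose_test("numerical", "numerical"))                # age vs inactivity
```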

Understanding and Checking Assumptions

Another difficulty was determining when assumption tests were necessary and how to interpret them.

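Normality tests sometimes indicated non-normality even when distributions appeared reasonable. With large samples, the Shapiro-Wilk test flags even minor departures from normality.
It was important to understand that t-tests and ANOVA can still be used on large samples, because the Central Limit Theorem makes the sampling distribution of the group means approximately normal.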

This step helped in learning how real statistical analysis balances theory with practical considerations.
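The Central Limit Theorem point can be illustrated with a short simulation. This is not part of the original analysis: the exponential "spending" population, the sample size of 200, and the 2000 repetitions are all assumed values chosen for the demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# A strongly right-skewed population, e.g. hypothetical spending amounts.
population = rng.exponential(scale=500, size=100_000)

# Draw 2000 samples of 200 customers each and record every sample mean.
samples = rng.choice(population, size=(2000, 200))
sample_means = samples.mean(axis=1)

skew_raw = float(stats.skew(population))
skew_means = float(stats.skew(sample_means))

print(f"skewness of raw values:   {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

The raw values are far from normal, but the sample means are close to symmetric. Tests that compare means rely on the distribution of the means, which is why they tolerate non-normal raw data when samples are large.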

Interpreting Non-Significant Results

Most tests produced non-significant results, which initially seemed like a lack of useful findings.

However, this highlighted an important insight:

Not finding a difference is still a valuable result.

It suggests that demographic variables alone may not explain customer behavior, although a non-significant test does not prove that no difference exists.
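One way to make a non-significant result more informative is to report how large the difference could plausibly be. The sketch below computes a Welch confidence interval for a difference in means and Cohen's d. Neither was part of the original analysis, and the helper names mean_diff_ci and cohens_d are hypothetical.

```python
import numpy as np
from scipy import stats


def mean_diff_ci(a, b, confidence=0.95):
    """Welch confidence interval for mean(a) - mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    t_crit = stats.t.ppf((1 + confidence) / 2, dof)
    diff = a.mean() - b.mean()
    return float(diff - t_crit * se), float(diff + t_crit * se)


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = (
        (a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)
    ) / (a.size + b.size - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


# Synthetic spending for two groups drawn from the same distribution.
rng = np.random.default_rng(11)
group_a = rng.normal(500, 100, size=250)
group_b = rng.normal(500, 100, size=250)

low, high = mean_diff_ci(group_a, group_b)
print(f"95% CI for the difference in means: [{low:.1f}, {high:.1f}]")
print(f"Cohen's d: {cohens_d(group_a, group_b):.3f}")
```

A narrow interval around zero supports the claim that any difference is too small to matter in practice. A wide interval means the data cannot rule out a meaningful difference.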

Translating Statistics into Business Insights

Converting statistical outcomes into meaningful business recommendations was one of the most challenging parts.

This required moving beyond numbers and considering what the results imply for marketing, segmentation, and decision-making.

Key Findings

Gender, education level, age, and state showed no statistically significant association with monthly spending.
Age showed little to no linear relationship with customer inactivity.
A significant association was found between marital status and the number of pets owned.
Spending patterns appear relatively consistent across demographic groups.

Business Implications

The results suggest that:

Demographic segmentation alone may not be sufficient.
Behavioral data (purchase history, engagement patterns) is likely more informative.
Nationwide strategies may be effective, since spending did not differ significantly between states.
Targeted campaigns based on lifestyle factors may outperform demographic targeting.

Skills Used and Developed

Technical Skills

Data exploration and cleaning
Exploratory data analysis (EDA)
Statistical hypothesis testing
Assumption testing for parametric methods
Data visualization
Interpreting statistical outputs

Analytical Skills

Translating business questions into statistical problems
Choosing appropriate analytical methods
Drawing evidence-based conclusions
Communicating findings clearly

Tools & Libraries

Python
Pandas
NumPy
Matplotlib / Seaborn
SciPy

Key Learnings

This project reinforced several important lessons:

Data-driven decisions should be based on evidence, not assumptions
Non-significant results can still provide valuable insights
Correct test selection is crucial for valid conclusions
Statistical analysis is only useful when translated into business context

Conclusion

This analysis demonstrates how statistical techniques can be used to evaluate real-world business assumptions about customers. Demographic variables showed little association with spending behavior, which points to behavioral data as a more promising basis for customer segmentation and strategy development.

Overall, the project highlights the role of analytics in supporting informed, evidence-based decision-making.

kunwargupta/US-Customer-Insights-Analysis