SU
sunnynguyen-ai/house-price-prediction
End-to-end ML project predicting house prices using Python, scikit-learn, and Flask
House Price Prediction
An end-to-end machine learning project that predicts house prices using Python, scikit-learn, and Flask.
Project Overview
This project demonstrates a complete ML pipeline from data preprocessing to model deployment. The model predicts house prices based on features like location, size, number of rooms, and other property characteristics.
Features
- Data Processing: Clean and prepare housing data for modeling
- Exploratory Data Analysis: Comprehensive analysis with visualizations
- Feature Engineering: Create meaningful features for better predictions
- Model Training: Multiple algorithms with hyperparameter tuning
- Model Evaluation: Performance metrics and validation
- Web Interface: Flask app for real-time predictions
- Deployment Ready: Containerized with Docker
Tech Stack
- Python 3.8+
- Machine Learning: scikit-learn, pandas, numpy
- Visualization: matplotlib, seaborn
- Web Framework: Flask
- Development: Jupyter Notebook
- Deployment: Docker
Project Structure
house-price-prediction/
├── data/
│ ├── raw/ # Original datasets
│ └── processed/ # Cleaned and processed data
├── notebooks/
│ └── 01_data_exploration.ipynb # EDA and data analysis
├── src/
│ ├── data_loader.py # Data loading functions
│ └── model_training.py # Model training pipeline
├── models/ # Saved trained models
├── app.py # Flask web application
├── requirements.txt # Python dependencies
└── README.md # Project documentation
Installation
- Clone the repository:
git clone https://github.com/sunnynguyen-ai/house-price-prediction.git
cd house-price-prediction- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate- Install dependencies:
pip install -r requirements.txtUsage
Training the Model
python src/model_training.pyRunning the Web App
python app.pyVisit http://localhost:5000 to use the prediction interface.
Jupyter Notebooks
Start Jupyter and explore the analysis:
jupyter notebookModel Performance
- Algorithm: Random Forest Regressor
- Dataset: California Housing Dataset (sklearn)
- Training: Achieves ~75-85% R² score on test data
- Features: 8 input features (income, location, house age, etc.)
- Evaluation: Includes MAE, RMSE, and feature importance analysis
Dataset
This project uses housing data with features including:
- Square footage
- Number of bedrooms/bathrooms
- Location (zip code)
- Age of property
- Property type
- Local amenities
Key Insights
- Property size has the strongest correlation with price
- Location significantly impacts pricing (30-40% variance)
- Newer properties command premium pricing
- Feature engineering improved model accuracy by 12%
Future Improvements
- Add more advanced algorithms (XGBoost, Neural Networks)
- Implement time series analysis for price trends
- Add real estate market indicators
- Enhance web interface with interactive maps
- Deploy to cloud platform (AWS/Heroku)
Contributing
- Fork the repository
- Create a feature branch (
git checkout -b feature/improvement) - Commit changes (
git commit -am 'Add new feature') - Push to branch (
git push origin feature/improvement) - Create Pull Request
License
This project is open source and available under the MIT License.
Contact
Sunny Nguyen
- GitHub: @sunnynguyen-ai
- Email: sunny.nguyen@onimail.com
- Website: sunnyinspires.com
On this page
Languages
Python97.6%Jupyter Notebook1.7%Makefile0.6%
Contributors
Created September 20, 2025
Updated March 6, 2026