Abhi11ch/CODSOFT-Spam-SMS-Detection-Using-ML
Spam SMS Detection using Machine Learning is a text classification project that identifies whether an SMS is spam or not. It uses NLP techniques for text preprocessing and a machine learning model to accurately detect unwanted messages.
CODSOFT-Spam-SMS-Detection-Using-ML
Check out my application on SMS Spam Detection
https://sms-spam-detection-pfr0xnyizaf.streamlit.app/
SMS Spam Detection
This project aims to develop a machine-learning model to detect spam messages in SMS text data. It utilizes natural language processing (NLP) techniques and a supervised learning algorithm to classify SMS messages as either spam or non-spam (ham).
Dataset
The dataset used for this project is the "SMS Spam Collection" from the UCI Machine Learning Repository. It contains 5,574 SMS messages, each labeled as spam or ham. The dataset can be downloaded from https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection.
The dataset file (sms_spam_dataset.csv) contains two columns:
label: Indicates whether the message is spam (1) or ham (0).
text: The actual text content of the SMS message.
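As a minimal sketch of what loading a file in this layout could look like, the snippet below parses a few made-up rows with pandas. The in-memory sample stands in for sms_spam_dataset.csv, and the header row naming the columns label and text is an assumption about the file, not something the project confirms.

```python
import io

import pandas as pd

# Hypothetical sample rows in the two-column layout described above;
# in the project these would come from sms_spam_dataset.csv.
SAMPLE_CSV = io.StringIO(
    "label,text\n"
    "0,Are we still meeting for lunch today?\n"
    "1,WINNER!! Claim your free prize now by calling 0900123456\n"
    "0,I'll call you later\n"
)

# Assumes a header row naming the columns "label" and "text".
df = pd.read_csv(SAMPLE_CSV)

# Count messages per class: 0 = ham, 1 = spam.
class_counts = df["label"].value_counts().to_dict()
print(class_counts)
```

To read the real file, SAMPLE_CSV would be replaced with the path "sms_spam_dataset.csv".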
Requirements
To run the project, you need the following dependencies:
Python 3.x
pandas
numpy
scikit-learn
nltk (Natural Language Toolkit)
matplotlib
You can install the required packages by running the following command:
pip install pandas numpy scikit-learn nltk matplotlib
Usage
Clone the repository or download the project files.
Place the sms_spam_dataset.csv file in the project directory.
Run the sms_spam_detection.py script to train and evaluate the spam detection model.
The script will load the dataset, preprocess the text data, convert the messages into TF-IDF (Term Frequency-Inverse Document Frequency) feature vectors, and train a machine learning model on those features.
After training, the model will be evaluated on a holdout set and the performance metrics (such as accuracy, precision, recall, and F1-score) will be displayed.
Finally, you can use the trained model to predict the label (spam/ham) of new SMS messages by modifying the predict function in the script.
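The steps above can be sketched roughly as follows. This is not the contents of sms_spam_detection.py: the README does not name the classifier, so Multinomial Naive Bayes is an assumption here, as are the split size, the random seed, the toy messages, and the signature of the predict helper. The vectorizer's built-in lowercasing and English stop-word list stand in for whatever nltk preprocessing the script performs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up messages for illustration only; 1 = spam, 0 = ham.
texts = [
    "Congratulations! You have won a free prize, call now to claim",
    "URGENT: your account has won cash, reply WIN to claim",
    "Free entry in a weekly competition, text WIN to 80085",
    "Claim your free ringtone now, reply YES",
    "You have been selected for a cash reward, call now",
    "Win a free holiday, text HOLIDAY to claim your prize",
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you pick up some milk on the way back?",
    "Thanks for the help yesterday, see you tomorrow",
    "Running late, will be there in ten minutes",
    "Did you finish the report for the meeting?",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hold out a quarter of the messages for evaluation (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# TF-IDF features feed an assumed Naive Bayes classifier.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    MultinomialNB(),
)
model.fit(X_train, y_train)

# Evaluate on the holdout set.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("holdout accuracy:", accuracy)


def predict(message: str) -> str:
    """Hypothetical helper: classify one new SMS as 'spam' or 'ham'."""
    return "spam" if model.predict([message])[0] == 1 else "ham"


print(predict("Free prize! Call now to claim your cash"))
```

Wrapping the vectorizer and classifier in one pipeline means new messages pass through the same TF-IDF vocabulary that was fitted on the training data, which avoids refitting the vectorizer at prediction time.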
Results
On the test set, the trained model achieved an accuracy of 97.10% and a precision of 100%, meaning no ham message was flagged as spam. Recall was lower at 76.19%, so roughly a quarter of the spam messages went undetected, giving an F1-score of 86.49%.
Metric | Score
Accuracy | 97.10%
Precision | 100%
Recall | 76.19%
F1-score | 86.49%
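The relationship between the reported scores can be checked directly. The sketch below writes out the standard definitions and confirms that a precision of 100% and a recall of 76.19% yield the reported F1-score. The function names are illustrative and not taken from the project script, and the confusion counts in the last two lines are made up purely to show how the functions behave.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of messages flagged as spam that really are spam."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual spam messages that were flagged."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Reported precision and recall from the results above, as fractions.
reported_f1 = f1_score(1.0, 0.7619)
print(f"{reported_f1:.4f}")  # 0.8649

# Made-up counts: 8 spam caught, 0 ham misflagged, 2 spam missed.
print(precision(tp=8, fp=0))  # 1.0
print(recall(tp=8, fn=2))     # 0.8
```

A precision of 100% corresponds to zero false positives, which is why the F1-score is pulled down by recall alone.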
Feel free to contribute, modify, or use the code according to the terms of the license.