AayushPrranav/Comparative-Analysis-of-Machine-Learning-Algorithms-on-the-MathSuccess-Dataset
MathSuccess is a dataset of about 2600 rows containing student information and whether each student passed a highly competitive math class. I compared the accuracy of 5 models on it in RStudio. Models used: ctree, rpart, svm, randomForest, nnet
MathSuccess Dataset Analysis
Overview
This project analyzes the MathSuccess dataset, which contains data on student performance, including their scores from various entrance exams and their final course success. The goal is to compare multiple machine learning algorithms to determine which model best predicts student success.
Table of Contents
- Dataset Description
- Preprocessing Steps
- Algorithms Used
- Results
- Installation
- Usage
- Contributing
- License
Dataset Description
The MathSuccess dataset consists of approximately 2600 rows with the following columns:
- Student: Identifier for each student.
- Gender: Gender of the student.
- PSATM: Score from the PSAT exam.
- SATM: Score from the SAT exam.
- ACTM: Score from the ACT exam.
- Rank: Academic rank of the student.
- Size: Size of the school.
- GPAadj: Adjusted GPA.
- PlcmtScore: Placement score.
- Recommends: Recommendations received.
- Course: Course taken by the student.
- Grade: Grade received in the course.
- RecTaken: Recommendations taken into account.
- TooHigh/TooLow: Indicators for admission test scores.
- CourseSuccess: Outcome variable indicating whether the student passed or failed.
Preprocessing Steps
- Removed rows with missing values in the CourseSuccess column.
- Replaced missing values in other columns with zero.
- Converted categorical variables (Recommends, Grade, CourseSuccess) into factors.
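The three steps above can be sketched in base R as follows. This is a minimal illustration on a small made-up data frame, not the project's actual script: the object name `math` and the toy values are assumptions, and only the column names come from the dataset description.

```r
# Hypothetical toy data standing in for the MathSuccess dataset.
math <- data.frame(
  Student       = 1:6,
  SATM          = c(600, NA, 710, 550, NA, 640),
  Recommends    = c("R12", "R01", NA, "R12", "R01", "R12"),
  Grade         = c("A", "C", "B", NA, "F", "B"),
  CourseSuccess = c("Yes", "No", "Yes", NA, "No", "Yes"),
  stringsAsFactors = FALSE
)

# 1. Drop rows where the outcome variable is missing.
math <- math[!is.na(math$CourseSuccess), ]

# 2. Replace the remaining missing values with zero.
#    In character columns the zero is stored as the string "0".
math[is.na(math)] <- 0

# 3. Convert the categorical columns to factors.
factor_cols <- c("Recommends", "Grade", "CourseSuccess")
math[factor_cols] <- lapply(math[factor_cols], factor)

str(math)
```

Filling with zero is done before the factor conversion here because assigning a value that is not an existing level to a factor column produces `NA` with a warning.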
Algorithms Used
The following machine learning algorithms were implemented:
- CTree (Conditional Inference Trees, from the party package)
- RPart (Recursive Partitioning and Regression Trees)
- Support Vector Machine (SVM)
- Random Forest
- Neural Network
Each algorithm was chosen based on its strengths in handling classification tasks and its ability to provide insights into student performance.
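As a rough sketch of how these five models can be fitted and compared side by side, the snippet below uses the packages listed under Installation on synthetic data. Everything about the data is an assumption made for illustration: the generated values, the three predictors chosen, the 70/30 split ratio, the standardization step, and the `size = 3` hidden layer are not taken from the project's analysis.

```r
library(party)         # ctree
library(rpart)         # rpart
library(e1071)         # svm
library(randomForest)  # randomForest
library(nnet)          # nnet
library(caTools)       # sample.split

set.seed(42)

# Synthetic stand-in data (NOT the MathSuccess dataset).
n <- 300
toy <- data.frame(
  SATM       = rnorm(n, mean = 600, sd = 60),
  GPAadj     = runif(n, min = 2, max = 4),
  PlcmtScore = rnorm(n, mean = 20, sd = 5)
)
score <- toy$SATM / 100 + toy$GPAadj + rnorm(n, sd = 0.5)
toy$CourseSuccess <- factor(ifelse(score > 9, "Pass", "Fail"))

# Standardize predictors (an illustrative choice; helps the neural network).
predictors <- c("SATM", "GPAadj", "PlcmtScore")
toy[predictors] <- lapply(toy[predictors], function(x) as.numeric(scale(x)))

# Train/test split that preserves the class ratio (ratio assumed).
in_train <- sample.split(toy$CourseSuccess, SplitRatio = 0.7)
train <- toy[in_train, ]
test  <- toy[!in_train, ]

f <- CourseSuccess ~ SATM + GPAadj + PlcmtScore

fits <- list(
  C_Tree         = ctree(f, data = train),
  R_Part         = rpart(f, data = train, method = "class"),
  SVM            = svm(f, data = train),
  Random_Forest  = randomForest(f, data = train),
  Neural_Network = nnet(f, data = train, size = 3, trace = FALSE)
)

# Each package has its own convention for requesting class labels.
preds <- list(
  C_Tree         = predict(fits$C_Tree, newdata = test),
  R_Part         = predict(fits$R_Part, newdata = test, type = "class"),
  SVM            = predict(fits$SVM, newdata = test),
  Random_Forest  = predict(fits$Random_Forest, newdata = test),
  Neural_Network = predict(fits$Neural_Network, newdata = test, type = "class")
)

# Share of test rows where the predicted label matches the true label.
accuracy <- sapply(preds, function(p) {
  mean(as.character(p) == as.character(test$CourseSuccess))
})
print(round(accuracy, 4))
```

Keeping the fitted models and predictions in named lists makes it easy to apply the same evaluation function to every model.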
Results
The performance metrics for each algorithm are summarized below:
| Algorithm | Accuracy | Precision | Recall | F1 Score | Error Rate |
|---|---|---|---|---|---|
| C_Tree | 0.9953 | 1.0000 | 0.9931 | 0.9965 | 0.0047 |
| R_Part | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0000 |
| SVM | 0.9812 | 1.0000 | 0.9730 | 0.9863 | 0.0188 |
| Random Forest | 0.9930 | 1.0000 | 0.9897 | 0.9948 | 0.0070 |
| Neural Network | 0.9953 | 1.0000 | 0.9931 | 0.9965 | 0.0047 |
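For reference, the metrics in the table follow the standard confusion-matrix definitions, and the Error Rate column equals 1 - Accuracy in every row. The base R sketch below shows one way to compute them. The function name `classification_metrics`, the example labels, and the choice of "Pass" as the positive class are assumptions for illustration, not taken from the project's code.

```r
# Compute standard binary classification metrics from label vectors.
# `positive` names the class treated as the positive outcome (assumed).
classification_metrics <- function(actual, predicted, positive) {
  tp <- sum(predicted == positive & actual == positive)
  fp <- sum(predicted == positive & actual != positive)
  fn <- sum(predicted != positive & actual == positive)
  tn <- sum(predicted != positive & actual != positive)

  accuracy  <- (tp + tn) / (tp + fp + fn + tn)
  precision <- tp / (tp + fp)   # NaN if nothing is predicted positive
  recall    <- tp / (tp + fn)   # NaN if there are no actual positives
  f1        <- 2 * precision * recall / (precision + recall)

  c(Accuracy = accuracy, Precision = precision, Recall = recall,
    F1 = f1, ErrorRate = 1 - accuracy)
}

# Hypothetical labels: one passing student is misclassified as failing.
actual    <- c("Pass", "Pass", "Pass", "Pass", "Pass", "Pass",
               "Fail", "Fail", "Fail", "Fail")
predicted <- c("Pass", "Pass", "Pass", "Pass", "Pass", "Fail",
               "Fail", "Fail", "Fail", "Fail")

m <- classification_metrics(actual, predicted, positive = "Pass")
print(round(m, 4))
```

In this toy case there are no false positives, so precision is 1 while recall drops below 1, the same pattern as several rows in the table.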
Installation
To run this project, ensure you have R and RStudio installed on your machine, along with the following libraries:
install.packages(c("rpart", "party", "e1071", "randomForest", "nnet", "caTools"))
Usage
Clone this repository and run the code in RStudio to replicate the analysis:
git clone https://github.com/yourusername/MathSuccessAnalysis.git
cd MathSuccessAnalysis
Open analysis.R in RStudio and execute the code to see results.
Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue for any suggestions or improvements.
License
This project is licensed under the MIT License - see the LICENSE file for details.