AMLProject2

Attributes:

Age
Sex
Chest Pain Type
Resting Blood Pressure
Cholesterol
Fasting Blood Sugar
Resting Electrocardiographic Results
Maximum Heart Rate
Exercise Induced Angina
ST Depression Induced By Exercise Relative To Rest
Slope of the peak exercise ST segment
Number of major vessels
thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
Diagnosis of heart disease (predicted)
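
As a rough sketch of how these 14 attributes might be read into Python: the short column names below follow the UCI Heart Disease documentation, while the two sample rows and the `parse_rows` helper are made up for illustration. It assumes a comma-separated file in which missing values are marked with `?`.

```python
import csv
import io

# Short names for the 14 attributes, in the order listed above
# (as used in the UCI Heart Disease documentation).
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]

# Illustrative rows only (invented values, not real patient records).
SAMPLE = """\
54,1,2,130,240,0,0,160,0,1.2,1,0,3,0
61,0,4,140,260,0,2,140,1,2.0,2,?,7,1
"""

def parse_rows(text):
    """Parse CSV text into dicts; '?' becomes None, other fields floats."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        values = [None if f.strip() == "?" else float(f) for f in fields]
        rows.append(dict(zip(COLUMNS, values)))
    return rows

rows = parse_rows(SAMPLE)
# Keep only rows with no missing attribute values.
complete = [r for r in rows if None not in r.values()]
print(len(rows), "rows,", len(complete), "without missing values")
```

Dropping incomplete rows is only one option; imputing the missing values is another, and the choice should be justified in the report.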

Final project – Logistic Regression, SVM and Neural Networks

For project 1, you selected a dataset and investigated how a kNN classifier could help with
classifying testing samples after training/learning. For project 2, you will continue your
exploration with other supervised ML approaches: logistic regression, support vector
machines, and neural networks.

This project is to be done in a group. Each student is responsible for contributing to the group,
including problem formulation, dataset selection, ML tool implementation, and project
presentation.

Requirements:

Source code in Python (done in a group). Your code must run on CS lab machines.
Individual project report (~6 pages + appendices if needed, font size >= 11)

Specific requirements:

Dataset:

Each group needs to pick a new dataset to work on.
Dataset must be interesting and challenging. If the accuracy is very high (say, 99% using
kNN) or very low (<50%), select a different dataset! That means the problem either can
be solved without any machine learning algorithm or is beyond what we have learned in
this class.
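
One way to apply the screening rule above is to run a quick kNN baseline and check where its accuracy falls. The following is a minimal sketch in plain Python; the helper names, the leave-one-out evaluation, and the toy points are assumptions made for illustration, and the 50% and 99% cut-offs are the ones quoted above.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k=3):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

def is_suitable(accuracy, low=0.50, high=0.99):
    """True if the baseline accuracy is at least low and below high."""
    return low <= accuracy < high

# Toy 2-D points (made up): two clusters, each containing one point
# that carries the other cluster's label.
data = [
    ((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0), ((3.1, 3.0), 0),
    ((3.0, 3.2), 1), ((3.3, 2.9), 1), ((2.9, 3.1), 1), ((1.1, 0.9), 1),
]
acc = loo_accuracy(data, k=3)
print(f"kNN leave-one-out accuracy: {acc:.2f}, suitable: {is_suitable(acc)}")
```

On a real dataset, a held-out test split or cross-validation would serve the same purpose as the leave-one-out loop used here.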

Your individual report must include:

Abstract - Give a brief presentation of the problem, dataset used, summarize the
methods, and outline your results and conclusions.

Introduction - Detailed problem description and background of the dataset. Justify why the
dataset is appropriate and worth exploring. Outline the approaches you take to solve the
problem.
Statistical summary of your data - For each class, what are the max, min, mean, median,
mode, and standard deviation? If you used only a subset of attributes, justify why the other
attributes were not used. Summarize what the statistics tell you and any insights you have
obtained from them.
Methods - A brief description of each model: logistic regression, support vector machine
(linear kernel), and neural networks. Also state which ranges of parameters and neural
network architectures you will explore, and why (consider at least 2 different hidden
layers with different numbers of neurons and 2 different gradient descent solvers).
Demonstrate that you have an intuitive understanding of the ML algorithms.
Results - Summary of your classification results, including best set of parameters and
architectures, accuracy, and confusion matrices from a) logistic regression, b) SVM, and
c) neural nets.
Discussion - Describe and analyze the results. Are the results what you expected? How
do the three different models compare? Why is one better or worse than another?
Conclusion of your exploration. Did you solve the problem? How helpful are the ML
algorithms in terms of answering your questions? What have you learned?
(Graduate student) Give a more detailed description of each model, logistic regression,
support vector machine, and neural nets, and compare SVM with linear kernel and SVM
with Gaussian kernel. One page per model. So, 3 additional pages.
References
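
For the statistical summary section of the report, the per-class figures can be computed with Python's standard `statistics` module. This is only a sketch: the `summarize_by_class` helper and the small cholesterol-like sample are hypothetical, and the sample standard deviation is assumed (use `statistics.pstdev` for the population version).

```python
import statistics
from collections import defaultdict

def summarize_by_class(values, labels):
    """Return {class: {stat: value}} for one numeric attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    summary = {}
    for label, vals in sorted(groups.items()):
        summary[label] = {
            "max": max(vals),
            "min": min(vals),
            "mean": statistics.mean(vals),
            "median": statistics.median(vals),
            # multimode returns every most-common value, so ties are visible
            "mode": statistics.multimode(vals),
            # sample standard deviation; needs at least two values
            "stdev": statistics.stdev(vals) if len(vals) > 1 else 0.0,
        }
    return summary

# Made-up values for a single attribute, with 0/1 class labels.
chol = [210, 240, 240, 198, 260, 282, 260, 305]
diagnosis = [0, 0, 0, 0, 1, 1, 1, 1]

for label, stats in summarize_by_class(chol, diagnosis).items():
    print(label, stats)
```

Running the helper once per attribute gives the full per-class table the report asks for.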

Demo:

Your Python script.
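
A script along the following lines could cover the three required models. This is a minimal sketch assuming scikit-learn is installed; the synthetic data from `make_classification` stands in for a real dataset, and the hyperparameters, the two hidden-layer layouts, and the `adam`/`sgd` solver pair are example choices, not prescribed values. An RBF-kernel SVM is included for the linear-versus-Gaussian comparison asked of graduate students.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset (13 features, binary target).
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM (linear kernel)": SVC(kernel="linear", C=1.0),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0, gamma="scale"),
}
# Two hidden-layer layouts x two gradient-based solvers (example choices).
for layers in [(8,), (16, 8)]:
    for solver in ["adam", "sgd"]:
        models[f"MLP {layers} {solver}"] = MLPClassifier(
            hidden_layer_sizes=layers, solver=solver,
            max_iter=2000, random_state=0)

results = {}
for name, model in models.items():
    # Scale features inside the pipeline so the test split is never
    # used to fit the scaler.
    clf = make_pipeline(StandardScaler(), model)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    results[name] = (accuracy_score(y_test, pred),
                     confusion_matrix(y_test, pred))

for name, (acc, cm) in results.items():
    print(f"{name}: accuracy={acc:.3f}")
    print(cm)
```

Fixing `random_state` keeps runs repeatable, which makes the accuracy and confusion matrices quoted in a report easier to reproduce.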

Here are some sample datasets:

Flight Delays and Cancellations: https://www.kaggle.com/usdot/flight-delays
Heart Disease Data Set: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
MIT Leukemia cancer dataset: http://portals.broadinstitute.org/cgibin/cancer/publications/pub_paper.cgi?mode=view&paper_id=43

Submission instructions:

This project has multiple due dates, tentatively:

March 9: Dataset selection
March 30: Intro and Stats sections of your individual report (Word or PDF)
April 6: Methods and Results from logistic regression & SVM (Word or PDF)
April 13: Results from neural nets (Word or PDF)
April 20: Discussion & Conclusion (Word or PDF)
April 25: Presentation (PPT, 1 copy per group)
April 25: An electronic copy of your Python scripts (only 1 copy per group is needed)
April 25: Final individual project report (Word or PDF)

AMTuttle02/AMLProject2