ElektrischesSchaf/Leukemia_prediction_with_SVM
Using scikit-learn and mlxtend
This project applies two feature selection methods, recursive feature elimination (RFE) and the sequential feature selector (SFS), to a support vector machine (SVM) that predicts the type of leukemia.
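As a rough illustration of how the two selectors differ, here is a minimal sketch on synthetic data. It is not the repository's code: the data comes from `make_classification`, the feature counts are arbitrary, and scikit-learn's own `SequentialFeatureSelector` stands in for the mlxtend one so that the example needs only one library.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes); not the real data.
X, y = make_classification(n_samples=72, n_features=50, n_informative=5,
                           random_state=0)

# A linear kernel exposes coef_, which RFE needs to rank features.
svm = SVC(kernel="linear")

# RFE: fit on all features, drop the 5 lowest-ranked, repeat until 5 remain.
rfe = RFE(estimator=svm, n_features_to_select=5, step=5)
rfe.fit(X, y)
rfe_mask = rfe.get_support()

# SFS (forward): start empty and greedily add the feature that most improves
# the cross-validated score, until 3 features are selected.
sfs = SequentialFeatureSelector(svm, n_features_to_select=3,
                                direction="forward", cv=3)
sfs.fit(X, y)
sfs_mask = sfs.get_support()

print("RFE kept:", rfe_mask.nonzero()[0])
print("SFS kept:", sfs_mask.nonzero()[0])
```

RFE works backward from the full feature set using the model's weights, while SFS builds the subset up one feature at a time using a score, so SFS is usually the slower of the two when there are many candidate features.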
The leukemia dataset contains 7218 genes measured in 72 samples. The samples are classified into two types of leukemia: acute
lymphocytic leukemia (ALL) and acute myelocytic leukemia (AML).
We run 50 iterations for each combination of feature number, feature selection method, and kernel.
The number of selected features varies from 10 to 70. In each iteration, we shuffle the samples and then split them into
38 training samples and 34 test samples, so the training and test sets consist of different samples
every time. We then calculate the Matthews correlation coefficient (MCC) to evaluate the result of each feature selection method.
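One iteration of this procedure could look roughly like the sketch below. It is an assumption-laden illustration, not the repository's script: the data is synthetic, `run_iteration` is a hypothetical helper, only RFE is shown, only 5 iterations are run, and the selector is fitted on the training split only, which the description above does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import matthews_corrcoef
from sklearn.svm import SVC

N_TRAIN, N_TEST = 38, 34  # split sizes taken from the description above


def run_iteration(X, y, n_features, kernel, rng):
    """Shuffle, split 38/34, select features with RFE, return the test MCC."""
    order = rng.permutation(len(y))
    train_idx = order[:N_TRAIN]
    test_idx = order[N_TRAIN:N_TRAIN + N_TEST]

    # Assumption: features are selected on the training samples only.
    # step=0.1 removes 10% of the original features per elimination round.
    selector = RFE(SVC(kernel="linear"), n_features_to_select=n_features,
                   step=0.1)
    selector.fit(X[train_idx], y[train_idx])

    # Train the final classifier with the requested kernel on the kept features.
    clf = SVC(kernel=kernel)
    clf.fit(selector.transform(X[train_idx]), y[train_idx])
    y_pred = clf.predict(selector.transform(X[test_idx]))
    return matthews_corrcoef(y[test_idx], y_pred)


# Synthetic stand-in for the 72-sample dataset (200 features for speed).
X, y = make_classification(n_samples=72, n_features=200, n_informative=10,
                           random_state=0)
rng = np.random.default_rng(0)

# 5 iterations here instead of 50, to keep the sketch quick to run.
scores = [run_iteration(X, y, n_features=10, kernel="rbf", rng=rng)
          for _ in range(5)]
print("mean MCC:", np.mean(scores))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it a reasonable summary when the two classes are not equally represented in a split.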