mickeyhqian/VoteEnsemble
VoteEnsemble: Ensemble methods for machine/deep learning and stochastic programming with guaranteed generalization.
VoteEnsemble
This repository contains the Python implementation of the VE (VoteEnsemble) family of ensemble learning methods: MoVE, ROVE, and ROVEs.
Among the three methods,
Installation
1. Change to the root directory of this repository (i.e., VoteEnsemble).
2. Install the required dependencies (setuptools, numpy, and zstandard) if they are not already installed. You can install them directly with
pip install setuptools numpy zstandard
or
pip install -r requirements.txt
3. Install VoteEnsemble via
pip install .
Quick Start
To use VoteEnsemble, you need to define a base learning algorithm for your problem by subclassing the BaseLearner class defined in VoteEnsemble.py. Below are two simple use cases that illustrate this.
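The abstract methods that BaseLearner actually requires are defined in VoteEnsemble.py and are not listed in this README. As a minimal sketch of the subclassing pattern only, the block below uses a hypothetical stand-in class, SketchBaseLearner, with assumed method names learn (train on a sample and return a model/solution) and objective (score a model/solution on data). Check VoteEnsemble.py for the real interface before writing your own learner.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchBaseLearner(ABC):
    """Hypothetical stand-in for BaseLearner; the method names are assumptions."""

    @abstractmethod
    def learn(self, sample: np.ndarray):
        """Train on one (sub)sample and return a model or solution."""

    @abstractmethod
    def objective(self, result, sample: np.ndarray) -> np.ndarray:
        """Return the per-data-point loss of `result` evaluated on `sample`."""


class MeanEstimator(SketchBaseLearner):
    """Toy learner: estimates a location parameter by the sample mean."""

    def learn(self, sample):
        return float(np.mean(sample))

    def objective(self, result, sample):
        # Squared error of each data point around the learned estimate
        return (sample - result) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(loc=3.0, scale=1.0, size=1000)
    learner = MeanEstimator()
    estimate = learner.learn(data)
    print("estimate:", estimate, "mean loss:", learner.objective(estimate, data).mean())
```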
Linear regression
Consider a linear regression problem. The script exampleLR.py implements such an example, using least squares as the base learning algorithm and applying ROVE and ROVEs to it. Run it with
python exampleLR.py
which should produce the following output:
True model parameters = [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
ROVE outputs the parameters: [3.92431628e-03 1.01457683e+00 1.96402875e+00 3.01047031e+00
4.00479241e+00 5.00741279e+00 5.99596621e+00 7.01602010e+00
7.99180409e+00 8.99286178e+00]
ROVEs outputs the parameters: [-7.11339923e-03 1.00764019e+00 1.97278415e+00 3.00220791e+00
3.99707439e+00 5.02509414e+00 5.98887793e+00 7.02417495e+00
8.01337643e+00 8.96901555e+00]
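For intuition about what the base learner does in this example, here is a minimal NumPy sketch of least squares fitted on random subsamples. The data-generating model (standard normal features, true parameters 0 through 9) and the helper names least_squares and fit_on_subsamples are assumptions for illustration; exampleLR.py may set things up differently, and the sketch stops before any ensemble selection step.

```python
import numpy as np


def least_squares(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares: argmin_b ||X b - y||^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def fit_on_subsamples(X, y, subsample_size, num_subsamples, seed=0):
    """Run the base learner on random subsamples drawn without replacement."""
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(num_subsamples):
        idx = rng.choice(len(y), size=subsample_size, replace=False)
        fits.append(least_squares(X[idx], y[idx]))
    return np.array(fits)  # shape: (num_subsamples, num_features)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    beta = np.arange(10, dtype=float)  # assumed true parameters 0..9
    X = rng.normal(size=(2000, 10))
    y = X @ beta + rng.normal(scale=1.0, size=2000)
    fits = fit_on_subsamples(X, y, subsample_size=200, num_subsamples=50)
    print("spread of subsample fits per parameter:", fits.std(axis=0))
```

With noise-free responses and subsamples at least as large as the number of features, every subsample fit recovers the parameters exactly; with noisy responses the fits scatter around the true values, and it is this variation across subsamples that an ensemble method then has to aggregate.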
Stochastic linear program
Consider a simple linear program with random coefficients. The script exampleLP.py implements such an example, using the sample average approximation as the base learning algorithm and applying MoVE, ROVE, and ROVEs to it. Run it with
python exampleLP.py
which should produce the following output:
True optimal objective value = 0.0
MoVE outputs the solution: [1. 0.], objective value = 0.0
ROVE outputs the solution: [1. 0.], objective value = 0.0
ROVEs outputs the solution: [1. 0.], objective value = 0.0
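The sample average approximation (SAA) replaces an expected cost with its average over the observed samples and solves the resulting deterministic problem. The sketch below applies SAA to a hypothetical two-variable LP, minimize E[c]^T x subject to x >= 0 and x_1 + x_2 = 1, which is not necessarily the problem in exampleLP.py. Because a linear objective over this simplex is minimized at a vertex, the SAA solution can be read off from the smallest sample-mean cost; the function name saa_simplex_lp is illustrative only.

```python
import numpy as np


def saa_simplex_lp(cost_samples: np.ndarray) -> np.ndarray:
    """SAA for the hypothetical LP  min_x E[c]^T x  s.t.  x >= 0, sum(x) = 1.

    cost_samples has shape (n, d); each row is one observed draw of the random cost c.
    """
    # Replace the expectation E[c] by the sample average
    mean_cost = cost_samples.mean(axis=0)
    # A linear objective over the simplex is minimized at a vertex e_i
    # with i = argmin_i mean_cost[i] (ties broken by the lowest index)
    x = np.zeros(cost_samples.shape[1])
    x[np.argmin(mean_cost)] = 1.0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mean = np.array([1.0, 2.0])  # assumed E[c]; true optimum is x = [1, 0]
    samples = rng.normal(loc=true_mean, scale=1.0, size=(500, 2))
    x_hat = saa_simplex_lp(samples)
    print("SAA solution:", x_hat, "true objective value:", true_mean @ x_hat)
```

Since the solutions SAA returns here are vertices, different subsamples frequently return exactly the same solution, which is the kind of discrete output that a vote can count.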
Advanced Usage
Parallel Ensemble Construction and Evaluation
The VE methods involve constructing and evaluating ensembles on many random subsamples of the full dataset, which can be easily parallelized. This implementation supports parallelization through multiprocessing. By default, parallelization is disabled, but you can enable it when creating instances of each method as follows:
# Parallelize ensemble construction in MoVE with 8 processes
move = MoVE(yourBaseLearner, numParallelLearn=8)
# Parallelize ensemble construction and evaluation in ROVE with 8 and 6 processes respectively
rove = ROVE(yourBaseLearner, False, numParallelLearn=8, numParallelEval=6)
# Parallelize ensemble construction and evaluation in ROVEs with 8 and 6 processes respectively
roves = ROVE(yourBaseLearner, True, numParallelLearn=8, numParallelEval=6)
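As a rough illustration of why this workload parallelizes well, the sketch below learns on many random subsamples with multiprocessing.Pool and then takes a plain majority vote over the results. The base learner, the function names, and the majority-vote rule are assumptions made for the example; MoVE, ROVE, and ROVEs implement their own construction and evaluation steps in VoteEnsemble.py, where numParallelLearn and numParallelEval enable parallelism inside the library.

```python
from collections import Counter
from multiprocessing import Pool

import numpy as np


def base_learner(sample: np.ndarray) -> int:
    """Hypothetical discrete learner: index of the column with the smallest mean."""
    return int(np.argmin(sample.mean(axis=0)))


def learn_on_subsample(task):
    """Draw one subsample without replacement and run the base learner on it."""
    data, seed, subsample_size = task
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=subsample_size, replace=False)
    return base_learner(data[idx])


def vote_over_subsamples(data, num_subsamples, subsample_size, num_processes=1, seed=0):
    """Learn on many subsamples (in parallel if num_processes > 1), then majority-vote."""
    # One independent seed per subsample, so results do not depend on scheduling
    seeds = np.random.SeedSequence(seed).generate_state(num_subsamples)
    tasks = [(data, int(s), subsample_size) for s in seeds]
    if num_processes > 1:
        with Pool(processes=num_processes) as pool:
            results = pool.map(learn_on_subsample, tasks)
    else:
        results = [learn_on_subsample(t) for t in tasks]
    winner, _ = Counter(results).most_common(1)[0]
    return winner, results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two options with mean costs 0.0 and 0.3; column 0 should win most votes
    data = rng.normal(loc=[0.0, 0.3], scale=1.0, size=(5000, 2))
    winner, results = vote_over_subsamples(data, num_subsamples=200,
                                           subsample_size=100, num_processes=4)
    print("winner:", winner, "votes:", Counter(results))
```

Each subsample gets its own seed from numpy.random.SeedSequence, and Pool.map returns results in task order, so the serial and parallel paths produce the same votes. The `if __name__ == "__main__":` guard is required on platforms that start worker processes with the spawn method.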
Offloading Ensembles to Disk
If your machine learning model or optimization solution is memory-intensive, it may not be feasible to store the entire ensemble in RAM. This implementation provides a feature to offload all learned models/solutions to disk and to load a model/solution into memory only when a method needs to access it. By default, this feature is disabled, but you can enable it as follows:
# Offload the ensemble in MoVE to the specified directory
move = MoVE(yourBaseLearner, subsampleResultsDir="path/to/your/directory")
# Offload the ensemble in ROVE to the specified directory
rove = ROVE(yourBaseLearner, False, subsampleResultsDir="path/to/your/directory")
# Offload the ensemble in ROVEs to the specified directory and retain it after execution
roves = ROVE(yourBaseLearner, True, subsampleResultsDir="path/to/your/directory", deleteSubsampleResults=False)
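A minimal sketch of the offloading idea: each ensemble member is pickled to its own file when added, only the file path is kept in memory, and a member is unpickled only when it is indexed. The class name DiskEnsemble, its file layout, and the use of stdlib gzip compression are assumptions; the repository lists zstandard as a dependency, so its actual on-disk format likely differs. The delete_on_close flag plays a role analogous to deleteSubsampleResults.

```python
import gzip
import os
import pickle
import tempfile


class DiskEnsemble:
    """Stores ensemble members on disk and loads one only when it is accessed."""

    def __init__(self, directory: str, delete_on_close: bool = True):
        self.directory = directory
        self.delete_on_close = delete_on_close
        self._paths = []  # only file paths are kept in RAM
        os.makedirs(directory, exist_ok=True)

    def append(self, member) -> None:
        """Serialize a learned model/solution to its own compressed file."""
        path = os.path.join(self.directory, f"member_{len(self._paths)}.pkl.gz")
        with gzip.open(path, "wb") as f:
            pickle.dump(member, f)
        self._paths.append(path)

    def __len__(self) -> int:
        return len(self._paths)

    def __getitem__(self, i: int):
        """Load member i from disk on demand."""
        with gzip.open(self._paths[i], "rb") as f:
            return pickle.load(f)

    def close(self) -> None:
        """Delete the files this object wrote, unless asked to keep them."""
        if self.delete_on_close:
            for path in self._paths:
                if os.path.exists(path):
                    os.remove(path)
            self._paths.clear()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ensemble = DiskEnsemble(os.path.join(tmp, "subsample_results"))
        for k in range(3):
            ensemble.append({"subsample": k, "params": [k, k + 1]})
        print(len(ensemble), ensemble[1])
        ensemble.close()
```

Deleting only the files the object wrote, rather than the whole directory, avoids removing unrelated files if the target directory is shared.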

