Longitudinal Stacking Classifier¶
LongitudinalStackingClassifier¶
LongitudinalStackingClassifier(
   estimators: List[CustomClassifierMixinEstimator],
   meta_learner: Optional[Union[CustomClassifierMixinEstimator, ClassifierMixin]] = LogisticRegression(),
   n_jobs: int = 1
)
The Longitudinal Stacking Classifier is an ensemble method designed to handle the unique challenges posed by longitudinal data. Using a stacking approach, this classifier combines the predictions of multiple pre-trained base estimators through a meta-learner to enhance predictive performance. The base estimators are each trained on the entire dataset, and their predictions serve as inputs to the meta-learner, which produces the final prediction.
When to Use?
This classifier is primarily used when the "SepWav" (Separate Waves) strategy is employed. However, it can also be applied with longitudinal-based estimators alone, without following the SepWav approach.
SepWav (Separate Waves) Strategy
The SepWav strategy involves considering each wave's features and the class variable as a separate dataset, then learning a classifier for each dataset. The class labels predicted by these classifiers are combined into a final predicted class label. This combination can be achieved in several ways: simple majority voting; weighted voting with weights decaying linearly or exponentially for older waves, or weights optimised by cross-validation on the training set (see LongitudinalVoting); and stacking methods (this class), which use the classifiers' predicted labels as input for learning a meta-classifier (e.g. a decision tree, logistic regression, or random forest).
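The stacking branch of SepWav can be sketched with plain scikit-learn on toy data. The wave splits, estimator choices, and random dataset below are illustrative stand-ins, not the library's actual API:

```python
# Illustrative SepWav sketch: each wave's columns form a separate
# dataset, one classifier is learned per wave, and their predicted
# labels feed a meta-classifier (stacking).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_w1 = rng.integers(0, 2, size=(20, 2))  # wave-1 features (toy)
X_w2 = rng.integers(0, 2, size=(20, 2))  # wave-2 features (toy)
y = rng.integers(0, 2, size=20)          # class variable

# One classifier per wave.
clf_w1 = DecisionTreeClassifier(random_state=0).fit(X_w1, y)
clf_w2 = DecisionTreeClassifier(random_state=0).fit(X_w2, y)

# The per-wave predicted labels become inputs to the meta-learner.
meta_inputs = np.column_stack([clf_w1.predict(X_w1), clf_w2.predict(X_w2)])
meta = LogisticRegression().fit(meta_inputs, y)
final_pred = meta.predict(meta_inputs)
print(final_pred.shape)  # (20,)
```

A voting-based combination would replace the meta-learner with a (possibly weighted) majority vote over the same per-wave predictions.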
Wrapper Around Sklearn StackingClassifier
This class wraps the sklearn StackingClassifier, offering a familiar interface while incorporating enhancements for longitudinal data.
Parameters¶
- estimators (List[CustomClassifierMixinEstimator]): The base estimators for the ensemble; they must already be trained.
- meta_learner (Optional[Union[CustomClassifierMixinEstimator, ClassifierMixin]]): The meta-learner to be used in stacking.
- n_jobs (int): The number of jobs to run in parallel when fitting the base estimators.
Attributes¶
- clf_ensemble (StackingClassifier): The underlying sklearn StackingClassifier instance.
Raises¶
- ValueError: If no base estimators are provided or the meta-learner is not suitable.
- NotFittedError: If predict or predict_proba is called before the model is fitted, or if any of the base estimators are not fitted.
Methods¶
Fit¶
Fits the ensemble model.
Parameters¶
- X (np.ndarray): The input data.
- y (np.ndarray): The target data.
Returns¶
- LongitudinalStackingClassifier: The fitted model.
Predict¶
Predicts the target data for the given input data.
Parameters¶
- X (np.ndarray): The input data.
Returns¶
- ndarray: The predicted target data.
Predict Proba¶
Predicts the target data probabilities for the given input data.
Parameters¶
- X (np.ndarray): The input data.
Returns¶
- ndarray: The predicted target data probabilities.
Examples¶
Dummy Longitudinal Dataset¶
Consider the following dataset: stroke.csv
Features:

- smoke (longitudinal) with two waves/time-points
- cholesterol (longitudinal) with two waves/time-points
- age (non-longitudinal)
- gender (non-longitudinal)

Target:

- stroke (binary classification) at wave/time-point 2 only, for the sake of the example

The dataset is shown below (w stands for wave in ELSA):
| smoke_w1 | smoke_w2 | cholesterol_w1 | cholesterol_w2 | age | gender | stroke_w2 |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 45 | 1 | 0 |
| 1 | 1 | 1 | 1 | 50 | 0 | 1 |
| 0 | 0 | 0 | 0 | 55 | 1 | 0 |
| 1 | 1 | 1 | 1 | 60 | 0 | 1 |
| 0 | 1 | 0 | 1 | 65 | 1 | 0 |
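For reference, the same dummy dataset can be built directly as a pandas DataFrame, with the longitudinal feature groups expressed as lists of column indices. The exact index layout below is illustrative, not the library's required format:

```python
import pandas as pd

# The dummy stroke dataset from the table above.
data = pd.DataFrame({
    "smoke_w1":       [0, 1, 0, 1, 0],
    "smoke_w2":       [1, 1, 0, 1, 1],
    "cholesterol_w1": [0, 1, 0, 1, 0],
    "cholesterol_w2": [1, 1, 0, 1, 1],
    "age":            [45, 50, 55, 60, 65],
    "gender":         [1, 0, 1, 0, 1],
    "stroke_w2":      [0, 1, 0, 1, 0],
})

# Longitudinal features grouped by attribute across waves (column
# indices); the remaining feature columns are non-longitudinal.
features_group = [[0, 1], [2, 3]]   # smoke, cholesterol
non_longitudinal_features = [4, 5]  # age, gender
```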
Example 1: Basic Usage¶
- Define the features_group manually or use a pre-set from the LongitudinalDataset class. If the data were from the ELSA database, you could use a pre-set such as .setup_features_group('elsa').
- Define the non-longitudinal features or use a pre-set from the LongitudinalDataset class. The .setup_features_group('elsa') pre-set also sets the non-longitudinal features automatically.
- Define the base estimators for the ensemble. Longitudinal-based or non-longitudinal-based estimators can be used; what matters is that the estimators are trained before being passed to the LongitudinalStackingClassifier.
- Lexico Random Forest does not require the non-longitudinal features to be passed; if an algorithm did, they would be supplied here.
- Define the meta-learner for the ensemble. The meta-learner can be any classifier from the scikit-learn library; here, LogisticRegression, DecisionTreeClassifier, or RandomForestClassifier are used for the simplicity of their underlying algorithms.
- Fit the model with the training data, make predictions, and evaluate the model using the accuracy_score metric.
Example 2: Using More Than One CPU¶
- Define the features_group manually or use a pre-set from the LongitudinalDataset class. If the data were from the ELSA database, you could use a pre-set such as .setup_features_group('elsa').
- Define the non-longitudinal features or use a pre-set from the LongitudinalDataset class. The .setup_features_group('elsa') pre-set also sets the non-longitudinal features automatically.
- Define the base estimators for the ensemble. Longitudinal-based or non-longitudinal-based estimators can be used; what matters is that the estimators are trained before being passed to the LongitudinalStackingClassifier.
- Lexico Random Forest does not require the non-longitudinal features to be passed; if an algorithm did, they would be supplied here.
- Define the meta-learner for the ensemble. The meta-learner can be any classifier from the scikit-learn library; here, LogisticRegression, DecisionTreeClassifier, or RandomForestClassifier are used for the simplicity of their underlying algorithms.
- Set n_jobs to the number of CPUs to use for fitting the base estimators in parallel.
- Fit the model with the training data, make predictions, and evaluate the model using the accuracy_score metric.
Notes¶
For more information, please refer to the following paper:
References¶
- Ribeiro and Freitas (2019):
- Ribeiro, C. and Freitas, A.A., 2019. A mini-survey of supervised machine learning approaches for coping with ageing-related longitudinal datasets. In 3rd Workshop on AI for Aging, Rehabilitation and Independent Assisted Living (ARIAL), held as part of IJCAI-2019 (num. of pages: 5).