💡 About The Project¶
Longitudinal datasets contain information about the same cohort of individuals (instances) over time, with the same set of features (variables) repeatedly measured across different time points (also called waves) [1,2].

Scikit-Longitudinal, also called Sklong, is a machine learning library designed to analyse longitudinal data, also called panel data in certain fields. Today, Sklong focuses on the longitudinal machine learning classification task. It offers tools and models for processing, analysing, and classifying longitudinal data, with a user-friendly interface that integrates with the Scikit-learn ecosystem.
Auto-Scikit-Longitudinal (Auto-Sklong) is an Automated Machine Learning (AutoML) library built upon the General Automated Machine Learning Assistant (GAMA) framework. It introduces a brand-new search space that leverages both Sklong and Scikit-learn models to tackle longitudinal machine learning classification tasks.

Auto-Sklong comes with various search methods to explore this search space: Bayesian Optimisation via SMAC3, as well as Random Search, Successive Halving, and Evolutionary Algorithms via GAMA.
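As a rough illustration, here is a minimal sketch of how one of these search methods might be selected, assuming GamaLongitudinalClassifier forwards GAMA's standard `search` and `max_total_time` constructor parameters and using made-up placeholder feature indices; the exact argument names and accepted values should be checked against the API tab.

```python
# Hypothetical sketch: choosing a search method for the AutoML run.
# `search` and `max_total_time` are GAMA conventions assumed to be
# forwarded by GamaLongitudinalClassifier; verify against the API tab.
from gama.search_methods import RandomSearch  # GAMA's built-in search methods
from gama.GamaLongitudinalClassifier import GamaLongitudinalClassifier

# Placeholder temporal structure: two features measured at two waves each,
# plus one static (non-longitudinal) feature at index 4.
features_group = [[0, 1], [2, 3]]
non_longitudinal_features = [4]
feature_names = ["bmi_w1", "bmi_w2", "chol_w1", "chol_w2", "sex"]

automl = GamaLongitudinalClassifier(
    features_group=features_group,
    non_longitudinal_features=non_longitudinal_features,
    feature_list_names=feature_names,
    search=RandomSearch(),  # or GAMA's Successive Halving / Evolutionary variants
    max_total_time=3600,    # overall search budget in seconds (GAMA convention)
)
```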
🛠️ Installation¶
- ✅ Install the latest version of Auto-Sklong:
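For example, using `pip` (the same package name as the versioned install shown in the note below):

```bash
pip install Auto-Sklong
```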
Different Versions?
You can also install different versions of the library by specifying the version number, e.g., `pip install Auto-Sklong==0.0.1`. Refer to the Release Notes.
- 📦 [MANDATORY] Update the required dependencies (Why? See here)
Auto-Sklong incorporates, via Sklong, a modified version of Scikit-Learn called Scikit-Lexicographical-Trees, which can be found at this PyPI link. This revised version guarantees compatibility with the unique features of Scikit-Longitudinal. Nevertheless, conflicts may occur with other dependencies in Auto-Sklong that also require Scikit-Learn. Follow these steps to prevent any issues when running your project.
🫵 Simple Setup: Command Line Installation
Say you want to try `Auto-Sklong` in a very simple environment, such as without a proper `pyproject.toml` file (`Poetry`, `PDM`, etc.). Run the following command:
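A minimal sketch of what such a sequence could look like, assuming the goal is simply to replace the stock `Scikit-Learn` wheel with `Scikit-Lexicographical-Trees` in the current environment; this is not the official command, so adapt it to your setup:

```bash
# Hypothetical sequence (not the official command; adjust to your environment):
pip install Auto-Sklong                   # pulls in the standard dependencies
pip uninstall -y scikit-learn             # remove the stock Scikit-Learn wheel
pip install scikit-lexicographical-trees  # install the modified fork from PyPI
```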
🫵 Project Setup: Using `PDM` (or any other such as `Poetry`, etc.)
Imagine you have a project managed by `PDM` or any other package manager. The example below demonstrates `PDM`; the process is similar for `Poetry` and others, so consult their documentation for instructions on excluding a package. To prevent dependency conflicts, you can exclude `Scikit-Learn` by adding the provided configuration to your `pyproject.toml` file. *This exclusion ensures Scikit-Lexicographical-Trees (used as `Scikit-learn`) is used seamlessly within your project.*
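A hedged sketch of what that exclusion could look like with `PDM`, assuming a recent PDM version that supports resolution excludes; check the exact table and key against PDM's documentation:

```toml
# pyproject.toml -- hypothetical PDM exclusion sketch, not the official snippet.
# Tells PDM's resolver to skip the stock scikit-learn so that
# scikit-lexicographical-trees can stand in for it.
[tool.pdm.resolution]
excludes = ["scikit-learn"]
```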
💻 Developer Notes¶
For developers looking to contribute, please refer to the Contributing section of GAMA here and of Scikit-Longitudinal here.
🛠️ Supported Operating Systems¶
Auto-Sklong is compatible with the following operating systems:
- MacOS
- Linux 🐧
- On Windows 🪟, we recommend running the library within a Docker container under a Linux distribution.
Warning
We haven't tested it on Windows without Docker.
🚀 Getting Started¶
To perform an AutoML search for your longitudinal machine learning classification task using Auto-Sklong, start by employing the LongitudinalDataset class to prepare your dataset (i.e., the data itself, the temporal vector, etc.). Next, instantiate a GamaLongitudinalClassifier object, which sets up the necessary configuration to run a search on your data with the parameters passed to its constructor.
from sklearn.metrics import classification_report
from scikit_longitudinal.data_preparation import LongitudinalDataset
from gama.GamaLongitudinalClassifier import GamaLongitudinalClassifier
# Load your longitudinal dataset
dataset = LongitudinalDataset('./stroke.csv')
dataset.load_data_target_train_test_split(
target_column="class_stroke_wave_4",
)
# Pre-set or manually set your temporal dependencies
dataset.setup_features_group(input_data="elsa") # (1)
# Instantiate the AutoML system
automl = GamaLongitudinalClassifier(
features_group=dataset.features_group(),
non_longitudinal_features=dataset.non_longitudinal_features(), # (2)
feature_list_names=dataset.data.columns,
# (3)
)
# Run the AutoML system to find the best model and hyperparameters
automl.fit(dataset.X_train, dataset.y_train)
# Predictions and prediction probabilities
label_predictions = automl.predict(dataset.X_test)
probability_predictions = automl.predict_proba(dataset.X_test)
print(classification_report(dataset.y_test, label_predictions))
automl.export_script() # (4)
1. Define the features_group manually or use a pre-set from the LongitudinalDataset class. If the data was from the ELSA database, you could have used a pre-set such as .setup_features_group('elsa'). Read further in here.
2. Define the non-longitudinal features manually or use a pre-set from the LongitudinalDataset class. If the data was from the ELSA database, you could have used a pre-set such as .setup_features_group('elsa'), and the non-longitudinal features would then have been set automatically. Read further in here.
3. The GamaLongitudinalClassifier comes with a variety of hyperparameters that can be set. Refer to the API for more information.
4. The export_script method allows you to export the best model found by the AutoML system as a Python script. This script can be used to reproduce the model without the need for the AutoML system. Refer to the API for more information.
Want to understand what the features_group is? How your temporal dependencies are set, via pre-set or manually?
To understand how to set your temporal dependencies, please refer to the Temporal Dependency tab of the documentation.
Want more control over hyper-parameters?
To see the full API reference, please refer to the API tab.
Want more information on the Search Space Auto-Sklong comes with?
To see the full Search Space, please refer to the Search Space tab.
Want more examples to grasp the idea?
To see more examples, please refer to the Examples tab of the documentation.
📚 References¶
[1] Kelloway, E.K. and Francis, L., 2012. Longitudinal research and data analysis. In Research methods in occupational health psychology (pp. 374-394). Routledge.
[2] Ribeiro, C. and Freitas, A.A., 2019. A mini-survey of supervised machine learning approaches for coping with ageing-related longitudinal datasets. In 3rd Workshop on AI for Aging, Rehabilitation and Independent Assisted Living (ARIAL), held as part of IJCAI-2019 (num. of pages: 5).