💡 About The Project
Longitudinal datasets contain information about the same cohort of individuals (instances) over time,
with the same set of features (variables) repeatedly measured across different time points
(also called waves) [1,2].
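To make the definition concrete, here is a minimal sketch of how repeated measurements might be arranged with one row per individual and one column per (feature, wave) pair. The observations, the feature names, and the `to_wide` helper are invented for illustration; the `_wave_N` column suffix simply mirrors the `class_stroke_wave_4` column name used in the Getting Started example, and it is an assumption that your own data follows the same convention.

```python
# Hypothetical long-format observations: (individual_id, wave, feature, value).
# All values are made up for illustration.
observations = [
    (1, 1, "smoke", 0), (1, 2, "smoke", 1),
    (1, 1, "bmi", 24.1), (1, 2, "bmi", 25.0),
    (2, 1, "smoke", 1), (2, 2, "smoke", 1),
    (2, 1, "bmi", 30.2), (2, 2, "bmi", 29.8),
]


def to_wide(rows):
    """Pivot long-format rows into {individual_id: {"<feature>_wave_<n>": value}}."""
    wide = {}
    for individual, wave, feature, value in rows:
        # Same individual, same feature, different wave -> a different column.
        wide.setdefault(individual, {})[f"{feature}_wave_{wave}"] = value
    return wide


for individual, row in to_wide(observations).items():
    print(individual, row)
```

Each individual ends up with the same set of features, measured once per wave, which is the structure the paragraph above describes.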
Scikit-longitudinal (Sklong) is a machine learning library designed to analyse
longitudinal data, also called panel data in certain fields. Sklong currently focuses on the longitudinal machine learning classification task.
It offers tools and models for processing, analysing,
and classifying longitudinal data, with a user-friendly interface that
integrates with the Scikit-learn ecosystem.
For further information, visit the official documentation.
🛠️ Installation
To install Sklong, follow these two steps:
-
✅ Install the latest version of Sklong:
pip install Scikit-longitudinal
Different versions? You can also install a specific version of the library by specifying the version number, e.g., pip install Scikit-longitudinal==0.0.1. Refer to the Release Notes.
-
📦 [MANDATORY] Update the required dependencies
Why is this necessary? See this explanation.
Scikit-longitudinal includes a modified version of Scikit-Learn called Scikit-Lexicographical-Trees, which can be found at this PyPI link.
This revised version ensures compatibility with the unique features of Scikit-longitudinal. However, conflicts may occur with other dependencies that also require Scikit-Learn. Follow these steps to prevent any issues when running your project.
🫵 Simple Setup: Command Line Installation
If you want to try Sklong in a simple environment without a proper pyproject.toml file (for example, one managed with Poetry, PDM, etc.), run the following command:
🫵 Project Setup: Using PDM (or any other package manager such as Poetry, etc.)
If you have a project managed by PDM or another package manager, the example below demonstrates PDM. The process is similar for Poetry and others. Consult their documentation for instructions on excluding a package.
To prevent dependency conflicts, you can exclude Scikit-Learn by adding the following configuration to your pyproject.toml file:
This exclusion ensures that Scikit-Lexicographical-Trees (used as Scikit-Learn) works seamlessly within your project.
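After updating the dependencies, it can be useful to check which of the two distributions is actually present in the active environment. The sketch below uses only the standard library's `importlib.metadata`; the distribution names are taken from the text above, and their exact spelling on PyPI is an assumption. The helper names are hypothetical and not part of Sklong.

```python
from importlib import metadata

# Distribution names as written in this section; the exact PyPI spelling
# is an assumption.
CANDIDATES = ("scikit-learn", "scikit-lexicographical-trees")


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def sklearn_provider_report():
    """Map each candidate distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in CANDIDATES}


if __name__ == "__main__":
    for name, version in sklearn_provider_report().items():
        print(f"{name}: {version or 'not installed'}")
```

If both distributions are reported as installed, the environment probably still contains the stock Scikit-Learn alongside the modified one, which is the conflict this step is meant to avoid.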
💻 Developer Notes
For developers looking to contribute, please refer to the Contributing section of the documentation.
🛠️ Supported Operating Systems
Scikit-longitudinal is compatible with the following operating systems:
- macOS
- Linux 🐧
- Windows via Docker only (Docker uses Linux containers) 🪟
Warning
We haven't tested it on Windows without Docker.
🚀 Getting Started
To perform longitudinal machine learning classification using Sklong, start by using the
LongitudinalDataset class to prepare your dataset (i.e., the data itself, the temporal vector, etc.). To analyse your data,
you can then use, for instance, the LexicoGradientBoostingClassifier or any other available estimator or preprocessor.
The
LexicoGradientBoostingClassifier in a nutshell: a variant of Gradient Boosting specifically designed for longitudinal data, using a lexicographical approach that prioritises recent waves
over older ones in certain scenarios [3].
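The idea of prioritising recent waves "in certain scenarios" can be illustrated with a small sketch. It assumes a simplified reading of the lexicographic approach: when several candidate splits have gains within a small threshold of the best one, the candidate from the most recent wave is preferred; otherwise the best gain wins. The `Candidate` class, the `lexicographic_choice` function, and the numbers are hypothetical, and this is not the library's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate split: a feature measured at a wave, with a gain."""
    feature: str
    wave: int
    gain: float


def lexicographic_choice(candidates, threshold_gain):
    """Pick the best gain, unless near-ties exist; then prefer the latest wave."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    best_gain = max(c.gain for c in candidates)
    # Primary objective: keep only candidates whose gain is close to the best.
    near_best = [c for c in candidates if best_gain - c.gain <= threshold_gain]
    # Secondary objective: among those, prefer the most recent wave,
    # breaking any remaining tie by the higher gain.
    return max(near_best, key=lambda c: (c.wave, c.gain))


candidates = [
    Candidate("bmi", wave=1, gain=0.3010),
    Candidate("bmi", wave=4, gain=0.3005),
    Candidate("smoke", wave=3, gain=0.2000),
]

print(lexicographic_choice(candidates, threshold_gain=0.0015))
print(lexicographic_choice(candidates, threshold_gain=0.0))
```

With the larger threshold the two `bmi` candidates count as a near-tie and the wave 4 measurement is chosen; with a threshold of zero the choice falls back to the highest gain alone.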
Next, you can apply the familiar fit, predict, predict_proba, or transform
methods, depending on the estimator or preprocessor you chose, in the same way as in Scikit-learn,
as shown in the example below:
from sklearn.metrics import classification_report
from scikit_longitudinal.data_preparation import LongitudinalDataset
from scikit_longitudinal.estimators.ensemble.lexicographical.lexico_gradient_boosting import LexicoGradientBoostingClassifier

# Load the dataset and split it into train and test sets
dataset = LongitudinalDataset('./stroke.csv')
dataset.load_data_target_train_test_split(
    target_column="class_stroke_wave_4",
)
# Pre-set or manually set your temporal dependencies
dataset.setup_features_group(input_data="Elsa")
model = LexicoGradientBoostingClassifier(
    features_group=dataset.feature_groups(),
    threshold_gain=0.00015,  # Refer to the API for more hyper-parameters and their meaning
)
model.fit(dataset.X_train, dataset.y_train)
y_pred = model.predict(dataset.X_test)
# Classification report, comparing predictions with the held-out test labels
print(classification_report(dataset.y_test, y_pred))
Neural network models
Please see the documentation's FAQ tab for a list of similar projects that may offer
neural network-based models, as this project currently does not.
If we decide to build neural network-based models for longitudinal data,
we will announce it in due course.
Want to understand what feature_groups is, and how temporal dependencies are set (pre-set or manually)?
To understand how to set your temporal dependencies, please refer to the Temporal Dependency tab of the documentation.
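As a rough intuition for manually defined temporal dependencies, the sketch below groups column indices by feature and orders them by wave, based on a `_wave_N` column suffix like the one in `class_stroke_wave_4`. Both the suffix convention and the list-of-index-lists output are assumptions made for illustration; `build_feature_groups` is a hypothetical helper, and the Temporal Dependency tab remains the reference for the format Sklong actually expects.

```python
import re
from collections import defaultdict

# Assumed naming convention: "<feature>_wave_<number>".
WAVE_PATTERN = re.compile(r"^(?P<feature>.+)_wave_(?P<wave>\d+)$")


def build_feature_groups(columns):
    """Return one list of column indices per feature, ordered from oldest to newest wave."""
    by_feature = defaultdict(list)
    for index, column in enumerate(columns):
        match = WAVE_PATTERN.match(column)
        if match:  # columns without a wave suffix (e.g. "sex") are left out
            by_feature[match["feature"]].append((int(match["wave"]), index))
    return [[index for _, index in sorted(pairs)] for pairs in by_feature.values()]


columns = ["smoke_wave_1", "bmi_wave_1", "smoke_wave_2", "bmi_wave_2", "sex"]
print(build_feature_groups(columns))  # [[0, 2], [1, 3]]
```

Here `smoke` occupies columns 0 and 2 and `bmi` occupies columns 1 and 3, while `sex` is treated as non-longitudinal and excluded from the groups.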
Want more examples to grasp the idea?
To see more examples, please refer to the Examples tab of the documentation.
Want more control over hyper-parameters?
To see the full API reference, please refer to the API tab.
📚 References
[1] Kelloway, E.K. and Francis, L., 2012. Longitudinal research and data analysis. In Research methods in occupational health psychology (pp. 374-394). Routledge.
[2] Ribeiro, C. and Freitas, A.A., 2019. A mini-survey of supervised machine learning approaches for coping with ageing-related longitudinal datasets. In 3rd Workshop on AI for Aging, Rehabilitation and Independent Assisted Living (ARIAL), held as part of IJCAI-2019 (5 pages).
[3] Ribeiro, C. and Freitas, A.A., 2024. A lexicographic optimisation approach to promote more recent features on longitudinal decision-tree-based classifiers: applications to the English Longitudinal Study of Ageing. Artificial Intelligence Review, 57(4), p.84.