Longitudinal Dataset

LongitudinalDataset

LongitudinalDataset(
   file_path: Union[str, Path],
   data_frame: Optional[pd.DataFrame] = None
)

The LongitudinalDataset class is a container for managing and preparing longitudinal datasets. It provides the data management and transformation steps needed to develop and apply machine learning algorithms for longitudinal classification tasks.

Feature Groups and Non-Longitudinal Characteristics

The class exposes two attributes, feature_groups and non_longitudinal_features, which let adapted or newly designed machine learning algorithms take the temporal structure of a longitudinal dataset into account.

feature_groups: A temporal matrix representing the temporal dependency of a longitudinal dataset. It is an outer list with one sublist (tuple or list of integers) per longitudinal attribute; each sublist holds the column indices of that attribute's waves. For more details, see the documentation's "Temporal Dependency" page.
non_longitudinal_features: A list of feature indices that are considered non-longitudinal. These features are not part of the temporal matrix; whether they are treated as static features depends on the technique applied afterwards.
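
To make the index-based layout concrete, here is a minimal sketch in plain Python. It does not use the library. The index values are assumptions chosen to match the column order of the example table on this page, and the describe helper is hypothetical and not part of LongitudinalDataset.

```python
# Column order taken from the example table on this page.
columns = [
    "smoke_w1", "smoke_w2",
    "cholesterol_w1", "cholesterol_w2",
    "age", "gender", "stroke_w2",
]

# Assumed values for illustration: one sublist per longitudinal attribute,
# holding the column indices of its waves in chronological order.
feature_groups = [[0, 1], [2, 3]]

# Assumed indices of the features that sit outside the temporal matrix.
non_longitudinal_features = [4, 5]


def describe(columns, feature_groups, non_longitudinal_features):
    """Hypothetical helper: map index-based groups back to column names."""
    waves = [[columns[i] for i in group] for group in feature_groups]
    static = [columns[i] for i in non_longitudinal_features]
    return waves, static


waves, static = describe(columns, feature_groups, non_longitudinal_features)
print(waves)   # [['smoke_w1', 'smoke_w2'], ['cholesterol_w1', 'cholesterol_w2']]
print(static)  # ['age', 'gender']
```

Working with indices rather than names means the same structure can be handed to an estimator that only sees a numeric array.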

Wrapper Around Pandas DataFrame

This class wraps a pandas DataFrame, keeping its familiar interface while adding support for data collected over multiple time points.

Parameters

file_path (Union[str, Path]): Path to the dataset file. Supports both ARFF and CSV formats.
data_frame (Optional[pd.DataFrame], optional): If provided, this pandas DataFrame will serve as the dataset, and the file_path parameter will be ignored.

Properties

data (pd.DataFrame): A read-only property that returns the loaded dataset as a pandas DataFrame.
target (pd.Series): A read-only property that returns the target variable (class variable) as a pandas Series.
X_train (np.ndarray): A read-only property that returns the training data as a numpy array.
X_test (np.ndarray): A read-only property that returns the test data as a numpy array.
y_train (pd.Series): A read-only property that returns the training target data as a pandas Series.
y_test (pd.Series): A read-only property that returns the test target data as a pandas Series.

smoke_w1	smoke_w2	cholesterol_w1	cholesterol_w2	age	gender	stroke_w2
0	1	0	1	45	1	0
1	1	1	1	50	0	1
0	0	0	0	55	1	0
1	1	1	1	60	0	1
0	1	0	1	65	1	0
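
For a wide-format table like the one above, the wave structure can often be read off the column names. The sketch below is plain Python and does not show how the library computes anything. It assumes a `_w<N>` suffix marks wave N and that stroke_w2 is the target column; infer_groups is a hypothetical helper.

```python
import re
from collections import defaultdict

header = ["smoke_w1", "smoke_w2", "cholesterol_w1", "cholesterol_w2",
          "age", "gender", "stroke_w2"]
rows = [
    [0, 1, 0, 1, 45, 1, 0],
    [1, 1, 1, 1, 50, 0, 1],
    [0, 0, 0, 0, 55, 1, 0],
    [1, 1, 1, 1, 60, 0, 1],
    [0, 1, 0, 1, 65, 1, 0],
]

TARGET = "stroke_w2"  # assumption: the class variable is the last column
WAVE_SUFFIX = re.compile(r"^(?P<name>.+)_w(?P<wave>\d+)$")  # assumed naming scheme


def infer_groups(header, target):
    """Hypothetical helper: group feature indices by attribute name and wave."""
    by_name = defaultdict(list)
    static = []
    # Indices are positions among the features once the target is dropped.
    # Here the target is the last column, so they match positions in `rows`.
    features = [col for col in header if col != target]
    for idx, col in enumerate(features):
        match = WAVE_SUFFIX.match(col)
        if match:
            by_name[match["name"]].append((int(match["wave"]), idx))
        else:
            static.append(idx)
    names = list(by_name)
    # Sort each attribute's (wave, index) pairs so waves are chronological.
    groups = [[idx for _, idx in sorted(by_name[name])] for name in names]
    return names, groups, static


names, groups, static = infer_groups(header, TARGET)
print(groups)  # [[0, 1], [2, 3]]
print(static)  # [4, 5]

# Per-attribute trajectory of the first row across waves.
first_row = {name: [rows[0][i] for i in group] for name, group in zip(names, groups)}
print(first_row)  # {'smoke': [0, 1], 'cholesterol': [0, 1]}
```

Name-based inference only works when the suffix convention is followed consistently; otherwise the groups need to be written out by hand.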
