SuperFELTR

SuperFELTR Model

Contains the SuperFELTR model.

Regression extension of Super.FELT (supervised feature extraction learning using triplet loss for drug response prediction with multi-omics data). The model is very similar to MOLI, with two differences:

In MOLI, the encoders and the classifier are trained jointly, whereas Super.FELT trains them independently.

MOLI is trained without feature selection (except for a variance threshold on the gene expression data), whereas Super.FELT applies feature selection to all omics data.

The input remains the same: somatic mutation, copy number variation, and gene expression data. The original Super.FELT authors are Park, Soh & Lee (2021, 10.1186/s12859-021-04146-z). The code is adapted from their GitHub repository, https://github.com/DMCB-GIST/Super.FELT, and from Hauptmann et al. (2023, 10.1186/s12859-023-05166-7), https://github.com/kramerlab/Multi-Omics_analysis

class drevalpy.models.SuperFELTR.superfeltr.SuperFELTR

Bases: DRPModel

Regression extension of Super.FELT.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:: hyperparameters – dictionary containing the hyperparameters for the model. It contains mini_batch, dropout_rate, weight_decay, out_dim_expr_encoder, out_dim_mutation_encoder, out_dim_cnv_encoder, epochs, the variance thresholds for gene expression, mutation, and copy number variation, margin, and the learning rate.
Return type:: None
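
As an illustration, a hyperparameter dictionary could look like the sketch below. The keys mini_batch, dropout_rate, weight_decay, out_dim_expr_encoder, out_dim_mutation_encoder, out_dim_cnv_encoder, epochs, and margin are named in the description above. The key names used here for the variance thresholds and the learning rate, all values, and the missing_keys helper are assumptions made for this example only.

```python
# Illustrative hyperparameters; the values are placeholders, not tuned defaults.
example_hyperparameters = {
    "mini_batch": 32,
    "dropout_rate": 0.3,
    "weight_decay": 0.01,
    "out_dim_expr_encoder": 256,
    "out_dim_mutation_encoder": 32,
    "out_dim_cnv_encoder": 64,
    "epochs": 10,
    "margin": 1.0,
    # Assumed key names (not confirmed by the documentation above):
    "expression_var_threshold": 0.1,
    "mutation_var_threshold": 0.1,
    "cnv_var_threshold": 0.7,
    "learning_rate": 0.01,
}

# Keys that the build_model description names explicitly.
DOCUMENTED_KEYS = (
    "mini_batch",
    "dropout_rate",
    "weight_decay",
    "out_dim_expr_encoder",
    "out_dim_mutation_encoder",
    "out_dim_cnv_encoder",
    "epochs",
    "margin",
)


def missing_keys(hyperparameters):
    """Return the documented keys absent from the dictionary (hypothetical helper)."""
    return [key for key in DOCUMENTED_KEYS if key not in hyperparameters]
```

Checking for missing keys before calling build_model is one way to catch a misspelled key early.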

cell_line_views = ['gene_expression', 'mutations', 'copy_number_variation_gistic']

drug_views = []

early_stopping = True

classmethod get_model_name()

Returns the model name.

Return type:: str
Returns:: SuperFELTR

is_single_drug_model = True

load_cell_line_features(data_path, dataset_name)

Loads the cell line features: gene expression, mutations, and copy number variation.

Parameters:

data_path (str) – path to the data, e.g., data/
dataset_name (str) – name of the dataset, e.g., GDSC2

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line gene expression features, mutations, and copy number variation

load_drug_features(data_path, dataset_name)

Returns None, as drug features are not needed for SuperFELTR.

Parameters:

data_path (str) – Path to the fingerprints, e.g., data/
dataset_name (str) – Name of the dataset

Return type:

FeatureDataset | None

Returns:

None

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the drug response.

If there was no training data, NA is predicted. If there was not enough training data, predictions are made with the randomly initialized model.

Parameters:

cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – drug ids
cell_line_input (FeatureDataset) – cell line omics features
drug_input (FeatureDataset | None) – drug features; not needed and must be None

Return type:

ndarray

Returns:

predicted drug response

Raises:

ValueError – if drug_input is not None
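
The NA fallback described above can be pictured with a small framework-free sketch. The function name predict_with_fallback and the convention that an untrained model is represented by None are assumptions for this example; the actual method works on FeatureDataset objects and a trained regressor.

```python
import math


def predict_with_fallback(model, features):
    """Return one prediction per sample, or NaN for every sample if no model exists.

    `model` is any callable mapping one feature vector to a number, or None
    when training was skipped (hypothetical convention for this sketch).
    """
    if model is None:
        # No training data was available, so every prediction is NA.
        return [math.nan] * len(features)
    return [float(model(sample)) for sample in features]
```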

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='superfeltr_checkpoints')

Does feature selection, trains the encoders sequentially, and then trains the regressor.

If there is not enough training data, the model is trained with random initialization. If there is no training data at all, the model is skipped and NA is predicted later on.

Parameters:

output (DrugResponseDataset) – training data associated with the response output
cell_line_input (FeatureDataset) – cell line omics features
drug_input (FeatureDataset | None) – not needed, as it is a single drug model
output_earlystopping (DrugResponseDataset | None) – optional early stopping dataset
model_checkpoint_dir (str) – not needed

Raises:

ValueError – if drug_input is not None

Return type:

None
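
The feature selection step in train can be pictured as a variance filter applied to each omic matrix, using the variance thresholds from the hyperparameters. The sketch below is a minimal standard-library version; the function names, the use of population variance, and the strict comparison are assumptions, and the real implementation may differ.

```python
from statistics import pvariance


def select_by_variance(rows, threshold):
    """Return the indices of columns whose population variance exceeds `threshold`."""
    n_features = len(rows[0])
    return [
        j
        for j in range(n_features)
        if pvariance([row[j] for row in rows]) > threshold
    ]


def apply_selection(rows, kept):
    """Keep only the selected columns in every row."""
    return [[row[j] for j in kept] for row in rows]
```

The indices are computed on the training samples and then applied to any other samples, so that all data passes through the same columns.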

Model utils

Utility functions for the SuperFELTR model.

class drevalpy.models.SuperFELTR.utils.SuperFELTEncoder(input_size, hpams, omic_type, ranges)

Bases: LightningModule

SuperFELT encoder definition for a single omic type, i.e., gene expression, mutation, or copy number variation.

Very similar to MOLIEncoder, but with BatchNorm1d before ReLU.

Parameters:

input_size (int)
hpams (dict[str, int | float | dict])
omic_type (str)
ranges (tuple[float, float])
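
To make the layer ordering concrete, the following framework-free sketch applies a linear map, then batch normalization, then ReLU. It is only an illustration of "BatchNorm1d before ReLU": the actual encoder is a PyTorch Lightning module, and this sketch omits dropout and the learnable scale and shift of batch normalization. All function names are hypothetical.

```python
import math


def linear(batch, weights, bias):
    """Apply y = W x + b to every sample in the batch."""
    return [
        [sum(w * x for w, x in zip(row_w, sample)) + b for row_w, b in zip(weights, bias)]
        for sample in batch
    ]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature to zero mean and unit variance across the batch."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(s[j] for s in batch) / n for j in range(n_features)]
    variances = [
        sum((s[j] - means[j]) ** 2 for s in batch) / n for j in range(n_features)
    ]
    return [
        [(s[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(n_features)]
        for s in batch
    ]


def relu(batch):
    """Clamp negative activations to zero."""
    return [[max(0.0, v) for v in s] for s in batch]


def encode(batch, weights, bias):
    """Linear -> batch norm -> ReLU, the ordering described above."""
    return relu(batch_norm(linear(batch, weights, bias)))
```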

configure_optimizers()

Override the configure_optimizers method to use the Adam optimizer.

Return type:: Optimizer
Returns:: Adam optimizer

forward(x)

Forward pass of the SuperFELTEncoder.

Parameters:: x (Tensor) – input tensor
Return type:: Tensor
Returns:: encoded tensor

training_step(batch, batch_idx)

Override the training_step method to compute the triplet loss.

Parameters:

batch (list[Tensor]) – batch containing the omic data and response
batch_idx (int) – index of the batch

Return type:

Tensor

Returns:

triplet loss
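
For orientation, a triplet margin loss on a single triplet can be sketched without any framework as follows. The margin argument corresponds to the margin hyperparameter; the use of Euclidean distance is an assumption, and how anchor, positive, and negative samples are chosen from the response values is not covered here, so they are simply passed in.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Compute max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    return max(
        euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0
    )
```

The loss is zero once the negative sample is at least margin farther from the anchor than the positive sample, which pushes the encoder to separate the two groups in the embedding space.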

validation_step(batch, batch_idx)

Override the validation_step method to compute the triplet loss.

Parameters:

batch (list[Tensor]) – batch containing the omic data and response
batch_idx (int) – index of the batch

Return type:

Tensor

Returns:

triplet loss

class drevalpy.models.SuperFELTR.utils.SuperFELTRegressor(input_size, hpams, encoders)

Bases: RegressionMetricsMixin, LightningModule

SuperFELT regressor definition.

Very similar to the Super.FELT classifier, but with a regression loss and without the final sigmoid layer.

Parameters:

input_size (int)
hpams (dict[str, int | float | dict])
encoders (tuple[SuperFELTEncoder, SuperFELTEncoder, SuperFELTEncoder])
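
The idea of a regressor on top of three encoders can be sketched as follows, under two assumptions that this page does not confirm: that the three per-omic embeddings are concatenated into one vector, and that the regression loss is the mean squared error. The function names are hypothetical, and a single linear unit stands in for the regressor head.

```python
def concatenate(encoded_expr, encoded_mut, encoded_cnv):
    """Join the three per-omic embeddings of one sample into one vector."""
    return list(encoded_expr) + list(encoded_mut) + list(encoded_cnv)


def linear_output(features, weights, bias):
    """Single linear unit; there is no sigmoid, so the output is unbounded."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(predictions, targets):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```

Dropping the sigmoid matters for regression because drug response values are not restricted to the interval between 0 and 1.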

configure_optimizers()

Override the configure_optimizers method to use the Adagrad optimizer.

Return type:: Optimizer
Returns:: Adagrad optimizer

forward(x)

Forward pass of the SuperFELTRegressor.

Parameters:: x (Tensor) – input tensor
Return type:: Tensor
Returns:: predicted response

predict(data_expr, data_mut, data_cnv)

Predicts the response for the given input.

Parameters:

data_expr (ndarray) – expression data
data_mut (ndarray) – mutation data
data_cnv (ndarray) – copy number variation data

Return type:

ndarray

Returns:

predicted response

training_step(batch, batch_idx)

Override the training_step method to compute the regression loss.

Parameters:

batch (list[Tensor]) – batch containing the omic data and response
batch_idx (int) – index of the batch

Return type:

Tensor

Returns:

regression loss

validation_step(batch, batch_idx)

Override the validation_step method to compute the regression loss.

Parameters:

batch (list[Tensor]) – batch containing the omic data and response
batch_idx (int) – index of the batch

Return type:

Tensor

Returns:

regression loss

drevalpy.models.SuperFELTR.utils.train_superfeltr_model(model, hpams, output_train, cell_line_input, output_earlystopping=None, patience=5, model_checkpoint_dir='superfeltr_checkpoints', wandb_project=None)

Trains one encoder or the regressor.

First, the dataset and loaders are created. Then, the model is trained with the Lightning trainer.

Parameters:

model (SuperFELTEncoder | SuperFELTRegressor) – either one of the encoders or the regressor
hpams (dict[str, int | float | dict]) – hyperparameters for the model
output_train (DrugResponseDataset) – response data for training
cell_line_input (FeatureDataset) – cell line omics features
output_earlystopping (DrugResponseDataset | None) – response data for early stopping
patience (int) – for early stopping, defaults to 5
model_checkpoint_dir (str) – directory to save the model checkpoints
wandb_project (str | None) – optional wandb project name for logging. If provided, uses WandbLogger for PyTorch Lightning training.

Return type:

ModelCheckpoint

Returns:

checkpoint callback with the best model

Raises:

ValueError – if epochs or mini_batch is not an integer
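
The integer check and the role of patience can be illustrated with a small standalone sketch. Both functions are hypothetical helpers written for this example; the actual function delegates training and early stopping to the Lightning trainer.

```python
def validate_int_hyperparameters(hpams):
    """Raise ValueError unless `epochs` and `mini_batch` are integers."""
    for key in ("epochs", "mini_batch"):
        if not isinstance(hpams.get(key), int):
            raise ValueError(f"{key} must be an integer, got {hpams.get(key)!r}")


def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience was never exhausted, so training ran through the last epoch.
    return len(val_losses) - 1, best_epoch
```

The best epoch, not the stopping epoch, is the one whose weights a checkpoint callback would keep.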