Implemented baselines

Naive Predictors

Implements the naive predictor models.

The naive predictor models are simple models that predict the mean of the response values:
  • NaivePredictor predicts the overall mean of the response.

  • NaiveCellLineMeanPredictor predicts the mean of the response per cell line.

  • NaiveDrugMeanPredictor predicts the mean of the response per drug.

  • NaiveTissueMeanPredictor predicts the mean of the response per tissue.

  • NaiveTissueDrugMeanPredictor predicts the mean of the response per tissue-drug combination.

  • NaiveMeanEffectsPredictor predicts the response as the overall mean plus the cell line effect plus the drug effect; it should be the strongest naive baseline.
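
As a rough illustration of the group-mean idea, the sketch below computes the overall mean and per-group means from (cell line, drug, response) triples and falls back to the overall mean for unseen groups. It uses plain Python rather than the drevalpy classes; the helper names fit_group_means and predict_group_mean and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import fmean

# Toy training data: (cell_line, drug, response). Values are made up.
TRAIN = [
    ("CL1", "D1", 1.0),
    ("CL1", "D2", 3.0),
    ("CL2", "D1", 5.0),
    ("CL2", "D2", 7.0),
]


def fit_group_means(rows, key_index):
    """Return (overall_mean, {group: mean}), grouping by row[key_index]."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    overall = fmean(row[2] for row in rows)
    return overall, {key: fmean(values) for key, values in groups.items()}


def predict_group_mean(overall, group_means, keys):
    """Look up each key's mean; unseen keys fall back to the overall mean."""
    return [group_means.get(key, overall) for key in keys]


overall, cl_means = fit_group_means(TRAIN, key_index=0)
_, drug_means = fit_group_means(TRAIN, key_index=1)

cl_preds = predict_group_mean(overall, cl_means, ["CL1", "CL2", "CL_unseen"])
drug_preds = predict_group_mean(overall, drug_means, ["D1", "D_unseen"])
```

Here cl_preds is [2.0, 6.0, 4.0] and drug_preds is [3.0, 4.0]: the unseen cell line and the unseen drug both receive the overall mean of 4.0.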

class drevalpy.models.baselines.naive_pred.NaiveCellLineMeanPredictor

Bases: NaiveModel

Naive predictor model that predicts the mean of the response per cell line.

cell_line_views = ['cell_line_name']
drug_views = ['pubchem_id']
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

NaiveCellLineMeanPredictor

load_cell_line_features(data_path, dataset_name)

Loads the cell line features, in this case the cell line ids.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line ids

load_drug_features(data_path, dataset_name)

Loads the drug features.

Parameters:
  • data_path (str) – Path to the data.

  • dataset_name (str) – Name of the dataset.

Return type:

FeatureDataset

Returns:

FeatureDataset containing the drug IDs.

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the cell line mean for each drug-cell line combination.

If the cell line is not in the training set, the dataset mean is used.

Return type:

ndarray

Returns:

array of the same length as the input cell_line_ids containing the cell line mean

predict_cl(cl_id)

Predicts the mean of the response for a given cell line.

If the cell line is not in the training set, the dataset mean is used.

Parameters:

cl_id (str) – Cell line ID

Return type:

float

Returns:

predicted response

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='None')

Computes the mean per cell line.

If a cell line is not in the training set at prediction time, the overall mean is used.

Parameters:
  • output (DrugResponseDataset) – training dataset containing the response output

  • cell_line_input (FeatureDataset) – cell line inputs

  • drug_input (FeatureDataset | None) – not needed

  • output_earlystopping (DrugResponseDataset | None) – not needed

  • model_checkpoint_dir (str) – not needed

Return type:

None

class drevalpy.models.baselines.naive_pred.NaiveDrugMeanPredictor

Bases: NaiveModel

Naive predictor model that predicts the mean of the response per drug.

cell_line_views = ['cell_line_name']
drug_views = ['pubchem_id']
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

NaiveDrugMeanPredictor

load_cell_line_features(data_path, dataset_name)

Loads the cell line features.

Parameters:
  • data_path (str) – Path to the data.

  • dataset_name (str) – Name of the dataset.

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line IDs.

load_drug_features(data_path, dataset_name)

Loads the drug features, in this case the drug ids.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the drug ids

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the drug mean for each drug-cell line combination.

If the drug is not in the training set, the dataset mean is used.

Return type:

ndarray

Returns:

array of the same length as the input drug_ids containing the drug mean

predict_drug(drug_id)

Predicts the mean of the response for a given drug.

If the drug is not in the training set, the dataset mean is used.

Parameters:

drug_id (str) – ID of the drug

Returns:

predicted response

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='None')

Computes the mean per drug. If a drug is not in the training set at prediction time, the overall mean is used.

Raises:

ValueError – If drug_input is None

Return type:

None

class drevalpy.models.baselines.naive_pred.NaiveMeanEffectsPredictor

Bases: NaiveModel

ANOVA-like predictor model.

Predicts the response as: response = overall_mean + cell_line_effect + drug_effect.

Here:
  • cell_line_effect = (cell line mean - overall_mean)

  • drug_effect = (drug mean - overall_mean)

This formulation ensures that the overall mean is not counted twice.
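
To make the arithmetic concrete, the following sketch computes the effects on a toy dataset. Because both effects are deviations from the overall mean, the prediction is equivalent to cell line mean + drug mean - overall mean. The variable names and the numbers are hypothetical and are not taken from the drevalpy implementation.

```python
from collections import defaultdict
from statistics import fmean

# Toy responses keyed by (cell_line, drug); values are invented.
responses = {
    ("CL1", "D1"): 1.0,
    ("CL1", "D2"): 3.0,
    ("CL2", "D1"): 5.0,
    ("CL2", "D2"): 7.0,
}

overall_mean = fmean(responses.values())

by_cell_line = defaultdict(list)
by_drug = defaultdict(list)
for (cell_line, drug), value in responses.items():
    by_cell_line[cell_line].append(value)
    by_drug[drug].append(value)

# Effects are deviations of the group means from the overall mean.
cell_line_effect = {cl: fmean(v) - overall_mean for cl, v in by_cell_line.items()}
drug_effect = {d: fmean(v) - overall_mean for d, v in by_drug.items()}

# response = overall_mean + cell_line_effect + drug_effect
pred_cl2_d1 = overall_mean + cell_line_effect["CL2"] + drug_effect["D1"]
```

With an overall mean of 4.0, a CL2 effect of 2.0, and a D1 effect of -1.0, the prediction for (CL2, D1) is 5.0.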

cell_line_views = ['cell_line_name']
drug_views = ['pubchem_id']
classmethod get_model_name()

Returns the name of the model.

Return type:

str

Returns:

The name of the model as a string.

load_cell_line_features(data_path, dataset_name)

Loads the cell line features.

Parameters:
  • data_path (str) – Path to the data.

  • dataset_name (str) – Name of the dataset.

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line IDs.

load_drug_features(data_path, dataset_name)

Loads the drug features.

Parameters:
  • data_path (str) – Path to the data.

  • dataset_name (str) – Name of the dataset.

Return type:

FeatureDataset

Returns:

FeatureDataset containing the drug IDs.

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts responses for given cell line and drug pairs.

The prediction is computed as:

prediction = overall_mean + cell_line_effect + drug_effect

If a cell line or drug has not been seen during training, their effect is set to zero.

Return type:

ndarray

Returns:

NumPy array of predicted responses.
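
The zero-effect fallback for unseen entities can be sketched as follows. This is a minimal illustration with plain dictionaries and lists; the function name predict_mean_effects and the fitted values are assumptions, and the actual method returns a NumPy array.

```python
def predict_mean_effects(overall_mean, cell_line_effects, drug_effects, pairs):
    """Predict overall_mean + cell line effect + drug effect for each pair.

    Cell lines or drugs that were not seen in training contribute zero.
    """
    return [
        overall_mean + cell_line_effects.get(cl, 0.0) + drug_effects.get(drug, 0.0)
        for cl, drug in pairs
    ]


# Hypothetical fitted parameters (made-up numbers).
preds = predict_mean_effects(
    overall_mean=4.0,
    cell_line_effects={"CL1": -2.0, "CL2": 2.0},
    drug_effects={"D1": -1.0, "D2": 1.0},
    pairs=[("CL2", "D1"), ("CL_new", "D2"), ("CL1", "D_new"), ("CL_new", "D_new")],
)
```

A pair in which both the cell line and the drug are unseen reduces to the overall mean, so the last prediction is 4.0.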

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains with overall mean, cell line effects, and drug effects.

Raises:

ValueError – If drug_input is None.

Return type:

None

class drevalpy.models.baselines.naive_pred.NaiveModel

Bases: DRPModel

Base class for all naive predictor models which are based on simple dataset stats.

This class provides a shared interface and save/load mechanism for simple statistical models that predict drug response based on dataset means, stratified by drug, cell line, or tissue.

build_model(hyperparameters)

Builds the model.

Naive models do not require any hyperparameter tuning.

Parameters:

hyperparameters (dict) – Dictionary of hyperparameters (not used).

classmethod load(directory)

Loads the model parameters from the given directory.

Reads the ‘naive_model.json’ file and initializes a NaiveModel instance with the loaded parameters.

Parameters:

directory (str) – Path to the directory where the model is saved.

Return type:

NaiveModel

Returns:

An instance of NaiveModel with the loaded parameters.

save(directory)

Saves the model parameters to the given directory.

Serializes dataset_mean and any available subclass-specific attributes to a JSON file named ‘naive_model.json’. Creates the directory if it doesn’t exist.

Parameters:

directory (str) – Path to the directory where the model will be saved.

Return type:

None
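
A save/load round trip of this kind might look like the sketch below. Only the file name naive_model.json and the dataset_mean attribute come from the description above; the helper names save_params and load_params and the drug_means entry are hypothetical.

```python
import json
import os
import tempfile

FILE_NAME = "naive_model.json"  # file name taken from the description above


def save_params(directory, params):
    """Write the parameter dict as JSON, creating the directory if needed."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, FILE_NAME), "w", encoding="utf-8") as fh:
        json.dump(params, fh)


def load_params(directory):
    """Read the parameter dict back from the JSON file."""
    with open(os.path.join(directory, FILE_NAME), encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical parameters; only dataset_mean is named in the documentation.
params = {"dataset_mean": 4.0, "drug_means": {"D1": 3.0, "D2": 5.0}}

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model_dir")  # does not exist yet
    save_params(target, params)
    restored = load_params(target)
```

JSON keeps string keys and float values intact, so restored equals params after the round trip.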

class drevalpy.models.baselines.naive_pred.NaivePredictor

Bases: NaiveModel

Naive predictor model that predicts the overall mean of the response.

cell_line_views = ['cell_line_name']
drug_views = ['pubchem_id']
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

NaivePredictor

load_cell_line_features(data_path, dataset_name)

Loads the cell line features, in this case the cell line ids.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line ids

load_drug_features(data_path, dataset_name)

Loads the drug features, in this case the drug ids.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the drug ids

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the dataset mean for each drug-cell line combination.

Return type:

ndarray

Returns:

array of the same length as the input cell_line_ids containing the dataset mean

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Computes the overall mean of the output response values and saves them.

Return type:

None

class drevalpy.models.baselines.naive_pred.NaiveTissueDrugMeanPredictor

Bases: NaiveModel

Naive predictor model that predicts the mean of the response per tissue-drug combination.

This model combines tissue and drug information to predict the mean response aggregated across all cell lines from the same tissue tested on the same drug. If a (tissue, drug) combination was not seen during training, it falls back to the overall dataset mean.

cell_line_views = ['tissue']
drug_views = ['pubchem_id']
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

NaiveTissueDrugMeanPredictor

classmethod load(directory)

Loads the model parameters from the given directory.

Overrides the base class load method to convert string keys back to tuple keys.

Parameters:

directory (str) – Path to the directory where the model is saved.

Return type:

NaiveTissueDrugMeanPredictor

Returns:

An instance of NaiveTissueDrugMeanPredictor with the loaded parameters.

load_cell_line_features(data_path, dataset_name)

Loads the cell line features, in this case the tissue annotations.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the tissue ids

load_drug_features(data_path, dataset_name)

Loads the drug features, in this case the drug ids.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the drug ids

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the tissue-drug mean for each drug-cell line combination.

If the (tissue, drug) combination is not in the training set, the dataset mean is used.

Parameters:
  • cell_line_ids (ndarray) – cell line ids

  • drug_ids (ndarray) – drug ids (used directly, following NaiveDrugMeanPredictor pattern)

  • cell_line_input (FeatureDataset) – tissue features

  • drug_input (FeatureDataset | None) – not needed

Return type:

ndarray

Returns:

array of the same length as the input containing the tissue-drug mean or dataset mean
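
The lookup with fallback can be sketched as below. The mapping tissue_of stands in for the tissue features held in cell_line_input, and the function name and numbers are assumptions for illustration; the real method works on FeatureDataset objects and returns a NumPy array.

```python
def predict_tissue_drug(cell_line_ids, drug_ids, tissue_of, tissue_drug_means, dataset_mean):
    """Return the (tissue, drug) mean per pair, or dataset_mean if unseen."""
    preds = []
    for cell_line, drug in zip(cell_line_ids, drug_ids):
        tissue = tissue_of.get(cell_line)  # None for unknown cell lines
        preds.append(tissue_drug_means.get((tissue, drug), dataset_mean))
    return preds


# Hypothetical lookup tables with invented values.
tissue_of = {"CL1": "lung", "CL2": "lung", "CL3": "skin"}
tissue_drug_means = {("lung", "D1"): 2.5, ("skin", "D2"): 4.0}

preds = predict_tissue_drug(
    cell_line_ids=["CL1", "CL2", "CL3", "CL3"],
    drug_ids=["D1", "D1", "D2", "D1"],
    tissue_of=tissue_of,
    tissue_drug_means=tissue_drug_means,
    dataset_mean=3.0,
)
```

Both lung cell lines share the (lung, D1) mean, while the unseen (skin, D1) combination falls back to the dataset mean of 3.0.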

save(directory)

Saves the model parameters to the given directory.

Overrides the base class save method to handle tuple keys in tissue_drug_means by converting them to JSON-serializable string keys.

Parameters:

directory (str) – Path to the directory where the model will be saved.

Return type:

None
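
JSON object keys must be strings, which is why tuple keys need converting before serialization and converting back after loading. The sketch below shows one way to do this; the "||" separator and the helper names are assumptions, since the key format actually written to disk is not documented here.

```python
import json

SEP = "||"  # assumed separator; the real on-disk key format is not documented here


def encode_keys(tissue_drug_means):
    """Convert (tissue, drug) tuple keys to 'tissue||drug' string keys."""
    return {f"{tissue}{SEP}{drug}": mean for (tissue, drug), mean in tissue_drug_means.items()}


def decode_keys(encoded):
    """Convert 'tissue||drug' string keys back to (tissue, drug) tuples."""
    return {tuple(key.split(SEP, 1)): mean for key, mean in encoded.items()}


tissue_drug_means = {("lung", "D1"): 2.5, ("skin", "D2"): 4.0}

# json.dumps raises TypeError on tuple keys, so encode them first.
payload = json.dumps(encode_keys(tissue_drug_means))
restored = decode_keys(json.loads(payload))
```

A string separator only round-trips safely if it cannot occur inside a tissue name, so any real implementation has to pick its key encoding with that in mind.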

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='None')

Computes the mean per tissue-drug combination. Falls back to the overall mean for unknown combinations.

Raises:

ValueError – If drug_input is None.

Return type:

None

class drevalpy.models.baselines.naive_pred.NaiveTissueMeanPredictor

Bases: NaiveModel

Naive predictor model that predicts the mean of the response per tissue.

cell_line_views = ['tissue']
drug_views = []
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

NaiveTissueMeanPredictor

load_cell_line_features(data_path, dataset_name)

Loads the cell line features, in this case the tissue annotations.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the tissue ids

load_drug_features(data_path, dataset_name)

Loads the drug features.

Parameters:
  • data_path (str) – Path to the data.

  • dataset_name (str) – Name of the dataset.

Return type:

FeatureDataset

Returns:

FeatureDataset containing the drug IDs.

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the tissue mean for each drug-cell line combination.

If the tissue is not in the training set, the dataset mean is used.

Return type:

ndarray

Returns:

array of the same length as the input cell_line_ids containing the tissue mean

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='None')

Computes the mean per tissue. Falls back to the overall mean for unknown tissues.

Return type:

None

Sklearn Models

Contains sklearn baseline models: ElasticNet, RandomForest, SVM, and GradientBoosting, along with proteomics variants of ElasticNet and RandomForest.

class drevalpy.models.baselines.sklearn_models.ElasticNetModel

Bases: SklearnModel

ElasticNet model for drug response prediction.

build_model(hyperparameters)

Builds the ElasticNet model from hyperparameters.

Parameters:

hyperparameters (dict) – Contains L1 ratio and alpha.

classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

ElasticNet

class drevalpy.models.baselines.sklearn_models.GradientBoosting

Bases: SklearnModel

Gradient Boosting model for drug response prediction.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters (dict) – Hyperparameters for the model. Contains n_estimators, learning_rate, max_depth, and subsample

classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

GradientBoosting

class drevalpy.models.baselines.sklearn_models.ProteomicsElasticNetModel

Bases: ElasticNetModel

ElasticNet model for drug response prediction using proteomics data.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters (dict) – Hyperparameters for the model.

cell_line_views = ['proteomics']
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

ProteomicsElasticNet

load_cell_line_features(data_path, dataset_name)

Loads the cell line features.

Parameters:
  • data_path (str) – Path to the gene expression and landmark genes

  • dataset_name (str) – Name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line proteomics features, filtered through the landmark genes

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the response for the given input.

Return type:

ndarray

Returns:

predicted drug response

Raises:

ValueError – If drug_input is None.

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model.

The number of features is the number of genes + the number of fingerprints.

Parameters:
  • output (DrugResponseDataset) – training dataset containing the response output

  • cell_line_input (FeatureDataset) – training dataset containing gene expression data

  • drug_input (FeatureDataset | None) – training dataset containing fingerprints data

  • output_earlystopping (DrugResponseDataset | None) – not needed

  • model_checkpoint_dir (str) – not needed

Raises:

ValueError – If drug_input is None.

Return type:

None

class drevalpy.models.baselines.sklearn_models.ProteomicsRandomForest

Bases: RandomForest

RandomForest model for drug response prediction using proteomics data.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters (dict) – Hyperparameters for the model. Contains n_estimators, criterion, max_samples, and n_jobs.

cell_line_views = ['proteomics']
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

ProteomicsRandomForest

load_cell_line_features(data_path, dataset_name)

Loads the cell line features.

Parameters:
  • data_path (str) – Path to the gene expression and landmark genes

  • dataset_name (str) – Name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line proteomics features, filtered through the landmark genes

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the response for the given input.

Return type:

ndarray

Returns:

predicted drug response

Raises:

ValueError – If drug_input is None.

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model.

The number of features is the number of genes + the number of fingerprints.

Parameters:
  • output (DrugResponseDataset) – training dataset containing the response output

  • cell_line_input (FeatureDataset) – training dataset containing gene expression data

  • drug_input (FeatureDataset | None) – training dataset containing fingerprints data

  • output_earlystopping (DrugResponseDataset | None) – not needed

  • model_checkpoint_dir (str) – not needed

Raises:

ValueError – If drug_input is None.

Return type:

None

class drevalpy.models.baselines.sklearn_models.RandomForest

Bases: SklearnModel

RandomForest model for drug response prediction.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters (dict) – Hyperparameters for the model. Contains n_estimators, criterion, max_samples, and n_jobs.

classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

RandomForest

class drevalpy.models.baselines.sklearn_models.SVMRegressor

Bases: SklearnModel

SVM model for drug response prediction.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters (dict) – Hyperparameters for the model. Contains kernel, C, epsilon, and max_iter.

classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

SVR (Support Vector Regressor)

class drevalpy.models.baselines.sklearn_models.SklearnModel

Bases: DRPModel

Parent class that contains the common methods for the sklearn models.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters (dict) – Custom hyperparameters for the model, have to be defined in the child class.

cell_line_views = ['gene_expression']
drug_views = ['fingerprints']
classmethod get_model_name()

Returns the model name.

Raises:

NotImplementedError – If the method is not implemented in the child class.

Return type:

str

classmethod load(directory)

Load a trained sklearn-based model and its preprocessing components from disk.

Loads:
  • model.pkl: the trained sklearn model

  • hyperparameters.json: model hyperparameters (optional)

  • scaler.pkl: gene expression scaler (optional)

  • proteomics_transformer.pkl: proteomics transformer (optional)

Parameters:

directory (str) – path to the directory where model files are stored

Return type:

SklearnModel

Returns:

an instance of the model with restored state

Raises:

FileNotFoundError – if model.pkl is missing

load_cell_line_features(data_path, dataset_name)

Loads the cell line features.

Parameters:
  • data_path (str) – Path to the gene expression and landmark genes

  • dataset_name (str) – Name of the dataset

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line gene expression features, filtered through the landmark genes

load_drug_features(data_path, dataset_name)

Load the drug features, in this case the fingerprints.

Parameters:
  • data_path (str) – Path to the data

  • dataset_name (str) – Name of the dataset

Return type:

FeatureDataset | None

Returns:

FeatureDataset containing the drug fingerprints

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the response for the given input.

Return type:

ndarray

Returns:

predicted drug response

Raises:

ValueError – If drug_input is None.

save(directory)

Save the trained model and any associated preprocessing components to the given directory.

Saves:
  • model.pkl: the trained sklearn model

  • hyperparameters.json: dictionary of model hyperparameters (if present)

  • scaler.pkl: fitted gene expression scaler (if present)

  • proteomics_transformer.pkl: fitted proteomics transformer (if present)

Parameters:

directory (str) – path to the directory where model files will be stored

Return type:

None
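
The file layout used by save and load for this class (a required model.pkl plus optional companion files) can be sketched with the standard library as follows. This is not the drevalpy implementation: the helper names are hypothetical, a plain dict stands in for a fitted sklearn estimator, and proteomics_transformer.pkl is left out for brevity since it would follow the same pattern as scaler.pkl.

```python
import json
import os
import pickle
import tempfile


def save_components(directory, model, hyperparameters=None, scaler=None):
    """Write model.pkl always; write the optional files only if present."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "model.pkl"), "wb") as fh:
        pickle.dump(model, fh)
    if hyperparameters is not None:
        with open(os.path.join(directory, "hyperparameters.json"), "w", encoding="utf-8") as fh:
            json.dump(hyperparameters, fh)
    if scaler is not None:
        with open(os.path.join(directory, "scaler.pkl"), "wb") as fh:
            pickle.dump(scaler, fh)


def load_components(directory):
    """Load model.pkl (required) and whichever optional files exist."""
    model_path = os.path.join(directory, "model.pkl")
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"model.pkl not found in {directory}")
    with open(model_path, "rb") as fh:
        loaded = {"model": pickle.load(fh), "hyperparameters": None, "scaler": None}
    hp_path = os.path.join(directory, "hyperparameters.json")
    if os.path.exists(hp_path):
        with open(hp_path, encoding="utf-8") as fh:
            loaded["hyperparameters"] = json.load(fh)
    scaler_path = os.path.join(directory, "scaler.pkl")
    if os.path.exists(scaler_path):
        with open(scaler_path, "rb") as fh:
            loaded["scaler"] = pickle.load(fh)
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    # A plain dict stands in for a fitted sklearn estimator.
    save_components(tmp, model={"coef": [0.1, 0.2]}, hyperparameters={"alpha": 0.5})
    restored = load_components(tmp)

    empty_dir = os.path.join(tmp, "empty")
    os.makedirs(empty_dir)
    try:
        load_components(empty_dir)
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True
```

Since no scaler was saved, restored["scaler"] stays None, and loading from a directory without model.pkl raises FileNotFoundError.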

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model.

The number of features is the number of genes + the number of fingerprints.

Parameters:
  • output (DrugResponseDataset) – training dataset containing the response output

  • cell_line_input (FeatureDataset) – training dataset containing gene expression data

  • drug_input (FeatureDataset | None) – training dataset containing fingerprints data

  • output_earlystopping (DrugResponseDataset | None) – not needed

  • model_checkpoint_dir (str) – not needed

Raises:

ValueError – If drug_input is None.

Return type:

None

Single-Drug Elastic Net

Contains the SingleDrugElasticNet and SingleDrugProteomicsElasticNet classes, which fit an elastic net for each drug separately.
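
The per-drug fitting pattern can be sketched as below: responses are grouped by drug and an independent model is fitted on each group using cell line features only. To stay dependency-free, a one-feature ordinary least-squares line is swapped in for the elastic net, so this shows the grouping pattern rather than the actual regularized model; all names and data are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_drug(rows):
    """Group (drug, feature, response) rows by drug and fit one model each."""
    grouped = defaultdict(lambda: ([], []))
    for drug, feature, response in rows:
        grouped[drug][0].append(feature)
        grouped[drug][1].append(response)
    return {drug: fit_line(xs, ys) for drug, (xs, ys) in grouped.items()}


# Invented data: D1 follows y = 2x + 1, D2 follows y = -x + 4.
rows = [
    ("D1", 0.0, 1.0), ("D1", 1.0, 3.0), ("D1", 2.0, 5.0),
    ("D2", 0.0, 4.0), ("D2", 1.0, 3.0), ("D2", 2.0, 2.0),
]
models = fit_per_drug(rows)
```

Each drug ends up with its own (slope, intercept) pair, and no drug features enter the fit.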

class drevalpy.models.baselines.singledrug_elastic_net.SingleDrugElasticNet

Bases: SklearnModel

SingleDrugElasticNet class.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters – Elastic net hyperparameters

cell_line_views = ['gene_expression']
drug_views = []
early_stopping = False
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

SingleDrugElasticNet

is_single_drug_model = True
load_drug_features(data_path, dataset_name)

Load drug features. Not needed for SingleDrugElasticNet.

Parameters:
  • data_path – path to the data

  • dataset_name – name of the dataset

Returns:

None

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the drug response for the given cell lines.

Return type:

ndarray

Returns:

predicted drug response

Raises:

ValueError – if drug_input is not None

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model. Since no drug features are used (drug_views is empty), the features are the cell line features (gene expression).

Return type:

None

class drevalpy.models.baselines.singledrug_elastic_net.SingleDrugProteomicsElasticNet

Bases: SingleDrugElasticNet

SingleDrugProteomicsElasticNet class.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters (dict) – Hyperparameters for the elastic net model.

cell_line_views = ['proteomics']
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

SingleDrugProteomicsElasticNet

classmethod load(directory)

Load a trained SingleDrugProteomicsElasticNet model and transformer.

Loads:
  • model.pkl: trained ElasticNet model

  • transformer.pkl: fitted ProteomicsMedianCenterAndImputeTransformer

Parameters:

directory (str) – Directory where the model files are stored

Return type:

SingleDrugProteomicsElasticNet

Returns:

Loaded instance of SingleDrugProteomicsElasticNet

load_cell_line_features(data_path, dataset_name)

Loads the proteomics data.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

proteomics data

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the drug response for the given cell lines.

Return type:

ndarray

Returns:

predicted drug response

Raises:

ValueError – if drug_input is not None

save(directory)

Save the trained model and proteomics transformer.

Saves:
  • model.pkl: the fitted ElasticNet model

  • transformer.pkl: the fitted ProteomicsMedianCenterAndImputeTransformer

Parameters:

directory (str) – Target directory for saving model files

Raises:

ValueError – If model or transformer is not initialized

Return type:

None

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model. Since no drug features are used (drug_views is empty), the features are the cell line features (proteomics).

Return type:

None

Single-Drug Random Forest

Contains the SingleDrugRandomForest class.

It is a RandomForest model that uses only the gene expression dataset for drug response prediction and trains one model per drug. A proteomics variant, SingleDrugProteomicsRandomForest, is also provided.

class drevalpy.models.baselines.singledrug_random_forest.SingleDrugProteomicsRandomForest

Bases: SingleDrugRandomForest

SingleDrugProteomicsRandomForest class.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters (dict) – Hyperparameters for the model. Contains n_estimators, criterion, max_samples, and n_jobs.

cell_line_views = ['proteomics']
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

SingleDrugProteomicsRandomForest

load_cell_line_features(data_path, dataset_name)

Loads the proteomics features.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

proteomics data

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the drug response for the given cell lines.

Return type:

ndarray

Returns:

predicted drug response

Raises:

ValueError – if drug_input is not None

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model. Since no drug features are used (drug_views is empty), the features are the cell line features (proteomics).

Return type:

None

class drevalpy.models.baselines.singledrug_random_forest.SingleDrugRandomForest

Bases: RandomForest

SingleDrugRandomForest class.

drug_views = []
early_stopping = False
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

SingleDrugRandomForest

is_single_drug_model = True
load_drug_features(data_path, dataset_name)

Load drug features. Not needed for SingleDrugRandomForest.

Parameters:
  • data_path – path to the data

  • dataset_name – name of the dataset

Returns:

None

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the drug response for the given cell lines.

Return type:

ndarray

Returns:

predicted drug response

Raises:

ValueError – if drug_input is not None

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model. Since no drug features are used (drug_views is empty), the features are the cell line features (gene expression).

Raises:

ValueError – if drug_input is not None

Return type:

None

Multi-OMICS Random Forest

Contains the Multi-OMICS Random Forest model.

class drevalpy.models.baselines.multi_omics_random_forest.MultiOmicsRandomForest

Bases: RandomForest

Multi-OMICS Random Forest model.

build_model(hyperparameters)

Builds the model from hyperparameters.

Parameters:

hyperparameters (dict) – Hyperparameters for the model.

cell_line_views = ['gene_expression', 'methylation', 'mutations', 'copy_number_variation_gistic']
classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

MultiOmicsRandomForest

classmethod load(directory)

Loads the trained model, hyperparameters, scaler, and PCA transformer from the specified directory.

Parameters:

directory (str) – Path to the directory where model components are stored.

Return type:

MultiOmicsRandomForest

Returns:

An instance of MultiOmicsRandomForest with restored state.

load_cell_line_features(data_path, dataset_name)

Loads the cell line features.

Parameters:
  • data_path (str) – data path e.g. data/

  • dataset_name (str) – dataset name e.g. GDSC1

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line omics features, filtered through the drug target genes

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the response for the given input.

Return type:

ndarray

Returns:

predicted response

Raises:

RuntimeError – if PCA has not been fit

save(directory)

Saves the trained model, hyperparameters, scaler, and PCA transformer to the specified directory.

Parameters:

directory (str) – Path to the directory where model components will be saved.

Return type:

None

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model: the number of features is the number of genes + the number of fingerprints.

Return type:

None