Implemented baselines
Naive Predictors
Implements the naive predictor models.
The naive predictor models are simple models that predict the mean of the response values. The NaivePredictor predicts the overall mean of the response, the NaiveCellLineMeanPredictor predicts the mean of the response per cell line, and the NaiveDrugMeanPredictor predicts the mean of the response per drug. The NaiveTissueMeanPredictor predicts the mean of the response per tissue. The NaiveTissueDrugMeanPredictor predicts the mean of the response per tissue-drug combination. The NaiveMeanEffectsPredictor predicts the response as the overall mean plus the cell line effect plus the drug effect and should be the strongest naive baseline.
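All of these baselines except the mean-effects model reduce to a grouped mean with a fallback to the overall mean. A minimal sketch in plain Python, assuming the training data is available as parallel lists of group keys and responses; the helper names `group_means` and `predict_group_mean` and the toy data are hypothetical and not part of drevalpy:

```python
from collections import defaultdict
from statistics import fmean


def group_means(keys, responses):
    """Return the mean response for every distinct key."""
    buckets = defaultdict(list)
    for key, value in zip(keys, responses):
        buckets[key].append(value)
    return {key: fmean(values) for key, values in buckets.items()}


def predict_group_mean(keys, means, overall_mean):
    """Look up each key's mean; unseen keys fall back to the overall mean."""
    return [means.get(key, overall_mean) for key in keys]


# Toy training data (made up for illustration).
cell_lines = ["A", "A", "B", "B"]
drugs = ["d1", "d2", "d1", "d2"]
responses = [1.0, 3.0, 5.0, 7.0]

overall_mean = fmean(responses)                        # what NaivePredictor would store
cell_line_means = group_means(cell_lines, responses)   # grouping by cell line
drug_means = group_means(drugs, responses)             # grouping by drug
```

Swapping the grouping key (cell line, drug, tissue, or a tissue-drug pair) is the only difference between the grouped-mean baselines in this sketch.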
- class drevalpy.models.baselines.naive_pred.NaiveCellLineMeanPredictor
Bases: NaiveModel
Naive predictor model that predicts the mean of the response per cell line.
- cell_line_views = ['cell_line_name']
- drug_views = ['pubchem_id']
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
NaiveCellLineMeanPredictor
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features, in this case the cell line ids.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the cell line ids
- load_drug_features(data_path, dataset_name)
Loads the drug features.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the drug IDs.
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the cell line mean for each drug-cell line combination.
If the cell line is not in the training set, the dataset mean is used.
- Parameters:
cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – not needed
cell_line_input (FeatureDataset) – not needed
drug_input (FeatureDataset | None) – not needed
- Return type:
- Returns:
array of the same length as the input cell_line_id containing the cell line mean
- predict_cl(cl_id)
Predicts the mean of the response for a given cell line.
If the cell line is not in the training set, the dataset mean is used.
- Parameters:
cl_id (str) – Cell line ID
- Return type:
float
- Returns:
predicted response
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='None')
Computes the mean per cell line.
If a cell line encountered later is not in the training set, the overall mean is used.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – cell line inputs
drug_input (FeatureDataset | None) – not needed
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed
- Return type:
- class drevalpy.models.baselines.naive_pred.NaiveDrugMeanPredictor
Bases: NaiveModel
Naive predictor model that predicts the mean of the response per drug.
- cell_line_views = ['cell_line_name']
- drug_views = ['pubchem_id']
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
NaiveDrugMeanPredictor
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the cell line IDs.
- load_drug_features(data_path, dataset_name)
Loads the drug features, in this case the drug ids.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the drug ids
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the drug mean for each drug-cell line combination.
If the drug is not in the training set, the dataset mean is used.
- Parameters:
cell_line_ids (ndarray) – not needed
drug_ids (ndarray) – drug ids
cell_line_input (FeatureDataset) – not needed
drug_input (FeatureDataset | None) – not needed
- Return type:
- Returns:
array of the same length as the input drug_id containing the drug mean
- predict_drug(drug_id)
Predicts the mean of the response for a given drug.
If the drug is not in the training set, the dataset mean is used.
- Parameters:
drug_id (str) – ID of the drug
- Returns:
predicted response
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='None')
Computes the mean per drug. If a drug encountered later is not in the training set, the overall mean is used.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – not needed
drug_input (FeatureDataset | None) – drug id
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed
- Raises:
ValueError – If drug_input is None
- Return type:
- class drevalpy.models.baselines.naive_pred.NaiveMeanEffectsPredictor
Bases: NaiveModel
ANOVA-like predictor model.
Predicts the response as: response = overall_mean + cell_line_effect + drug_effect.
- Here:
cell_line_effect = (cell line mean - overall_mean)
drug_effect = (drug mean - overall_mean)
This formulation ensures that the overall mean is not counted twice.
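A small worked sketch of this decomposition, assuming the per-cell-line and per-drug means have already been computed from the training data. The numbers and the function name `predict_mean_effects` are made up for illustration; an unseen cell line or drug contributes an effect of zero:

```python
def predict_mean_effects(cell_line, drug, overall_mean, cell_line_means, drug_means):
    """overall_mean + cell_line_effect + drug_effect, with zero effect for unseen IDs."""
    # Falling back to overall_mean makes the effect of an unseen ID exactly zero.
    cell_line_effect = cell_line_means.get(cell_line, overall_mean) - overall_mean
    drug_effect = drug_means.get(drug, overall_mean) - overall_mean
    return overall_mean + cell_line_effect + drug_effect


overall_mean = 4.0
cell_line_means = {"A": 2.0, "B": 6.0}
drug_means = {"d1": 3.0, "d2": 5.0}

seen = predict_mean_effects("A", "d2", overall_mean, cell_line_means, drug_means)
unseen_drug = predict_mean_effects("A", "d9", overall_mean, cell_line_means, drug_means)
unseen_both = predict_mean_effects("Z", "d9", overall_mean, cell_line_means, drug_means)
```

With these toy values, the pair ("A", "d2") gives 4.0 + (2.0 - 4.0) + (5.0 - 4.0) = 3.0, and a pair where both IDs are unseen collapses to the overall mean.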
- cell_line_views = ['cell_line_name']
- drug_views = ['pubchem_id']
- classmethod get_model_name()
Returns the name of the model.
- Return type:
- Returns:
The name of the model as a string.
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the cell line IDs.
- load_drug_features(data_path, dataset_name)
Loads the drug features.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the drug IDs.
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts responses for given cell line and drug pairs.
- The prediction is computed as:
prediction = overall_mean + cell_line_effect + drug_effect
If a cell line or drug was not seen during training, its effect is set to zero.
- Parameters:
cell_line_ids (ndarray) – Array of cell line IDs.
drug_ids (ndarray) – Array of drug IDs.
cell_line_input (FeatureDataset) – Not used.
drug_input (FeatureDataset | None) – Not used.
- Return type:
- Returns:
NumPy array of predicted responses.
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains with overall mean, cell line effects, and drug effects.
- Parameters:
output (DrugResponseDataset) – Training dataset containing the response output.
cell_line_input (FeatureDataset) – Feature dataset containing cell line IDs.
drug_input (FeatureDataset | None) – Feature dataset containing drug IDs. Must not be None.
output_earlystopping (DrugResponseDataset | None) – Not used.
model_checkpoint_dir (str) – Not used.
- Raises:
ValueError – If drug_input is None.
- Return type:
- class drevalpy.models.baselines.naive_pred.NaiveModel
Bases: DRPModel
Base class for all naive predictor models, which are based on simple dataset statistics.
This class provides a shared interface and save/load mechanism for simple statistical models that predict drug response based on dataset means, stratified by drug, cell line, or tissue.
- build_model(hyperparameters)
Builds the model.
Naive models do not require any hyperparameter tuning.
- Parameters:
hyperparameters (dict) – Dictionary of hyperparameters (not used).
- classmethod load(directory)
Loads the model parameters from the given directory.
Reads the ‘naive_model.json’ file and initializes a NaiveModel instance with the loaded parameters.
- Parameters:
directory (str) – Path to the directory where the model is saved.
- Return type:
NaiveModel
- Returns:
An instance of NaiveModel with the loaded parameters.
- save(directory)
Saves the model parameters to the given directory.
Serializes dataset_mean and any available subclass-specific attributes to a JSON file named ‘naive_model.json’. Creates the directory if it doesn’t exist.
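The save/load round trip can be sketched with the standard library. This is an illustration, not drevalpy's implementation: only the file name `naive_model.json` and the `dataset_mean` attribute come from the text above, while the `cell_line_means` key and the function names `save_naive` and `load_naive` are assumptions.

```python
import json
import os
import tempfile


def save_naive(directory, params):
    """Write the parameters to <directory>/naive_model.json, creating the directory."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "naive_model.json"), "w") as f:
        json.dump(params, f)


def load_naive(directory):
    """Read the parameters back from <directory>/naive_model.json."""
    with open(os.path.join(directory, "naive_model.json")) as f:
        return json.load(f)


# Hypothetical parameters of a trained per-cell-line baseline.
params = {"dataset_mean": 4.0, "cell_line_means": {"A": 2.0, "B": 6.0}}
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model_dir")  # does not exist yet
    save_naive(target, params)
    restored = load_naive(target)
```

Because the stored state is only a handful of numbers keyed by strings, a human-readable JSON file is sufficient here.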
- class drevalpy.models.baselines.naive_pred.NaivePredictor
Bases: NaiveModel
Naive predictor model that predicts the overall mean of the response.
- cell_line_views = ['cell_line_name']
- drug_views = ['pubchem_id']
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features, in this case the cell line ids.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the cell line ids
- load_drug_features(data_path, dataset_name)
Loads the drug features, in this case the drug ids.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the drug ids
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the dataset mean for each drug-cell line combination.
- Parameters:
cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – not needed
cell_line_input (FeatureDataset) – not needed
drug_input (FeatureDataset | None) – not needed
- Return type:
- Returns:
array of the same length as the input cell line id containing the dataset mean
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Computes the overall mean of the output response values and saves it.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – not needed
drug_input (FeatureDataset | None) – not needed
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed
- Return type:
- class drevalpy.models.baselines.naive_pred.NaiveTissueDrugMeanPredictor
Bases: NaiveModel
Naive predictor model that predicts the mean of the response per tissue-drug combination.
This model combines tissue and drug information to predict the mean response aggregated across all cell lines from the same tissue tested on the same drug. If a (tissue, drug) combination was not seen during training, it falls back to the overall dataset mean.
- cell_line_views = ['tissue']
- drug_views = ['pubchem_id']
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
NaiveTissueDrugMeanPredictor
- classmethod load(directory)
Loads the model parameters from the given directory.
Overrides the base class load method to convert string keys back to tuple keys.
- Parameters:
directory (str) – Path to the directory where the model is saved.
- Return type:
- Returns:
An instance of NaiveTissueDrugMeanPredictor with the loaded parameters.
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features, in this case the tissue annotations.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the tissue ids
- load_drug_features(data_path, dataset_name)
Loads the drug features, in this case the drug ids.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the drug ids
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the tissue-drug mean for each drug-cell line combination.
If the (tissue, drug) combination is not in the training set, the dataset mean is used.
- Parameters:
cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – drug ids (used directly, following NaiveDrugMeanPredictor pattern)
cell_line_input (FeatureDataset) – tissue features
drug_input (FeatureDataset | None) – not needed
- Return type:
- Returns:
array of the same length as the input containing the tissue-drug mean or dataset mean
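A sketch of the lookup this method describes, assuming a plain mapping from cell line ID to tissue stands in for the tissue `FeatureDataset`. All names and values here are hypothetical:

```python
def predict_tissue_drug(cell_line_ids, drug_ids, tissue_of, tissue_drug_means, dataset_mean):
    """Return the (tissue, drug) mean per pair, or the dataset mean if the pair is unseen."""
    predictions = []
    for cell_line, drug in zip(cell_line_ids, drug_ids):
        # Resolve the cell line to its tissue, then look up the combined key.
        key = (tissue_of.get(cell_line), drug)
        predictions.append(tissue_drug_means.get(key, dataset_mean))
    return predictions


tissue_of = {"CL1": "lung", "CL2": "skin"}
tissue_drug_means = {("lung", "d1"): 0.5, ("skin", "d1"): 1.5}
preds = predict_tissue_drug(
    ["CL1", "CL2", "CL2"], ["d1", "d1", "d2"], tissue_of, tissue_drug_means, dataset_mean=1.0
)
```

The third pair, ("skin", "d2"), was never seen in this toy training table, so it receives the dataset mean.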
- save(directory)
Saves the model parameters to the given directory.
Overrides the base class save method to handle tuple keys in tissue_drug_means by converting them to JSON-serializable string keys.
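JSON object keys must be strings, so a dictionary keyed by `(tissue, drug)` tuples cannot be dumped directly. One way to do the conversion, assuming a separator string that never occurs in a tissue name or drug ID; the separator `"||"` and the helper names are illustrative choices, not necessarily the encoding drevalpy uses:

```python
import json

SEP = "||"  # assumed separator; must not appear in any tissue name or drug ID


def encode_keys(tuple_keyed):
    """Turn {(tissue, drug): mean} into {"tissue||drug": mean} for JSON."""
    return {f"{tissue}{SEP}{drug}": mean for (tissue, drug), mean in tuple_keyed.items()}


def decode_keys(string_keyed):
    """Invert encode_keys after loading from JSON."""
    return {tuple(key.split(SEP, 1)): mean for key, mean in string_keyed.items()}


tissue_drug_means = {("lung", "1234"): 0.5, ("skin", "5678"): 1.5}
payload = json.dumps(encode_keys(tissue_drug_means))
restored = decode_keys(json.loads(payload))
```

Note that drug IDs come back as strings after the round trip, so any lookup at prediction time has to use string IDs as well.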
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='None')
Computes the mean per tissue-drug combination. Falls back to the overall mean for unknown combinations.
- Parameters:
output (DrugResponseDataset) – training dataset with .response and .drug_ids
cell_line_input (FeatureDataset) – tissue features for cell lines
drug_input (FeatureDataset | None) – drug id features
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed
- Raises:
ValueError – If drug_input is None.
- Return type:
- class drevalpy.models.baselines.naive_pred.NaiveTissueMeanPredictor
Bases: NaiveModel
Naive predictor model that predicts the mean of the response per tissue.
- cell_line_views = ['tissue']
- drug_views = []
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
NaiveTissueMeanPredictor
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features, in this case the tissue annotations.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the tissue ids
- load_drug_features(data_path, dataset_name)
Loads the drug features.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the drug IDs.
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the tissue mean for each drug-cell line combination.
If the tissue is not in the training set, the dataset mean is used.
- Parameters:
cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – not needed
cell_line_input (FeatureDataset) – tissue features
drug_input (FeatureDataset | None) – not needed
- Return type:
- Returns:
array of the same length as the input cell_line_id containing the tissue mean
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='None')
Computes the mean per tissue. Falls back to the overall mean for unknown tissues.
- Parameters:
output (DrugResponseDataset) – training dataset with .response
cell_line_input (FeatureDataset) – tissue features for cell lines
drug_input (FeatureDataset | None) – not needed
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed
- Return type:
Sklearn Models
Contains the sklearn baseline models: ElasticNet, RandomForest, SVM, and GradientBoosting, plus proteomics variants of ElasticNet and RandomForest.
- class drevalpy.models.baselines.sklearn_models.ElasticNetModel
Bases: SklearnModel
ElasticNet model for drug response prediction.
- class drevalpy.models.baselines.sklearn_models.GradientBoosting
Bases: SklearnModel
Gradient Boosting model for drug response prediction.
- class drevalpy.models.baselines.sklearn_models.ProteomicsElasticNetModel
Bases: ElasticNetModel
ElasticNet model for drug response prediction using proteomics data.
- build_model(hyperparameters)
Builds the model from hyperparameters.
- Parameters:
hyperparameters (dict) – Hyperparameters for the model.
- cell_line_views = ['proteomics']
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
ProteomicsElasticNet
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the cell line proteomics features, filtered through the landmark genes
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the response for the given input.
- Parameters:
drug_ids (ndarray) – drug ids
cell_line_ids (ndarray) – cell line ids
drug_input (FeatureDataset | None) – drug input
cell_line_input (FeatureDataset) – cell line input
- Return type:
- Returns:
predicted drug response
- Raises:
ValueError – If drug_input is None.
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains the model.
The number of features is the number of genes + the number of fingerprints.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – training dataset containing gene expression data
drug_input (FeatureDataset | None) – training dataset containing fingerprints data
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed
- Raises:
ValueError – If drug_input is None.
- Return type:
- class drevalpy.models.baselines.sklearn_models.ProteomicsRandomForest
Bases: RandomForest
RandomForest model for drug response prediction using proteomics data.
- build_model(hyperparameters)
Builds the model from hyperparameters.
- Parameters:
hyperparameters (dict) – Hyperparameters for the model. Contains n_estimators, criterion, max_samples, and n_jobs.
- cell_line_views = ['proteomics']
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
ProteomicsRandomForest
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the cell line proteomics features, filtered through the landmark genes
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the response for the given input.
- Parameters:
drug_ids (ndarray) – drug ids
cell_line_ids (ndarray) – cell line ids
drug_input (FeatureDataset | None) – drug input
cell_line_input (FeatureDataset) – cell line input
- Return type:
- Returns:
predicted drug response
- Raises:
ValueError – If drug_input is None.
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains the model.
The number of features is the number of genes + the number of fingerprints.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – training dataset containing gene expression data
drug_input (FeatureDataset | None) – training dataset containing fingerprints data
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed
- Raises:
ValueError – If drug_input is None.
- Return type:
- class drevalpy.models.baselines.sklearn_models.RandomForest
Bases: SklearnModel
RandomForest model for drug response prediction.
- class drevalpy.models.baselines.sklearn_models.SVMRegressor
Bases: SklearnModel
SVM model for drug response prediction.
- class drevalpy.models.baselines.sklearn_models.SklearnModel
Bases: DRPModel
Parent class that contains the common methods for the sklearn models.
- build_model(hyperparameters)
Builds the model from hyperparameters.
- Parameters:
hyperparameters (dict) – Custom hyperparameters for the model; they have to be defined in the child class.
- cell_line_views = ['gene_expression']
- drug_views = ['fingerprints']
- classmethod get_model_name()
Returns the model name.
- Raises:
NotImplementedError – If the method is not implemented in the child class.
- Return type:
- classmethod load(directory)
Load a trained sklearn-based model and its preprocessing components from disk.
Loads:
  - model.pkl: the trained sklearn model
  - hyperparameters.json: model hyperparameters (optional)
  - scaler.pkl: gene expression scaler (optional)
  - proteomics_transformer.pkl: proteomics transformer (optional)
- Parameters:
directory (str) – path to the directory where model files are stored
- Return type:
- Returns:
an instance of the model with restored state
- Raises:
FileNotFoundError – if model.pkl is missing
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the cell line gene expression features, filtered through the landmark genes
- load_drug_features(data_path, dataset_name)
Load the drug features, in this case the fingerprints.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the drug fingerprints
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the response for the given input.
- Parameters:
drug_ids (ndarray) – drug ids
cell_line_ids (ndarray) – cell line ids
drug_input (FeatureDataset | None) – drug input
cell_line_input (FeatureDataset) – cell line input
- Return type:
- Returns:
predicted drug response
- Raises:
ValueError – If drug_input is None.
- save(directory)
Save the trained model and any associated preprocessing components to the given directory.
Saves:
  - model.pkl: the trained sklearn model
  - hyperparameters.json: dictionary of model hyperparameters (if present)
  - scaler.pkl: fitted gene expression scaler (if present)
  - proteomics_transformer.pkl: fitted proteomics transformer (if present)
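The optional-component layout of `save` and `load` can be sketched as follows. The file names are those listed above; everything else is an assumption for illustration: a plain dict stands in for the fitted estimator, only the scaler is shown as an optional pickled component, and the functions `save_components` and `load_components` are not drevalpy's implementation.

```python
import json
import os
import pickle
import tempfile


def save_components(directory, model, hyperparameters=None, scaler=None):
    """Always write model.pkl; write the optional files only if the component exists."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "model.pkl"), "wb") as f:
        pickle.dump(model, f)
    if hyperparameters is not None:
        with open(os.path.join(directory, "hyperparameters.json"), "w") as f:
            json.dump(hyperparameters, f)
    if scaler is not None:
        with open(os.path.join(directory, "scaler.pkl"), "wb") as f:
            pickle.dump(scaler, f)


def load_components(directory):
    """Require model.pkl; pick up the optional files when they are present."""
    model_path = os.path.join(directory, "model.pkl")
    if not os.path.exists(model_path):
        raise FileNotFoundError(model_path)
    with open(model_path, "rb") as f:
        loaded = {"model": pickle.load(f), "hyperparameters": None, "scaler": None}
    hp_path = os.path.join(directory, "hyperparameters.json")
    if os.path.exists(hp_path):
        with open(hp_path) as f:
            loaded["hyperparameters"] = json.load(f)
    scaler_path = os.path.join(directory, "scaler.pkl")
    if os.path.exists(scaler_path):
        with open(scaler_path, "rb") as f:
            loaded["scaler"] = pickle.load(f)
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    # No scaler is passed, so scaler.pkl is never written.
    save_components(tmp, model={"coef": [0.1, 0.2]}, hyperparameters={"alpha": 1.0})
    state = load_components(tmp)
```

Pickle files should only be loaded from directories you trust, since unpickling can execute arbitrary code.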
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains the model.
The number of features is the number of genes + the number of fingerprints.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – training dataset containing gene expression data
drug_input (FeatureDataset | None) – training dataset containing fingerprints data
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed
- Raises:
ValueError – If drug_input is None.
- Return type:
Single-Drug Elastic Net
Contains the SingleDrugElasticNet and SingleDrugProteomicsElasticNet classes, which fit a separate elastic net for each drug.
- class drevalpy.models.baselines.singledrug_elastic_net.SingleDrugElasticNet
Bases: SklearnModel
SingleDrugElasticNet class.
- build_model(hyperparameters)
Builds the model from hyperparameters.
- Parameters:
hyperparameters – Elastic net hyperparameters
- cell_line_views = ['gene_expression']
- drug_views = []
- early_stopping = False
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
SingleDrugElasticNet
- is_single_drug_model = True
- load_drug_features(data_path, dataset_name)
Load drug features. Not needed for SingleDrugElasticNet.
- Parameters:
data_path – path to the data
dataset_name – name of the dataset
- Returns:
None
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the drug response for the given cell lines.
- Parameters:
cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – drug ids, not needed here
cell_line_input (FeatureDataset) – cell line input
drug_input (FeatureDataset | None) – drug input, not needed here
- Return type:
- Returns:
predicted drug response
- Raises:
ValueError – if drug_input is not None
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains the model. Because drug_views is empty, the features are the cell line (gene expression) features only; no fingerprints are used.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – training dataset containing gene expression data
drug_input (FeatureDataset | None) – not needed
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed as checkpoints are not saved
- Return type:
- class drevalpy.models.baselines.singledrug_elastic_net.SingleDrugProteomicsElasticNet
Bases: SingleDrugElasticNet
SingleDrugProteomicsElasticNet class.
- build_model(hyperparameters)
Builds the model from hyperparameters.
- Parameters:
hyperparameters (dict) – Elastic net hyperparameters.
- cell_line_views = ['proteomics']
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
SingleDrugProteomicsElasticNet
- classmethod load(directory)
Load a trained SingleDrugProteomicsElasticNet model and transformer.
Loads:
  - model.pkl: trained ElasticNet model
  - transformer.pkl: fitted ProteomicsMedianCenterAndImputeTransformer
- Parameters:
directory (str) – Directory where the model files are stored
- Return type:
- Returns:
Loaded instance of SingleDrugProteomicsElasticNet
- load_cell_line_features(data_path, dataset_name)
Loads the proteomics data.
- Parameters:
- Return type:
- Returns:
proteomics data
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the drug response for the given cell lines.
- Parameters:
cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – drug ids, not needed here
cell_line_input (FeatureDataset) – cell line input
drug_input (FeatureDataset | None) – drug input, not needed here
- Return type:
- Returns:
predicted drug response
- Raises:
ValueError – if drug_input is not None
- save(directory)
Save the trained model and proteomics transformer.
Saves:
  - model.pkl: the fitted ElasticNet model
  - transformer.pkl: the fitted ProteomicsMedianCenterAndImputeTransformer
- Parameters:
directory (str) – Target directory for saving model files
- Raises:
ValueError – If model or transformer is not initialized
- Return type:
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains the model. Because drug_views is empty, the features are the cell line (proteomics) features only; no fingerprints are used.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – training dataset containing proteomics data
drug_input (FeatureDataset | None) – not needed
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed as checkpoints are not saved
- Return type:
Single-Drug Random Forest
Contains the SingleDrugRandomForest class.
It is a RandomForest model that uses only gene expression data for drug response prediction and trains one model per drug.
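The one-model-per-drug idea can be sketched without any ML library. In the sketch below a hypothetical `MeanRegressor` stands in for the random forest so that the example stays self-contained; the grouping logic is the point, and none of these names are part of drevalpy:

```python
from collections import defaultdict
from statistics import fmean


class MeanRegressor:
    """Stand-in for a real regressor: predicts the mean of its training targets."""

    def fit(self, features, targets):
        self.mean_ = fmean(targets)
        return self

    def predict(self, features):
        return [self.mean_ for _ in features]


def fit_per_drug(drug_ids, features, targets):
    """Split the training rows by drug and fit an independent model on each subset."""
    rows = defaultdict(lambda: ([], []))
    for drug, x, y in zip(drug_ids, features, targets):
        rows[drug][0].append(x)
        rows[drug][1].append(y)
    return {drug: MeanRegressor().fit(xs, ys) for drug, (xs, ys) in rows.items()}


# Each feature row would be a gene expression vector; drug features are not used.
drug_ids = ["d1", "d1", "d2"]
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
targets = [1.0, 3.0, 10.0]
models = fit_per_drug(drug_ids, features, targets)
prediction = models["d2"].predict([[0.0, 0.0]])
```

Since every model only ever sees rows for its own drug, drug features would be constant within a model, which is consistent with `drug_views` being empty for the single-drug classes.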
- class drevalpy.models.baselines.singledrug_random_forest.SingleDrugProteomicsRandomForest
Bases: SingleDrugRandomForest
SingleDrugProteomicsRandomForest class.
- build_model(hyperparameters)
Builds the model from hyperparameters.
- Parameters:
hyperparameters (dict) – Hyperparameters for the model. Contains n_estimators, criterion, max_samples, and n_jobs.
- cell_line_views = ['proteomics']
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
SingleDrugProteomicsRandomForest
- load_cell_line_features(data_path, dataset_name)
Loads the proteomics features.
- Parameters:
- Return type:
- Returns:
proteomics data
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the drug response for the given cell lines.
- Parameters:
cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – drug ids, not needed here
cell_line_input (FeatureDataset) – cell line input
drug_input (FeatureDataset | None) – drug input, not needed here
- Return type:
- Returns:
predicted drug response
- Raises:
ValueError – if drug_input is not None
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains the model. Because drug_views is empty, the features are the cell line (proteomics) features only; no fingerprints are used.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – training dataset containing proteomics data
drug_input (FeatureDataset | None) – not needed
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed as checkpoints are not saved
- Return type:
- class drevalpy.models.baselines.singledrug_random_forest.SingleDrugRandomForest
Bases: RandomForest
SingleDrugRandomForest class.
- drug_views = []
- early_stopping = False
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
SingleDrugRandomForest
- is_single_drug_model = True
- load_drug_features(data_path, dataset_name)
Load drug features. Not needed for SingleDrugRandomForest.
- Parameters:
data_path – path to the data
dataset_name – name of the dataset
- Returns:
None
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the drug response for the given cell lines.
- Parameters:
cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – drug ids, not needed here
cell_line_input (FeatureDataset) – cell line input
drug_input (FeatureDataset | None) – drug input, not needed here
- Return type:
- Returns:
predicted drug response
- Raises:
ValueError – if drug_input is not None
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains the model. Because drug_views is empty, the features are the cell line (gene expression) features only; no fingerprints are used.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – training dataset containing gene expression data
drug_input (FeatureDataset | None) – not needed
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed as checkpoints are not saved
- Raises:
ValueError – if drug_input is not None
- Return type:
Multi-OMICS Random Forest
Contains the Multi-OMICS Random Forest model.
- class drevalpy.models.baselines.multi_omics_random_forest.MultiOmicsRandomForest
Bases: RandomForest
Multi-OMICS Random Forest model.
- build_model(hyperparameters)
Builds the model from hyperparameters.
- Parameters:
hyperparameters (dict) – Hyperparameters for the model.
- cell_line_views = ['gene_expression', 'methylation', 'mutations', 'copy_number_variation_gistic']
- classmethod get_model_name()
Returns the model name.
- Return type:
- Returns:
MultiOmicsRandomForest
- classmethod load(directory)
Loads the trained model, hyperparameters, scaler, and PCA transformer from the specified directory.
- Parameters:
directory (str) – Path to the directory where model components are stored.
- Return type:
- Returns:
An instance of MultiOmicsRandomForest with restored state.
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features.
- Parameters:
- Return type:
- Returns:
FeatureDataset containing the cell line omics features, filtered through the drug target genes
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the response for the given input.
- Parameters:
cell_line_ids (ndarray) – cell line ids
drug_ids (ndarray) – drug ids
cell_line_input (FeatureDataset) – cell line input
drug_input (FeatureDataset | None) – drug input
- Return type:
- Returns:
predicted response
- Raises:
RuntimeError – if PCA has not been fit
- save(directory)
Saves the trained model, hyperparameters, scaler, and PCA transformer to the specified directory.
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains the model: the number of features is the number of genes + the number of fingerprints.
- Parameters:
output (DrugResponseDataset) – training dataset containing the response output
cell_line_input (FeatureDataset) – training dataset containing the OMICs
drug_input (FeatureDataset | None) – training dataset containing fingerprints data
output_earlystopping (DrugResponseDataset | None) – not needed
model_checkpoint_dir (str) – not needed
- Return type: