DIPK
DIPK Model
DIPK model. Adapted from https://github.com/user15632/DIPK.
Original publication: Improving drug response prediction via integrating gene relationships with deep learning. Pengyong Li, Zhengxiang Jiang, Tianxiao Liu, Xinyu Liu, Hui Qiao, Xiaojun Yao. Briefings in Bioinformatics, Volume 25, Issue 3, May 2024, bbae153, https://doi.org/10.1093/bib/bbae153
- class drevalpy.models.DIPK.dipk.DIPKModel
Bases: DRPModel
DIPK model. Adapted from https://github.com/user15632/DIPK.
- build_model(hyperparameters)
Builds the DIPK model with the specified hyperparameters.
- Parameters:
hyperparameters (dict[str, Any]) – embedding_dim, heads, fc_layer_num, fc_layer_dim, dropout_rate, epochs, batch_size, lr
- Return type:
Details of hyperparameters:
embedding_dim: int, embedding dimension for the graph encoder, which is not used in the final model
heads: int, number of heads for the multi-head attention layer, defaults to 1
fc_layer_num: int, number of fully connected layers in the dense part of the model
fc_layer_dim: list[int], number of neurons for each fully connected layer
dropout_rate: float, dropout rate for all fully connected layers
epochs: int, number of epochs to train the model
batch_size: int, batch size for training
lr: float, learning rate for training
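As a minimal sketch of the dictionary described above, the following builds an example configuration and checks it for internal consistency. The concrete values and the helper `check_dipk_hyperparameters` are illustrative assumptions, not defaults or functions shipped with drevalpy. The check that `fc_layer_dim` provides at least `fc_layer_num` entries is likewise an assumption about how the two keys relate.

```python
from typing import Any

# Keys listed in the build_model documentation above.
REQUIRED_KEYS = {
    "embedding_dim", "heads", "fc_layer_num", "fc_layer_dim",
    "dropout_rate", "epochs", "batch_size", "lr",
}


def check_dipk_hyperparameters(hp: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a hyperparameter dict (hypothetical helper)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(hp))]
    if problems:
        return problems
    # Assumption: one width per fully connected layer is required.
    if len(hp["fc_layer_dim"]) < hp["fc_layer_num"]:
        problems.append("fc_layer_dim has fewer entries than fc_layer_num")
    if not 0.0 <= hp["dropout_rate"] < 1.0:
        problems.append("dropout_rate must be in [0, 1)")
    return problems


# Illustrative values only; not recommended settings.
example_hyperparameters = {
    "embedding_dim": 128,
    "heads": 1,
    "fc_layer_num": 3,
    "fc_layer_dim": [256, 128, 1],
    "dropout_rate": 0.3,
    "epochs": 100,
    "batch_size": 64,
    "lr": 1e-4,
}
```

A dictionary that passes this check could then be handed to build_model. The check itself runs without drevalpy installed.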
- cell_line_views = ['gene_expression', 'bionic_features']
- drug_views = ['molgnet_features']
- early_stopping = True
- classmethod load(directory)
Load the DIPK model and gene expression encoder using PyTorch conventions.
This method expects the following files in the given directory:
“dipk_model.pt”: PyTorch state_dict of the DIPK predictor model
“gene_encoder.pt”: PyTorch state_dict of the gene expression encoder
“hyperparameters.json”: Dictionary of hyperparameters, must include “gene_encoder_input_dim”
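Before calling load, it can help to confirm that a directory has the layout listed above. The following is a sketch using only the standard library; the helper name `missing_dipk_files` is hypothetical and not part of drevalpy, and the demonstration writes empty placeholder files rather than real model weights.

```python
import json
import tempfile
from pathlib import Path

# File names taken from the load() documentation above.
EXPECTED_FILES = ("dipk_model.pt", "gene_encoder.pt", "hyperparameters.json")


def missing_dipk_files(directory: str) -> list[str]:
    """List what a saved-model directory lacks (hypothetical helper)."""
    root = Path(directory)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    hp_path = root / "hyperparameters.json"
    if hp_path.is_file():
        hyperparameters = json.loads(hp_path.read_text())
        # load() requires this key to rebuild the gene expression encoder.
        if "gene_encoder_input_dim" not in hyperparameters:
            missing.append("hyperparameters.json: gene_encoder_input_dim")
    return missing


# Demonstration on a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "dipk_model.pt").write_bytes(b"")
    Path(tmp, "hyperparameters.json").write_text(json.dumps({"heads": 1}))
    report = missing_dipk_files(tmp)
```

Here `report` names the absent encoder file and the absent `gene_encoder_input_dim` key; an empty list would indicate the directory matches the documented layout.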
- load_cell_line_features(data_path, dataset_name)
Load cell line features.
- Parameters:
- Return type:
- Returns:
cell line features
- load_drug_features(data_path, dataset_name)
Load drug features.
- Parameters:
- Return type:
- Returns:
drug features
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the response values for the given cell lines and drugs.
- Parameters:
cell_line_ids (ndarray) – list of cell line IDs
drug_ids (ndarray) – list of drug IDs
cell_line_input (FeatureDataset) – input data associated with the cell line
drug_input (FeatureDataset | None) – input data associated with the drug
- Return type:
- Returns:
predicted response values
- Raises:
ValueError – if drug_input is None, if the model is not initialized, or if the gene expression encoder is not initialized
- save(directory)
Save the DIPK model and gene expression encoder using PyTorch conventions.
This method stores:
“dipk_model.pt”: PyTorch state_dict of the DIPK predictor model
“gene_encoder.pt”: PyTorch state_dict of the trained gene expression encoder
“hyperparameters.json”: All hyperparameters including encoder input_dim
- Parameters:
directory (str) – Target directory where the model files will be saved
- Raises:
ValueError – If model or encoder is not built
- Return type:
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Trains the model.
- Parameters:
output (DrugResponseDataset) – training data associated with the response output
cell_line_input (FeatureDataset) – input data associated with the cell line
drug_input (FeatureDataset | None) – input data associated with the drug
output_earlystopping (DrugResponseDataset | None) – early stopping data associated with the response output
model_checkpoint_dir (str) – directory to save the model checkpoint
- Raises:
ValueError – if drug_input is None or if the model is not initialized
- Return type:
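The train method accepts a separate early stopping dataset and a checkpoint directory. The loop below sketches patience-based early stopping in general terms. The patience value, the stopping criterion, and the function name `train_with_early_stopping` are assumptions for illustration and may differ from what DIPKModel does internally.

```python
from typing import Callable


def train_with_early_stopping(
    run_epoch: Callable[[int], float],
    epochs: int,
    patience: int = 5,
) -> tuple[int, float]:
    """Run up to `epochs` epochs; stop once the validation loss stalls.

    `run_epoch(epoch)` trains for one epoch and returns the loss on the
    early stopping data. Returns (best_epoch, best_loss).
    """
    best_epoch, best_loss = -1, float("inf")
    for epoch in range(epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss:
            # A real implementation would save a checkpoint here.
            best_epoch, best_loss = epoch, val_loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss


# Stand-in for real training: a fixed sequence of validation losses.
losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.61, 0.7, 0.5]
calls = []


def fake_epoch(epoch: int) -> float:
    calls.append(epoch)
    return losses[epoch]


best = train_with_early_stopping(fake_epoch, epochs=len(losses), patience=3)
```

With these toy losses the loop stops after the third epoch without improvement, so the later, lower loss is never reached. That trade-off is inherent to patience-based stopping.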
Attention utils
Contains a custom MultiHeadAttentionLayer for the DIPK model.
- class drevalpy.models.DIPK.attention_utils.MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
Bases: Module
Custom multi-head attention layer for the DIPK model.
Data utils
Includes functions to load and process the DIPK dataset.
get_data: Creates a list of dictionaries with drug and cell line features.
CollateFn: Class to collate the DataLoader batches.
DIPKDataset: Dataset class for the DIPK model.
- class drevalpy.models.DIPK.data_utils.CollateFn(train=True)
Bases: object
Collate function for the DataLoader, either for training or testing.
- class drevalpy.models.DIPK.data_utils.DIPKDataset(samples)
Bases: Dataset, ABC
Dataset of graphs from get_data.
- drevalpy.models.DIPK.data_utils.get_data(cell_ids, drug_ids, cell_line_features, drug_features, ic50=None)
Prepare data samples for training or prediction.
Each sample includes:
Drug features (e.g., molecular embeddings).
Cell line features (gene expression and bionic_features).
Optional IC50 response values for supervised tasks.
- Parameters:
cell_ids (ndarray) – IDs of the cell lines from the dataset.
drug_ids (ndarray) – IDs of the drugs from the dataset.
cell_line_features (FeatureDataset) – Input features associated with the cell lines.
drug_features (FeatureDataset) – Input features associated with the drugs.
ic50 (ndarray | None) – (Optional) Response values (e.g., IC50) to associate with samples.
- Return type:
- Returns:
List of dictionaries, each containing drug and cell line features, with optional IC50.
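The structure returned by get_data can be pictured with the following sketch. It uses plain dictionaries in place of FeatureDataset, and the function name `build_samples`, the dictionary keys, and the toy values are all assumptions for illustration; the real function may name and organize its fields differently.

```python
from typing import Optional, Sequence


def build_samples(
    cell_ids: Sequence[str],
    drug_ids: Sequence[str],
    cell_line_features: dict[str, dict[str, list[float]]],
    drug_features: dict[str, list[list[float]]],
    ic50: Optional[Sequence[float]] = None,
) -> list[dict]:
    """Pair each (cell line, drug) ID with its features, plus IC50 if given."""
    samples = []
    for index, (cell_id, drug_id) in enumerate(zip(cell_ids, drug_ids)):
        sample = {
            "cell_id": cell_id,
            "drug_id": drug_id,
            "molgnet_features": drug_features[drug_id],
            "gene_expression": cell_line_features[cell_id]["gene_expression"],
            "bionic_features": cell_line_features[cell_id]["bionic_features"],
        }
        if ic50 is not None:
            sample["ic50"] = ic50[index]  # only present for supervised use
        samples.append(sample)
    return samples


# Toy data; IDs, dimensions, and values are made up for illustration.
cells = {"CL1": {"gene_expression": [0.1, 0.2], "bionic_features": [1.0, 0.0]}}
drugs = {"D1": [[0.5, 0.5], [0.3, 0.7]], "D2": [[0.9, 0.1]]}

train_samples = build_samples(["CL1", "CL1"], ["D1", "D2"], cells, drugs, ic50=[1.5, -0.2])
predict_samples = build_samples(["CL1"], ["D2"], cells, drugs)
```

Leaving `ic50` as None yields samples without a response value, which corresponds to the prediction case described above.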
- drevalpy.models.DIPK.data_utils.load_bionic_features(data_path, dataset_name, gene_add_num=512)
Load biological network (BIONIC) features for DIPK.
- Parameters:
- Return type:
- Returns:
FeatureDataset with gene expression and biological network features
Gene expression encoder
Gene expression Autoencoder for DIPK model.
- class drevalpy.models.DIPK.gene_expression_encoder.CollateFn
Bases: object
Collate function for the DataLoader, either for training or testing.
- class drevalpy.models.DIPK.gene_expression_encoder.DataSet(data)
Bases: Dataset, ABC
Dataset class for gene expression data.
- class drevalpy.models.DIPK.gene_expression_encoder.GeneExpressionDecoder(input_dim, latent_dim=512, h_dims=None, drop_out_rate=0.3)
Bases: Module
Gene expression decoder.
- forward(embedding)
Forward pass of the gene expression decoder.
- Parameters:
embedding – input data
- Returns:
decoded data
- class drevalpy.models.DIPK.gene_expression_encoder.GeneExpressionEncoder(input_dim, latent_dim=512, h_dims=None, drop_out_rate=0.3)
Bases: Module
Gene expression encoder.
Code adapted from the DIPK model https://github.com/user15632/DIPK.
- forward(input)
Forward pass of the gene expression encoder.
- Parameters:
input – input data
- Returns:
encoded data
- drevalpy.models.DIPK.gene_expression_encoder.encode_gene_expression(gene_expression_input, encoder)
Encode gene expression data.
- Parameters:
gene_expression_input (ndarray) – gene expression data
encoder (GeneExpressionEncoder) – trained encoder model
- Return type:
- Returns:
encoded gene expression data
- drevalpy.models.DIPK.gene_expression_encoder.train_gene_expession_autoencoder(gene_expression_input, gene_expression_input_early_stopping, epochs_autoencoder=100)
Train the autoencoder model for gene expression data with early stopping.
- Parameters:
- Return type:
- Returns:
trained encoder model
Model utils
Includes custom torch.nn.Modules for the DIPK model: AttentionLayer, DenseLayer, Predictor.
- class drevalpy.models.DIPK.model_utils.AttentionLayer(heads=1)
Bases: Module
Custom attention layer for the DIPK model.
- Parameters:
heads (int)
- forward(molgnet_features, mask, gene_expression, bionic)
Forward pass of the attention layer.
- Parameters:
molgnet_features (Tensor) – MolGNet features
mask (Tensor) – mask for the MolGNet features, as molecules have varying sizes (valid atom features are True)
gene_expression (Tensor) – gene expression features of the graph data
bionic (Tensor) – bionic network features of the graph data
- Return type:
Tensor
- Returns:
tensor of MolGNet features after attention layer
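The mask exists because molecules in a batch have different numbers of atoms. The sketch below shows one common way such a batch and its mask could be built, with True marking valid atom features as described above. It uses nested lists instead of tensors, and the function name `pad_molecules` and the choice of zero padding are assumptions, not a description of the actual DIPK collate code.

```python
def pad_molecules(
    molecules: list[list[list[float]]],
) -> tuple[list[list[list[float]]], list[list[bool]]]:
    """Pad per-atom feature lists to a common length and build a validity mask."""
    max_atoms = max(len(molecule) for molecule in molecules)
    feature_dim = len(molecules[0][0])
    padded, mask = [], []
    for molecule in molecules:
        n_pad = max_atoms - len(molecule)
        # Assumption: padding rows are all zeros.
        padded.append(molecule + [[0.0] * feature_dim for _ in range(n_pad)])
        mask.append([True] * len(molecule) + [False] * n_pad)  # True marks real atoms
    return padded, mask


# Two toy molecules with 3 and 1 atoms, 2 features per atom.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
padded_batch, atom_mask = pad_molecules(batch)
```

An attention layer can then use the mask to ignore the padded positions, so that padding does not contribute to the attention weights.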
- class drevalpy.models.DIPK.model_utils.DenseLayers(fc_layer_num, fc_layer_dim, dropout_rate)
Bases: Module
Custom dense layers for the DIPK model.
- forward(x, gene, bionic)
Forward pass of the dense layers.
- Parameters:
x (Tensor) – output tensor from the attention layer
gene (Tensor) – gene expression features (GEF) of the graph data
bionic (Tensor) – biological network features (BNF) of the graph data
- Return type:
Tensor
- Returns:
output tensor after the dense layers
- class drevalpy.models.DIPK.model_utils.Predictor(heads, fc_layer_num, fc_layer_dim, dropout_rate)
Bases: Module
Whole DIPK model.
- forward(molgnet_drug_features, gene_expression, bionic, molgnet_mask)
Forward pass of the DIPK model.
- Parameters:
molgnet_drug_features (Tensor) – tensor of MolGNet features from graph data
gene_expression (Tensor) – gene expression features (GEF) of the graph data
bionic (Tensor) – biological network features (BNF) of the graph data
molgnet_mask (Tensor) – mask for the MolGNet features, as molecules have varying sizes
- Return type:
Tensor
- Returns:
output tensor of the DIPK model