DIPK

DIPK Model

DIPK model. Adapted from https://github.com/user15632/DIPK.

Original publication: Pengyong Li, Zhengxiang Jiang, Tianxiao Liu, Xinyu Liu, Hui Qiao, Xiaojun Yao. Improving drug response prediction via integrating gene relationships with deep learning. Briefings in Bioinformatics, Volume 25, Issue 3, May 2024, bbae153. https://doi.org/10.1093/bib/bbae153

class drevalpy.models.DIPK.dipk.DIPKModel

Bases: DRPModel

DIPK model. Adapted from https://github.com/user15632/DIPK.

build_model(hyperparameters)

Builds the DIPK model with the specified hyperparameters.

Parameters:: hyperparameters (dict[str, Any]) – embedding_dim, heads, fc_layer_num, fc_layer_dim, dropout_rate, epochs, batch_size, lr
Return type:: None

Details of hyperparameters:

embedding_dim: int, embedding dimension of the graph encoder; the graph encoder is not used in the final model
heads: int, number of heads for the multi-head attention layer, defaults to 1
fc_layer_num: int, number of fully connected (dense) layers
fc_layer_dim: list[int], number of neurons for each fully connected layer
dropout_rate: float, dropout rate for all fully connected layers
epochs: int, number of epochs to train the model
batch_size: int, batch size for training
lr: float, learning rate for training
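
As a rough illustration of the dictionary that build_model expects, the sketch below assembles the eight documented keys and runs a basic sanity check. The concrete values and the helper check_dipk_hyperparameters are assumptions made for this example and are not defaults or functions taken from drevalpy. In particular, requiring fc_layer_dim to have at least fc_layer_num entries is an assumed constraint.

```python
from typing import Any

# Keys listed in the build_model documentation.
REQUIRED_KEYS = {
    "embedding_dim", "heads", "fc_layer_num", "fc_layer_dim",
    "dropout_rate", "epochs", "batch_size", "lr",
}


def check_dipk_hyperparameters(hyperparameters: dict[str, Any]) -> list[str]:
    """Return a list of problems found (hypothetical helper, not part of drevalpy)."""
    missing = sorted(REQUIRED_KEYS - set(hyperparameters))
    if missing:
        return [f"missing key: {key}" for key in missing]
    problems = []
    # Assumed constraint: one width per fully connected layer must be available.
    if hyperparameters["fc_layer_num"] > len(hyperparameters["fc_layer_dim"]):
        problems.append("fc_layer_dim has fewer entries than fc_layer_num")
    if not 0.0 <= hyperparameters["dropout_rate"] < 1.0:
        problems.append("dropout_rate should lie in [0, 1)")
    return problems


# Illustrative values only; they are not the library's defaults.
example_hyperparameters = {
    "embedding_dim": 128,
    "heads": 1,
    "fc_layer_num": 3,
    "fc_layer_dim": [256, 128, 1],
    "dropout_rate": 0.3,
    "epochs": 100,
    "batch_size": 64,
    "lr": 1e-4,
}
problems = check_dipk_hyperparameters(example_hyperparameters)
```

A dictionary that passes such a check could then be handed to build_model on a DIPKModel instance.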

cell_line_views = ['gene_expression', 'bionic_features']

drug_views = ['molgnet_features']

early_stopping = True

classmethod get_model_name()

Get the model name.

Return type:: str
Returns:: DIPK

classmethod load(directory)

Load the DIPK model and gene expression encoder using PyTorch conventions.

This method expects the following files in the given directory:

“dipk_model.pt”: PyTorch state_dict of the DIPK predictor model
“gene_encoder.pt”: PyTorch state_dict of the gene expression encoder
“hyperparameters.json”: Dictionary of hyperparameters, must include “gene_encoder_input_dim”

Parameters:: directory (str) – Path to the directory containing the model files
Return type:: DIPKModel
Returns:: An instance of DIPK with loaded model and encoder
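
Because load depends on three files being present, it can help to verify the directory before calling it. The sketch below is one possible pre-flight check using only the standard library; missing_dipk_files is a hypothetical helper written for this example and is not part of drevalpy. The file names and the required “gene_encoder_input_dim” key are taken from the description above.

```python
import json
import tempfile
from pathlib import Path

# File names documented for DIPKModel.load / DIPKModel.save.
EXPECTED_FILES = ("dipk_model.pt", "gene_encoder.pt", "hyperparameters.json")


def missing_dipk_files(directory: str) -> list[str]:
    """List what is absent from a saved-model directory (hypothetical helper)."""
    root = Path(directory)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    hyper_path = root / "hyperparameters.json"
    if hyper_path.is_file():
        hyperparameters = json.loads(hyper_path.read_text())
        # load requires this key in addition to the file itself.
        if "gene_encoder_input_dim" not in hyperparameters:
            missing.append("hyperparameters.json: gene_encoder_input_dim")
    return missing


with tempfile.TemporaryDirectory() as tmp:
    # Only the JSON file is written; the two .pt files are deliberately absent.
    (Path(tmp) / "hyperparameters.json").write_text(json.dumps({"heads": 1}))
    report = missing_dipk_files(tmp)
```

An empty list would indicate that the directory has the layout load expects; it does not validate the contents of the .pt files.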

load_cell_line_features(data_path, dataset_name)

Load cell line features.

Parameters:

data_path (str) – path to the data
dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

cell line features

load_drug_features(data_path, dataset_name)

Load drug features.

Parameters:

data_path (str) – path to the data
dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

drug features

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the response values for the given cell lines and drugs.

Parameters:

cell_line_ids (ndarray) – list of cell line IDs
drug_ids (ndarray) – list of drug IDs
cell_line_input (FeatureDataset) – input data associated with the cell line
drug_input (FeatureDataset | None) – input data associated with the drug

Return type:

ndarray

Returns:

predicted response values

Raises:

ValueError – if drug_input is None, if the model is not initialized, or if the gene expression encoder is not initialized

save(directory)

Save the DIPK model and gene expression encoder using PyTorch conventions.

This method stores:

“dipk_model.pt”: PyTorch state_dict of the DIPK predictor model
“gene_encoder.pt”: PyTorch state_dict of the trained gene expression encoder
“hyperparameters.json”: All hyperparameters, including the encoder input dimension (“gene_encoder_input_dim”)

Parameters:: directory (str) – Target directory where the model files will be saved
Raises:: ValueError – If model or encoder is not built
Return type:: None

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model.

Parameters:

output (DrugResponseDataset) – training data associated with the response output
cell_line_input (FeatureDataset) – input data associated with the cell line
drug_input (FeatureDataset | None) – input data associated with the drug
output_earlystopping (DrugResponseDataset | None) – early stopping data associated with the response output
model_checkpoint_dir (str) – directory to save the model checkpoint

Raises:

ValueError – if drug_input is None or if the model is not initialized

Return type:

None

Attention utils

Contains a custom MultiHeadAttentionLayer for the DIPK model.

class drevalpy.models.DIPK.attention_utils.MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)

Bases: Module

Custom multi-head attention layer for the DIPK model.

Parameters:

hid_dim (int)
n_heads (int)
dropout (float)
device (str | device | int | None)

forward(query, key, value, mask=None)

Forward pass of the multi-head attention layer.

Parameters:

query (Tensor) – query tensor
key (Tensor) – key tensor
value (Tensor) – value tensor
mask (Tensor | None) – mask tensor

Return type:

tuple[Tensor, Tensor]

Returns:

output tensor and attention tensor
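
To make the roles of query, key, value, and mask more concrete, the following is a minimal pure-Python sketch of scaled dot-product attention for a single query vector and a single head. It is a simplified stand-in written for this example, not the MultiHeadAttentionLayer implementation: it omits the learned projections, the split into multiple heads, and dropout. The function name masked_attention and the toy numbers are assumptions.

```python
import math


def masked_attention(query, keys, values, mask):
    """Single-head scaled dot-product attention for one query vector.

    Simplified illustration only; assumes at least one mask entry is True.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Masked-out positions get -inf so that softmax gives them zero weight.
    scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    # The output is the attention-weighted sum of the value vectors.
    output = [
        sum(w * value[j] for w, value in zip(attention, values))
        for j in range(len(values[0]))
    ]
    return output, attention


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
values = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [True, True, False]  # the third position is treated as padding
output, attention = masked_attention(query, keys, values, mask)
```

Like forward, the sketch returns both the output and the attention weights, so the weights can be inspected separately.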

Data utils

Includes functions to load and process the DIPK dataset.

get_data: Creates a list of dictionaries with drug and cell line features.
CollateFn: Class to collate the DataLoader batches.
DIPKDataset: Dataset class for the DIPK model.

class drevalpy.models.DIPK.data_utils.CollateFn(train=True)

Bases: object

Collate function for the DataLoader, either for training or testing.

class drevalpy.models.DIPK.data_utils.DIPKDataset(samples)

Bases: Dataset, ABC

Dataset of graphs from get_data.

drevalpy.models.DIPK.data_utils.get_data(cell_ids, drug_ids, cell_line_features, drug_features, ic50=None)

Prepare data samples for training or prediction.

Each sample includes:

Drug features (e.g., molecular embeddings).
Cell line features (gene expression and bionic_features).
Optional IC50 response values for supervised tasks.

Parameters:

cell_ids (ndarray) – IDs of the cell lines from the dataset.
drug_ids (ndarray) – IDs of the drugs from the dataset.
cell_line_features (FeatureDataset) – Input features associated with the cell lines.
drug_features (FeatureDataset) – Input features associated with the drugs.
ic50 (ndarray | None) – (Optional) Response values (e.g., IC50) to associate with samples.

Return type:

list

Returns:

List of dictionaries, each containing drug and cell line features, with optional IC50.
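
The shape of the returned list can be sketched with plain Python containers. In the example below, ordinary dictionaries stand in for FeatureDataset objects, and the function build_samples, the sample keys, and the toy IDs are all hypothetical; only the view names gene_expression, bionic_features, and molgnet_features come from the model description. The actual get_data may organise its dictionaries differently.

```python
def build_samples(cell_ids, drug_ids, cell_features, drug_features, responses=None):
    """Pair each (cell line, drug) ID with its features (illustrative sketch)."""
    samples = []
    for index, (cell_id, drug_id) in enumerate(zip(cell_ids, drug_ids)):
        sample = {
            "molgnet_features": drug_features[drug_id],
            "gene_expression": cell_features[cell_id]["gene_expression"],
            "bionic_features": cell_features[cell_id]["bionic_features"],
        }
        # Response values are only attached when they are provided.
        if responses is not None:
            sample["ic50"] = responses[index]
        samples.append(sample)
    return samples


# Toy lookups standing in for FeatureDataset objects.
cell_features = {
    "cell_a": {"gene_expression": [0.1, 0.2], "bionic_features": [1.0]},
    "cell_b": {"gene_expression": [0.3, 0.4], "bionic_features": [2.0]},
}
drug_features = {"drug_x": [[0.5, 0.5]], "drug_y": [[0.7, 0.1], [0.2, 0.9]]}

train_samples = build_samples(
    ["cell_a", "cell_b"], ["drug_x", "drug_y"], cell_features, drug_features,
    responses=[1.5, -0.3],
)
predict_samples = build_samples(["cell_a"], ["drug_y"], cell_features, drug_features)
```

Leaving responses unset mirrors the prediction case, where ic50 is None.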

drevalpy.models.DIPK.data_utils.load_bionic_features(data_path, dataset_name, gene_add_num=512)

Load biological network (BIONIC) features for DIPK.

Parameters:

data_path (str) – Path to the data, e.g., “data/”
dataset_name (str) – Name of the dataset, e.g., GDSC2
gene_add_num (int) – Number of genes to add to the feature set

Return type:

FeatureDataset

Returns:

FeatureDataset with gene expression and biological network features

Gene expression encoder

Gene expression autoencoder for the DIPK model.

class drevalpy.models.DIPK.gene_expression_encoder.CollateFn

Bases: object

Collate function for the DataLoader, either for training or testing.

class drevalpy.models.DIPK.gene_expression_encoder.DataSet(data)

Bases: Dataset, ABC

Dataset class for gene expression data.

class drevalpy.models.DIPK.gene_expression_encoder.GeneExpressionDecoder(input_dim, latent_dim=512, h_dims=None, drop_out_rate=0.3)

Bases: Module

Gene expression decoder.

forward(embedding)

Forward pass of the gene expression decoder.

Parameters:: embedding – input data
Returns:: decoded data

class drevalpy.models.DIPK.gene_expression_encoder.GeneExpressionEncoder(input_dim, latent_dim=512, h_dims=None, drop_out_rate=0.3)

Bases: Module

Gene expression encoder.

Code adapted from the DIPK model https://github.com/user15632/DIPK.

forward(input)

Forward pass of the gene expression encoder.

Parameters:: input – input data
Returns:: encoded data

drevalpy.models.DIPK.gene_expression_encoder.encode_gene_expression(gene_expression_input, encoder)

Encode gene expression data.

Parameters:

gene_expression_input (ndarray) – gene expression data
encoder (GeneExpressionEncoder) – trained encoder model

Return type:

ndarray

Returns:

encoded gene expression data

drevalpy.models.DIPK.gene_expression_encoder.train_gene_expession_autoencoder(gene_expression_input, gene_expression_input_early_stopping, epochs_autoencoder=100)

Train the autoencoder model for gene expression data with early stopping.

Parameters:

gene_expression_input (ndarray) – gene expression data
gene_expression_input_early_stopping (ndarray) – validation data for early stopping
epochs_autoencoder (int) – number of epochs for training the autoencoder

Return type:

GeneExpressionEncoder

Returns:

trained encoder model
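
The early stopping idea can be illustrated without any training code. The sketch below applies a patience-based rule to a precomputed list of validation losses. The function name, the patience parameter, and its value are assumptions made for this example; the criterion actually used by train_gene_expession_autoencoder is not stated here and may differ.

```python
def best_epoch_with_patience(validation_losses, patience=3):
    """Return (best_epoch, last_epoch_run) under a patience-based stopping rule.

    Illustrative sketch only; `patience` is a hypothetical parameter.
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        last_epoch = epoch
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop before the remaining epochs are run
    return best_epoch, last_epoch


losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
best_epoch, last_epoch = best_epoch_with_patience(losses, patience=3)
```

With these toy losses the loop stops before reaching the final epoch, which is the trade-off of a small patience: a later improvement can be missed.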

Model utils

Includes custom torch.nn.Modules for the DIPK model: AttentionLayer, DenseLayers, Predictor.

class drevalpy.models.DIPK.model_utils.AttentionLayer(heads=1)

Bases: Module

Custom attention layer for the DIPK model.

Parameters:: heads (int)

forward(molgnet_features, mask, gene_expression, bionic)

Forward pass of the attention layer.

Parameters:

molgnet_features (Tensor) – MolGNet features
mask (Tensor) – mask for the MolGNet features, as molecules have varying sizes (valid atom features are True)
gene_expression (Tensor) – gene expression features of the graph data
bionic (Tensor) – bionic network features of the graph data

Return type:

Tensor

Returns:

tensor of MolGNet features after attention layer
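
Since molecules have different numbers of atoms, a batch needs padding together with a mask that marks the valid atom positions as True. The sketch below shows one way this could look with plain lists; pad_atom_features is a hypothetical helper, and where and how drevalpy performs the padding is not specified in this section.

```python
def pad_atom_features(molecules, feature_dim):
    """Pad per-atom feature lists to a common length and build a validity mask.

    Illustrative sketch only; padded rows are filled with zeros.
    """
    max_atoms = max(len(atoms) for atoms in molecules)
    padded, mask = [], []
    for atoms in molecules:
        n_pad = max_atoms - len(atoms)
        padded.append(
            [list(atom) for atom in atoms]
            + [[0.0] * feature_dim for _ in range(n_pad)]
        )
        # True marks a real atom, False marks padding.
        mask.append([True] * len(atoms) + [False] * n_pad)
    return padded, mask


molecules = [
    [[0.1, 0.2], [0.3, 0.4]],              # molecule with 2 atoms
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],  # molecule with 3 atoms
]
padded, mask = pad_atom_features(molecules, feature_dim=2)
```

The resulting mask follows the convention described for the mask parameter, with valid atom features marked True.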

class drevalpy.models.DIPK.model_utils.DenseLayers(fc_layer_num, fc_layer_dim, dropout_rate)

Bases: Module

Custom dense layers for the DIPK model.

Parameters:

fc_layer_num (int)
fc_layer_dim (list[int])
dropout_rate (float)

forward(x, gene, bionic)

Forward pass of the dense layers.

Parameters:

x (Tensor) – output tensor from the attention layer
gene (Tensor) – gene expression features (GEF) of the graph data
bionic (Tensor) – biological network features (BNF) of the graph data

Return type:

Tensor

Returns:

output tensor after the dense layers

class drevalpy.models.DIPK.model_utils.Predictor(heads, fc_layer_num, fc_layer_dim, dropout_rate)

Bases: Module

Whole DIPK model.

Parameters:

heads (int)
fc_layer_num (int)
fc_layer_dim (list[int])
dropout_rate (float)

forward(molgnet_drug_features, gene_expression, bionic, molgnet_mask)

Forward pass of the DIPK model.

Parameters:

molgnet_drug_features (Tensor) – tensor of MolGNet features from graph data
gene_expression (Tensor) – gene expression features (GEF) of the graph data
bionic (Tensor) – biological network features (BNF) of the graph data
molgnet_mask (Tensor) – mask for the MolGNet features, as molecules have varying sizes

Return type:

Tensor

Returns:

output tensor of the DIPK model