DIPK

DIPK Model

DIPK model. Adapted from https://github.com/user15632/DIPK.

Original publication: Pengyong Li, Zhengxiang Jiang, Tianxiao Liu, Xinyu Liu, Hui Qiao, Xiaojun Yao. Improving drug response prediction via integrating gene relationships with deep learning. Briefings in Bioinformatics, Volume 25, Issue 3, May 2024, bbae153. https://doi.org/10.1093/bib/bbae153

class drevalpy.models.DIPK.dipk.DIPKModel

Bases: DRPModel

DIPK model. Adapted from https://github.com/user15632/DIPK.

build_model(hyperparameters)

Builds the DIPK model with the specified hyperparameters.

Parameters:

hyperparameters (dict[str, Any]) – embedding_dim, heads, fc_layer_num, fc_layer_dim, dropout_rate, epochs, batch_size, lr

Return type:

None

Details of hyperparameters:

  • embedding_dim: int, embedding dimension for the graph encoder; not used in the final model

  • heads: int, number of heads for the multi-head attention layer, defaults to 1

  • fc_layer_num: int, number of fully connected layers for the dense layers

  • fc_layer_dim: list[int], number of neurons for each fully connected layer

  • dropout_rate: float, dropout rate for all fully connected layers

  • epochs: int, number of epochs to train the model

  • batch_size: int, batch size for training

  • lr: float, learning rate for training
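As a rough illustration of the dictionary shape that build_model expects, the sketch below assembles the eight keys listed above and checks that none is missing. The values and the helper name missing_hyperparameters are assumptions made for this example; they are not defaults taken from drevalpy.

```python
from typing import Any

# Keys listed in the build_model documentation above.
REQUIRED_KEYS = (
    "embedding_dim",
    "heads",
    "fc_layer_num",
    "fc_layer_dim",
    "dropout_rate",
    "epochs",
    "batch_size",
    "lr",
)


def missing_hyperparameters(hyperparameters: dict[str, Any]) -> list[str]:
    """Return the documented keys that are absent (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in hyperparameters]


# Illustrative values only; they are not the library's defaults.
hyperparameters: dict[str, Any] = {
    "embedding_dim": 128,
    "heads": 1,
    "fc_layer_num": 3,
    "fc_layer_dim": [256, 128, 64],
    "dropout_rate": 0.3,
    "epochs": 100,
    "batch_size": 64,
    "lr": 1e-4,
}

missing = missing_hyperparameters(hyperparameters)
```

With drevalpy installed, a dictionary of this shape would presumably be passed to build_model on a DIPKModel instance.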

cell_line_views = ['gene_expression', 'bionic_features']

drug_views = ['molgnet_features']

early_stopping = True

classmethod get_model_name()

Get the model name.

Return type:

str

Returns:

DIPK

classmethod load(directory)

Load the DIPK model and gene expression encoder using PyTorch conventions.

This method expects the following files in the given directory:

  • “dipk_model.pt”: PyTorch state_dict of the DIPK predictor model

  • “gene_encoder.pt”: PyTorch state_dict of the gene expression encoder

  • “hyperparameters.json”: Dictionary of hyperparameters, must include “gene_encoder_input_dim”

Parameters:

directory (str) – Path to the directory containing the model files

Return type:

DIPKModel

Returns:

An instance of DIPK with loaded model and encoder
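Before calling load, it can help to confirm that the directory has the layout described above. The following is a minimal sketch using only the standard library; check_dipk_directory is a hypothetical helper, not part of drevalpy, and the placeholder files it writes in the demonstration are empty stand-ins rather than real state_dicts.

```python
import json
import tempfile
from pathlib import Path

# File names taken from the load documentation above.
EXPECTED_FILES = ("dipk_model.pt", "gene_encoder.pt", "hyperparameters.json")


def check_dipk_directory(directory: str) -> list[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    root = Path(directory)
    problems = [
        f"missing file: {name}"
        for name in EXPECTED_FILES
        if not (root / name).is_file()
    ]
    hp_path = root / "hyperparameters.json"
    if hp_path.is_file():
        hyperparameters = json.loads(hp_path.read_text())
        if "gene_encoder_input_dim" not in hyperparameters:
            problems.append("hyperparameters.json lacks 'gene_encoder_input_dim'")
    return problems


# Demonstration on a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    problems_before = check_dipk_directory(demo_dir)
    for name in ("dipk_model.pt", "gene_encoder.pt"):
        (Path(demo_dir) / name).write_bytes(b"")  # empty placeholder
    (Path(demo_dir) / "hyperparameters.json").write_text(
        json.dumps({"gene_encoder_input_dim": 1000})  # illustrative value
    )
    problems_after = check_dipk_directory(demo_dir)
```

The check only inspects file names and the one required JSON key; it does not verify that the .pt files contain loadable weights.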

load_cell_line_features(data_path, dataset_name)

Load cell line features.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

cell line features

load_drug_features(data_path, dataset_name)

Load drug features.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

drug features

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the response values for the given cell lines and drugs.

Parameters:
  • cell_line_ids (ndarray) – array of cell line IDs

  • drug_ids (ndarray) – array of drug IDs

  • cell_line_input (FeatureDataset) – input data associated with the cell line

  • drug_input (FeatureDataset | None) – input data associated with the drug

Return type:

ndarray

Returns:

predicted response values

Raises:

ValueError – if drug_input is None, or if the model or the gene expression encoder is not initialized

save(directory)

Save the DIPK model and gene expression encoder using PyTorch conventions.

This method stores:

  • “dipk_model.pt”: PyTorch state_dict of the DIPK predictor model

  • “gene_encoder.pt”: PyTorch state_dict of the trained gene expression encoder

  • “hyperparameters.json”: All hyperparameters including encoder input_dim

Parameters:

directory (str) – Target directory where the model files will be saved

Raises:

ValueError – If model or encoder is not built

Return type:

None

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model.

Parameters:
  • output (DrugResponseDataset) – training data associated with the response output

  • cell_line_input (FeatureDataset) – input data associated with the cell line

  • drug_input (FeatureDataset | None) – input data associated with the drug

  • output_earlystopping (DrugResponseDataset | None) – early stopping data associated with the response output

  • model_checkpoint_dir (str) – directory to save the model checkpoint

Raises:

ValueError – if drug_input is None or if the model is not initialized

Return type:

None

Attention utils

Contains a custom MultiHeadAttentionLayer for the DIPK model.

class drevalpy.models.DIPK.attention_utils.MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)

Bases: Module

Custom multi-head attention layer for the DIPK model.

forward(query, key, value, mask=None)

Forward pass of the multi-head attention layer.

Parameters:
  • query (Tensor) – query tensor

  • key (Tensor) – key tensor

  • value (Tensor) – value tensor

  • mask (Tensor | None) – mask tensor

Return type:

tuple[Tensor, Tensor]

Returns:

output tensor and attention tensor
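To make the (output, attention) return pair concrete, here is a minimal NumPy sketch of generic scaled dot-product multi-head attention. It is not the DIPK implementation: it omits the learned linear projections and dropout that a torch layer would normally include, and the mask convention (boolean, shape (batch, key_len), True for valid positions) is an assumption chosen for this example.

```python
import numpy as np


def multi_head_attention(query, key, value, n_heads, mask=None):
    """Scaled dot-product attention split over n_heads heads.

    query: (batch, q_len, hid_dim); key and value: (batch, k_len, hid_dim).
    mask: optional bool array (batch, k_len), True marks valid key positions.
    Returns (output, attention) with shapes
    (batch, q_len, hid_dim) and (batch, n_heads, q_len, k_len).
    """
    batch, q_len, hid_dim = query.shape
    k_len = key.shape[1]
    if hid_dim % n_heads != 0:
        raise ValueError("hid_dim must be divisible by n_heads")
    head_dim = hid_dim // n_heads

    def split_heads(x, length):
        # (batch, length, hid_dim) -> (batch, n_heads, length, head_dim)
        return x.reshape(batch, length, n_heads, head_dim).transpose(0, 2, 1, 3)

    q = split_heads(query, q_len)
    k = split_heads(key, k_len)
    v = split_heads(value, k_len)

    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
    if mask is not None:
        # Masked-out keys get a very negative score, so softmax gives ~0.
        scores = np.where(mask[:, None, None, :], scores, -1e10)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    attention = weights / weights.sum(axis=-1, keepdims=True)

    # Merge heads back: (batch, n_heads, q_len, head_dim) -> (batch, q_len, hid_dim)
    output = (attention @ v).transpose(0, 2, 1, 3).reshape(batch, q_len, hid_dim)
    return output, attention


rng = np.random.default_rng(0)
demo_query = rng.normal(size=(2, 3, 8))
demo_key = rng.normal(size=(2, 5, 8))
demo_mask = np.ones((2, 5), dtype=bool)
demo_mask[:, -1] = False  # treat the last key position as padding
demo_output, demo_attention = multi_head_attention(
    demo_query, demo_key, demo_key, n_heads=2, mask=demo_mask
)
```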

Data utils

Includes functions to load and process the DIPK dataset.

  • get_data: Creates a list of dictionaries with drug and cell line features.

  • CollateFn: Class to collate the DataLoader batches.

  • DIPKDataset: Dataset class for the DIPK model.

class drevalpy.models.DIPK.data_utils.CollateFn(train=True)

Bases: object

Collate function for the DataLoader, either for training or testing.

class drevalpy.models.DIPK.data_utils.DIPKDataset(samples)

Bases: Dataset, ABC

Dataset of graphs from get_data.

drevalpy.models.DIPK.data_utils.get_data(cell_ids, drug_ids, cell_line_features, drug_features, ic50=None)

Prepare data samples for training or prediction.

Each sample includes:

  • Drug features (e.g., molecular embeddings).

  • Cell line features (gene expression and bionic_features).

  • Optional IC50 response values for supervised tasks.

Parameters:
  • cell_ids (ndarray) – IDs of the cell lines from the dataset.

  • drug_ids (ndarray) – IDs of the drugs from the dataset.

  • cell_line_features (FeatureDataset) – Input features associated with the cell lines.

  • drug_features (FeatureDataset) – Input features associated with the drugs.

  • ic50 (ndarray | None) – (Optional) Response values (e.g., IC50) to associate with samples.

Return type:

list

Returns:

List of dictionaries, each containing drug and cell line features, with optional IC50.
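The list-of-dictionaries structure described above can be sketched as follows. Plain Python dicts stand in for FeatureDataset, and the function name build_samples and every dictionary key are assumptions made for illustration; the real get_data may use different keys and tensor types.

```python
def build_samples(cell_ids, drug_ids, cell_features, drug_features, responses=None):
    """Pair each (cell line, drug) ID with its features (hypothetical sketch).

    cell_features maps cell ID -> {"gene_expression": ..., "bionic_features": ...}
    drug_features maps drug ID -> drug feature object
    responses, if given, is aligned index-by-index with the ID sequences.
    """
    samples = []
    for index, (cell_id, drug_id) in enumerate(zip(cell_ids, drug_ids)):
        sample = {
            "cell_id": cell_id,
            "drug_id": drug_id,
            "drug_features": drug_features[drug_id],
            "gene_expression": cell_features[cell_id]["gene_expression"],
            "bionic_features": cell_features[cell_id]["bionic_features"],
        }
        if responses is not None:
            # Response values are optional, e.g. absent at prediction time.
            sample["ic50"] = responses[index]
        samples.append(sample)
    return samples


# Toy inputs with made-up values.
toy_cell_features = {
    "A": {"gene_expression": [0.1, 0.2], "bionic_features": [1.0]},
    "B": {"gene_expression": [0.3, 0.4], "bionic_features": [2.0]},
}
toy_drug_features = {"d1": [[0.5, 0.5]]}

train_samples = build_samples(
    ["A", "B"], ["d1", "d1"], toy_cell_features, toy_drug_features, responses=[0.5, 1.5]
)
predict_samples = build_samples(
    ["A"], ["d1"], toy_cell_features, toy_drug_features
)
```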

drevalpy.models.DIPK.data_utils.load_bionic_features(data_path, dataset_name, gene_add_num=512)

Load biological network (BIONIC) features for DIPK.

Parameters:
  • data_path (str) – Path to the data, e.g., “data/”

  • dataset_name (str) – Name of the dataset, e.g., GDSC2

  • gene_add_num (int) – Number of genes to add to the feature set

Return type:

FeatureDataset

Returns:

FeatureDataset with gene expression and biological network features

Gene expression encoder

Gene expression autoencoder for the DIPK model.

class drevalpy.models.DIPK.gene_expression_encoder.CollateFn

Bases: object

Collate function for the DataLoader, either for training or testing.

class drevalpy.models.DIPK.gene_expression_encoder.DataSet(data)

Bases: Dataset, ABC

Dataset class for gene expression data.

class drevalpy.models.DIPK.gene_expression_encoder.GeneExpressionDecoder(input_dim, latent_dim=512, h_dims=None, drop_out_rate=0.3)

Bases: Module

Gene expression decoder.

forward(embedding)

Forward pass of the gene expression decoder.

Parameters:

embedding – input data

Returns:

decoded data

class drevalpy.models.DIPK.gene_expression_encoder.GeneExpressionEncoder(input_dim, latent_dim=512, h_dims=None, drop_out_rate=0.3)

Bases: Module

Gene expression encoder.

Code adapted from the DIPK model https://github.com/user15632/DIPK.

forward(input)

Forward pass of the gene expression encoder.

Parameters:

input – input data

Returns:

encoded data

drevalpy.models.DIPK.gene_expression_encoder.encode_gene_expression(gene_expression_input, encoder)

Encode gene expression data.

Return type:

ndarray

Returns:

encoded gene expression data

drevalpy.models.DIPK.gene_expression_encoder.train_gene_expession_autoencoder(gene_expression_input, gene_expression_input_early_stopping, epochs_autoencoder=100)

Train the autoencoder model for gene expression data with early stopping.

Parameters:
  • gene_expression_input (ndarray) – gene expression data

  • gene_expression_input_early_stopping (ndarray) – validation data for early stopping

  • epochs_autoencoder (int) – number of epochs for training the autoencoder

Return type:

GeneExpressionEncoder

Returns:

trained encoder model
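The documentation does not state which stopping criterion the autoencoder training uses. As a generic illustration of early stopping on a validation set, the sketch below uses a patience counter; the patience parameter, the callback names, and the loss values are all assumptions for this example.

```python
def train_with_early_stopping(train_step, validation_loss, max_epochs=100, patience=5):
    """Run up to max_epochs, stopping once validation loss stops improving.

    train_step(epoch) performs one epoch of training.
    validation_loss(epoch) returns the loss on the early-stopping data.
    Returns (best_epoch, best_loss).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step(epoch)
        loss = validation_loss(epoch)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss


# Demonstration with a fixed, made-up validation curve.
fake_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9, 0.95]
epochs_run = []
best_epoch, best_loss = train_with_early_stopping(
    train_step=epochs_run.append,
    validation_loss=lambda epoch: fake_losses[epoch],
    max_epochs=len(fake_losses),
    patience=3,
)
```

In practice the model weights from the best epoch would also be kept, for example by saving a checkpoint whenever the validation loss improves.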

Model utils

Includes custom torch.nn.Modules for the DIPK model: AttentionLayer, DenseLayers, Predictor.

class drevalpy.models.DIPK.model_utils.AttentionLayer(heads=1)

Bases: Module

Custom attention layer for the DIPK model.

Parameters:

heads (int)

forward(molgnet_features, mask, gene_expression, bionic)

Forward pass of the attention layer.

Parameters:
  • molgnet_features (Tensor) – MolGNet features

  • mask (Tensor) – mask for the MolGNet features, as molecules have varying sizes (valid atom features are True)

  • gene_expression (Tensor) – gene expression features of the graph data

  • bionic (Tensor) – bionic network features of the graph data

Return type:

Tensor

Returns:

tensor of MolGNet features after attention layer
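The mask parameter above exists because molecules have different numbers of atoms, so per-atom feature matrices must be padded to a common length within a batch. A minimal sketch of building such a padded batch and its mask, using plain Python lists, follows. The function name pad_with_mask and the zero padding value are assumptions; this is not the CollateFn implementation.

```python
def pad_with_mask(molecules, pad_value=0.0):
    """Pad per-atom feature lists to equal length and build a validity mask.

    molecules: list of molecules, each a list of atom feature vectors
    with the same feature dimension.
    Returns (padded, mask), where mask[i][j] is True for a real atom
    and False for padding.
    """
    max_atoms = max(len(atoms) for atoms in molecules)
    feature_dim = len(molecules[0][0])
    padded, mask = [], []
    for atoms in molecules:
        n_pad = max_atoms - len(atoms)
        padding = [[pad_value] * feature_dim for _ in range(n_pad)]
        padded.append([list(atom) for atom in atoms] + padding)
        mask.append([True] * len(atoms) + [False] * n_pad)
    return padded, mask


# Two toy molecules with 2 and 3 atoms, 2 features per atom.
toy_molecules = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],
]
padded_batch, valid_mask = pad_with_mask(toy_molecules)
```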

class drevalpy.models.DIPK.model_utils.DenseLayers(fc_layer_num, fc_layer_dim, dropout_rate)

Bases: Module

Custom dense layers for the DIPK model.

forward(x, gene, bionic)

Forward pass of the dense layers.

Parameters:
  • x (Tensor) – output tensor from the attention layer

  • gene (Tensor) – gene expression features (GEF) of the graph data

  • bionic (Tensor) – biological network features (BNF) of the graph data

Return type:

Tensor

Returns:

output tensor after the dense layers

class drevalpy.models.DIPK.model_utils.Predictor(heads, fc_layer_num, fc_layer_dim, dropout_rate)

Bases: Module

Whole DIPK model.

forward(molgnet_drug_features, gene_expression, bionic, molgnet_mask)

Forward pass of the DIPK model.

Parameters:
  • molgnet_drug_features (Tensor) – tensor of MolGNet features from graph data

  • gene_expression (Tensor) – gene expression features (GEF) of the graph data

  • bionic (Tensor) – biological network features (BNF) of the graph data

  • molgnet_mask (Tensor) – mask for the MolGNet features, as molecules have varying sizes

Return type:

Tensor

Returns:

output tensor of the DIPK model