PharmaFormer

PharmaFormer Model

Contains PharmaFormer, a Transformer-based deep learning model that predicts clinical drug responses by integrating gene expression profiles and drug molecular structures.

Original authors: Zhou et al. (2025, doi:10.1038/s41698-025-01082-6). Code adapted from their GitHub repository: https://github.com/zhouyuru1205/PharmaFormer

class drevalpy.models.PharmaFormer.pharmaformer.PharmaFormerModel

Bases: DRPModel

PharmaFormer model for drug response prediction.

build_model(hyperparameters)

Builds the PharmaFormer model with the specified hyperparameters.

Parameters:

hyperparameters (dict[str, Any]) – Model hyperparameters including gene_hidden_size, drug_hidden_size, feature_dim, nhead, num_layers, dim_feedforward, dropout, batch_size, lr, epochs, patience

Return type:

None
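As a rough illustration of the hyperparameters listed above, the sketch below builds a dictionary with those keys and checks one constraint that multi-head attention imposes: the embedding dimension must be divisible by the number of heads. It assumes that feature_dim serves as the transformer embedding dimension. All values are hypothetical placeholders rather than the model's tuned settings, and check_hyperparameters is an illustrative helper, not part of drevalpy.

```python
from typing import Any

# Keys named in the build_model documentation.
EXPECTED_KEYS = {
    "gene_hidden_size", "drug_hidden_size", "feature_dim", "nhead",
    "num_layers", "dim_feedforward", "dropout", "batch_size", "lr",
    "epochs", "patience",
}


def check_hyperparameters(hyperparameters: dict[str, Any]) -> None:
    """Hypothetical helper: validate a hyperparameter dictionary."""
    missing = EXPECTED_KEYS - hyperparameters.keys()
    if missing:
        raise KeyError(f"missing hyperparameters: {sorted(missing)}")
    # Assumption: feature_dim is the transformer embedding dimension,
    # which multi-head attention requires to be divisible by nhead.
    if hyperparameters["feature_dim"] % hyperparameters["nhead"] != 0:
        raise ValueError("feature_dim must be divisible by nhead")


# Placeholder values chosen only for illustration.
example_hyperparameters = {
    "gene_hidden_size": 1024,
    "drug_hidden_size": 256,
    "feature_dim": 128,
    "nhead": 8,
    "num_layers": 3,
    "dim_feedforward": 2048,
    "dropout": 0.1,
    "batch_size": 64,
    "lr": 1e-4,
    "epochs": 100,
    "patience": 10,
}
check_hyperparameters(example_hyperparameters)
```

A dictionary that passes this check could then be handed to build_model; whether the model performs a similar validation internally is not documented here.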

cell_line_views = ['gene_expression']

drug_views = ['bpe_smiles']

early_stopping = True

classmethod get_model_name()

Get the model name.

Return type:

str

Returns:

PharmaFormer

classmethod load(directory)

Load the PharmaFormer model using PyTorch conventions.

This method expects the following files in the given directory:

  • “pharmaformer_model.pt”: PyTorch state_dict of the model

  • “hyperparameters.json”: Dictionary of hyperparameters

  • “gene_scaler.pkl”: Fitted StandardScaler (optional)

  • “gene_normalizer.pkl”: Fitted MinMaxScaler (optional)

Parameters:

directory (str) – Path to the directory containing the model files

Return type:

PharmaFormerModel

Returns:

An instance of PharmaFormerModel with loaded model
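Before calling load, it can be useful to confirm that a directory has the expected layout. The following sketch checks for the files listed above using only the standard library. Treating the two files not marked optional as required is an assumption drawn from that list, and check_model_directory is a hypothetical helper, not a drevalpy function.

```python
from pathlib import Path

# File names taken from the load() documentation.
REQUIRED_FILES = ("pharmaformer_model.pt", "hyperparameters.json")
OPTIONAL_FILES = ("gene_scaler.pkl", "gene_normalizer.pkl")


def check_model_directory(directory: str) -> list[str]:
    """Hypothetical helper: return the optional files that are present.

    Raises FileNotFoundError if a required file is missing.
    """
    root = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing model files in {root}: {missing}")
    return [name for name in OPTIONAL_FILES if (root / name).is_file()]
```

The return value tells the caller which scalers were saved alongside the weights, which may matter when deciding how to preprocess new gene expression data.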

load_cell_line_features(data_path, dataset_name)

Load cell line features.

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

cell line features

load_drug_features(data_path, dataset_name)

Load drug features (BPE-encoded SMILES).

Parameters:
  • data_path (str) – path to the data

  • dataset_name (str) – name of the dataset

Return type:

FeatureDataset

Returns:

drug features

Raises:

FileNotFoundError – if the BPE SMILES file is not found

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the response values for the given cell lines and drugs.

Parameters:
  • cell_line_ids (ndarray) – array of cell line IDs

  • drug_ids (ndarray) – array of drug IDs

  • cell_line_input (FeatureDataset) – input data associated with the cell line

  • drug_input (FeatureDataset | None) – input data associated with the drug

Return type:

ndarray

Returns:

predicted response values

Raises:

ValueError – if drug_input is None or if the model is not initialized

save(directory)

Save the PharmaFormer model using PyTorch conventions.

This method stores:

  • “pharmaformer_model.pt”: PyTorch state_dict of the model

  • “hyperparameters.json”: All hyperparameters

  • “gene_scaler.pkl”: Fitted StandardScaler for gene expression

  • “gene_normalizer.pkl”: Fitted MinMaxScaler for gene expression

Parameters:

directory (str) – Target directory where the model files will be saved

Raises:

ValueError – if the model has not been built

Return type:

None
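The two scaler files suggest that gene expression is preprocessed in two steps. As a hedged illustration of what standardization followed by min-max normalization computes for a single gene, here is a pure-Python sketch. The order of the two steps and the sample values are assumptions; the model itself stores fitted scaler objects (presumably the scikit-learn classes of the same names) rather than using functions like these.

```python
from statistics import fmean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Z-score the values using the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    scale = std if std > 0 else 1.0  # constant features map to 0
    return [(v - mean) / scale for v in values]


def min_max(values: list[float]) -> list[float]:
    """Rescale the values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo if hi > lo else 1.0  # constant features map to 0
    return [(v - lo) / span for v in values]


# Hypothetical expression values for one gene across four cell lines.
expression = [2.0, 4.0, 6.0, 8.0]
scaled = min_max(standardize(expression))
```

In practice the statistics (mean, standard deviation, minimum, maximum) would be fitted on training data and reused for new samples, which is why the fitted scalers are saved with the model.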

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Trains the model.

Parameters:
  • output (DrugResponseDataset) – training data associated with the response output

  • cell_line_input (FeatureDataset) – input data associated with the cell line

  • drug_input (FeatureDataset | None) – input data associated with the drug

  • output_earlystopping (DrugResponseDataset | None) – early stopping data associated with the response output

  • model_checkpoint_dir (str) – directory to save the model checkpoint

Raises:

ValueError – if drug_input is None or if early stopping data is missing

Return type:

None
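Since the model sets early_stopping = True and accepts a patience hyperparameter, training presumably halts when the validation loss stops improving. The sketch below shows a generic patience-based rule of that kind. It is an illustration under that assumption, not the model's actual training loop, and early_stopping_epoch and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses: list[float], patience: int) -> int:
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs, or when the losses run out.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses, one per epoch.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]
stop = early_stopping_epoch(losses, patience=3)
```

With these losses the rule stops before the final, lower loss is ever reached, which shows the usual trade-off: a small patience saves compute but can end training before a late improvement.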

Model utils

Neural network components for PharmaFormer model.

class drevalpy.models.PharmaFormer.model_utils.CombinedModel(gene_input_size, gene_hidden_size, drug_hidden_size, feature_dim, nhead, num_layers=3, dim_feedforward=2048, dropout=0.1)

Bases: Module

Combined model integrating feature extraction and transformer.

Parameters:
  • gene_input_size (int)

  • gene_hidden_size (int)

  • drug_hidden_size (int)

  • feature_dim (int)

  • nhead (int)

  • num_layers (int)

  • dim_feedforward (int)

  • dropout (float)

forward(gene_expr, smiles)

Forward pass of the combined model.

Parameters:
  • gene_expr (Tensor) – Gene expression features [batch_size, gene_input_size]

  • smiles (Tensor) – BPE-encoded SMILES features [batch_size, 128]

Return type:

Tensor

Returns:

Output predictions [batch_size, 1]
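The documentation does not state how the flat vector from FeatureExtractor becomes the three-dimensional input that TransModel expects. One plausible reading, assumed here and not confirmed by the source, is that the combined features are reshaped into seq_len chunks of width feature_dim. The sketch below only does the shape arithmetic for that assumption; transformer_input_shape and the sizes used are hypothetical.

```python
def transformer_input_shape(
    batch_size: int,
    gene_hidden_size: int,
    drug_hidden_size: int,
    feature_dim: int,
) -> tuple[int, int, int]:
    """Hypothetical helper: shape after reshaping combined features.

    Assumes the [batch_size, gene_hidden_size + drug_hidden_size]
    features are split into chunks of width feature_dim.
    """
    combined = gene_hidden_size + drug_hidden_size
    if combined % feature_dim != 0:
        raise ValueError(
            "gene_hidden_size + drug_hidden_size must be divisible by feature_dim"
        )
    seq_len = combined // feature_dim
    return (batch_size, seq_len, feature_dim)


# Placeholder sizes chosen only for illustration.
shape = transformer_input_shape(
    batch_size=32, gene_hidden_size=1024, drug_hidden_size=256, feature_dim=128
)
```

Under this assumption, hidden sizes whose sum is not a multiple of feature_dim would be rejected, so the three values would need to be chosen together.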

class drevalpy.models.PharmaFormer.model_utils.FeatureExtractor(gene_input_size, gene_hidden_size, drug_hidden_size)

Bases: Module

Feature extractor for gene expression and drug SMILES.

Parameters:
  • gene_input_size (int)

  • gene_hidden_size (int)

  • drug_hidden_size (int)

forward(gene_expr, smiles)

Forward pass of the feature extractor.

Parameters:
  • gene_expr (Tensor) – Gene expression features [batch_size, gene_input_size]

  • smiles (Tensor) – BPE-encoded SMILES features [batch_size, 128]

Return type:

Tensor

Returns:

Combined features [batch_size, gene_hidden_size + drug_hidden_size]
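The SMILES input has a fixed width of 128, so each drug's BPE token sequence has to be brought to that length before batching. A minimal sketch of one common way to do this, padding short sequences and truncating long ones, follows. The padding ID of 0 and the helper name pad_or_truncate are assumptions; the actual bpe_smiles features may be prepared differently.

```python
def pad_or_truncate(
    token_ids: list[int], length: int = 128, pad_id: int = 0
) -> list[int]:
    """Hypothetical helper: force a token ID sequence to a fixed length."""
    clipped = token_ids[:length]  # drop tokens beyond the fixed width
    return clipped + [pad_id] * (length - len(clipped))


# Hypothetical token IDs for a short SMILES string.
encoded = pad_or_truncate([17, 42, 8])
```

Truncation silently discards the tail of very long SMILES strings, so a real pipeline might want to log how often that happens.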

class drevalpy.models.PharmaFormer.model_utils.TransModel(feature_dim, nhead, seq_len, dim_feedforward=2048, dropout=0.1, num_layers=3)

Bases: Module

Transformer model for processing combined features.

Parameters:
  • feature_dim (int)

  • nhead (int)

  • seq_len (int)

  • dim_feedforward (int)

  • dropout (float)

  • num_layers (int)

forward(x)

Forward pass of the transformer model.

Parameters:

x (Tensor) – Input tensor [batch_size, seq_len, feature_dim]

Return type:

Tensor

Returns:

Output predictions [batch_size, 1]