SRMF

SRMF Model

Contains the SRMF (Similarity Regularization Matrix Factorization) model.

Original publication: Wang, L., Li, X., Zhang, L. et al. Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization. BMC Cancer 17, 513 (2017). https://doi.org/10.1186/s12885-017-3500-5. The implementation is adapted from the MATLAB code at https://github.com/linwang1982/SRMF.

class drevalpy.models.SRMF.srmf.SRMF

Bases: DRPModel

SRMF model: Similarity Regularization Matrix Factorization.

The primary idea is to map m drugs and n cell lines into a shared latent space of low dimensionality K, where \(K \ll \min(m, n)\). The properties of a drug \(d_i\) and a cell line \(c_j\) are described by two latent coordinates \(u_i\) and \(v_j\) (K-dimensional row vectors), respectively. The drug response matrix Y is approximated by solving:

\(\min_{U,V} \|W \cdot (Y - U \cdot V^T)\|^2_F + \lambda_l \cdot (\|U\|^2_F + \|V\|^2_F) + \lambda_d \cdot \|S_d - U \cdot U^T\|^2_F + \lambda_c \cdot \|S_c - V \cdot V^T\|^2_F\)

where W is a weight matrix (\(W_{ij} = 1\) if \(Y_{ij}\) is a known response value, else 0), U and V contain \(u_i\) and \(v_j\) as row vectors, respectively, and \(\|\cdot\|_F\) is the Frobenius norm. The \(\lambda_l\) term is an L2 regularization that guards against overfitting. \(S_d\) and \(S_c\) are the drug and cell line similarity matrices. The \(\lambda_d\) and \(\lambda_c\) terms penalize the gap between these similarities and the inner products of the corresponding latent vectors, so similar drugs (or cell lines) stay close in latent space.
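To make the four terms of the objective concrete, the following NumPy sketch evaluates it for given factors. It is an illustration, not the library's training code: the function name `srmf_objective` is hypothetical, and it assumes that W acts as an elementwise mask on the residual, which is what the definition of W suggests.

```python
import numpy as np


def srmf_objective(Y, W, U, V, S_d, S_c, lambda_l, lambda_d, lambda_c):
    """Evaluate the SRMF objective for fixed latent factors U (m x K) and V (n x K)."""
    # Masked reconstruction error: only known responses (W == 1) contribute.
    # Unknown entries of Y may be NaN, so they are zeroed before masking.
    residual = W * (np.nan_to_num(Y) - U @ V.T)
    fit = np.sum(residual**2)
    # L2 penalty on both factor matrices.
    l2 = lambda_l * (np.sum(U**2) + np.sum(V**2))
    # Similarity terms: inner products of latent vectors should match S_d and S_c.
    drug_term = lambda_d * np.sum((S_d - U @ U.T) ** 2)
    cell_term = lambda_c * np.sum((S_c - V @ V.T) ** 2)
    return fit + l2 + drug_term + cell_term


# Toy data: 4 drugs, 5 cell lines, K = 2, with one unknown response.
rng = np.random.default_rng(0)
m, n, K = 4, 5, 2
U = rng.normal(size=(m, K))
V = rng.normal(size=(n, K))
Y = U @ V.T
W = np.ones((m, n))
W[0, 1] = 0.0
Y[0, 1] = np.nan
S_d, S_c = U @ U.T, V @ V.T

# With a perfect fit and no L2 penalty, every term vanishes.
loss_exact = srmf_objective(Y, W, U, V, S_d, S_c, 0.0, 1.0, 1.0)
# Switching on lambda_l leaves only the L2 term.
loss_l2 = srmf_objective(Y, W, U, V, S_d, S_c, 0.5, 1.0, 1.0)
```

In this toy setup the factors reproduce Y and the similarity matrices exactly, so the only nonzero contribution comes from the L2 penalty.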

build_model(hyperparameters)

Initializes hyperparameters for SRMF model.

K is the latent dimensionality; lambda_l, lambda_d, and lambda_c are the regularization parameters; max_iter is the number of iterations; seed is the random seed.

Parameters:

hyperparameters (dict) – dictionary containing the hyperparameters

Return type:

None
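As an illustration of the dictionary shape, the sketch below checks that all six hyperparameters named above are present before they would be handed to build_model. The key spellings are taken from the description; whether the library expects exactly these keys is an assumption, the helper `check_hyperparameters` is hypothetical, and the values are placeholders rather than recommended settings.

```python
# Key names follow the description above (assumed, not verified against the library).
REQUIRED_KEYS = ("K", "lambda_l", "lambda_d", "lambda_c", "max_iter", "seed")


def check_hyperparameters(hyperparameters):
    """Return a dict restricted to the expected keys, or raise if one is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in hyperparameters]
    if missing:
        raise KeyError(f"Missing SRMF hyperparameters: {', '.join(missing)}")
    if hyperparameters["K"] < 1:
        raise ValueError("K must be a positive latent dimensionality")
    return {key: hyperparameters[key] for key in REQUIRED_KEYS}


# Placeholder values for illustration only.
example = {
    "K": 30,
    "lambda_l": 0.01,
    "lambda_d": 0.0,
    "lambda_c": 0.01,
    "max_iter": 50,
    "seed": 1,
}
checked = check_hyperparameters(example)

# A dictionary without "seed" is rejected.
try:
    check_hyperparameters({k: v for k, v in example.items() if k != "seed"})
    rejected = False
except KeyError:
    rejected = True
```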

cell_line_views = ['gene_expression']

drug_views = ['fingerprints']

classmethod get_model_name()

Returns the model name.

Return type:

str

Returns:

SRMF

classmethod load(directory)

Load a trained SRMF model from the specified directory.

Expects the following files:

  • best_u.pkl: latent factors for drugs

  • best_v.pkl: latent factors for cell lines

  • w_mask.pkl: response presence mask

  • config.json: model configuration (hyperparameters and training mean)

Parameters:

directory (str) – Directory containing the saved model artifacts

Return type:

SRMF

Returns:

An instance of SRMF with restored parameters

Raises:

FileNotFoundError – if any required file is missing
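Before calling load, it can be useful to verify that a directory holds all four files listed above. The helper below is a hypothetical pre-flight check written with the standard library only; it is not part of the SRMF class and does not reproduce its internal error handling.

```python
import tempfile
from pathlib import Path

# File names as listed in the documentation above.
REQUIRED_FILES = ("best_u.pkl", "best_v.pkl", "w_mask.pkl", "config.json")


def check_model_directory(directory):
    """Return the artifact paths, or raise FileNotFoundError naming what is missing."""
    root = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing SRMF artifacts in {root}: {', '.join(missing)}")
    return [root / name for name in REQUIRED_FILES]


with tempfile.TemporaryDirectory() as tmp:
    # Create only the three pickle files; config.json is still absent.
    for name in REQUIRED_FILES[:-1]:
        (Path(tmp) / name).touch()
    try:
        check_model_directory(tmp)
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True

    # After adding config.json the check passes.
    (Path(tmp) / "config.json").write_text("{}", encoding="utf-8")
    found = [path.name for path in check_model_directory(tmp)]
```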

load_cell_line_features(data_path, dataset_name)

Loads the cell line features, in this case the gene expression features.

Parameters:
  • data_path (str) – Path to the gene expression and landmark genes, e.g., data/

  • dataset_name (str) – Name of the dataset, e.g., GDSC2

Return type:

FeatureDataset

Returns:

FeatureDataset containing the cell line gene expression features, filtered through the landmark genes
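The landmark-gene filtering mentioned above amounts to keeping only the expression columns whose gene appears in a reference list. The sketch below shows that idea on a plain NumPy matrix; the function `filter_to_landmark_genes` and the gene names are hypothetical, and the library's own file formats and FeatureDataset handling are not reproduced here.

```python
import numpy as np


def filter_to_landmark_genes(expression, gene_names, landmark_genes):
    """Keep only the columns of `expression` whose gene is in `landmark_genes`."""
    landmark = set(landmark_genes)
    keep = [i for i, gene in enumerate(gene_names) if gene in landmark]
    return expression[:, keep], [gene_names[i] for i in keep]


# Two cell lines, four genes (names are placeholders).
expression = np.array([[1.0, 2.0, 3.0, 4.0],
                       [5.0, 6.0, 7.0, 8.0]])
gene_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
# Landmark genes absent from the data (GENE_Z) are simply ignored.
landmark_genes = ["GENE_D", "GENE_B", "GENE_Z"]

filtered, kept = filter_to_landmark_genes(expression, gene_names, landmark_genes)
```

The column order of the original matrix is preserved, so `kept` lists the surviving genes in their original order.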

load_drug_features(data_path, dataset_name)

Loads the drug features, in this case the drug fingerprints.

Parameters:
  • data_path (str) – Path to the drug features, in this case the drug fingerprints, e.g., data/

  • dataset_name (str) – Name of the dataset, e.g., GDSC2

Return type:

FeatureDataset

Returns:

FeatureDataset containing the drug fingerprint features
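The model description refers to a drug similarity matrix \(S_d\), and the drug view consists of fingerprints. One common way to turn binary fingerprints into a similarity matrix is the Tanimoto (Jaccard) coefficient, sketched below. Whether the library derives \(S_d\) this way is an assumption; the function name is hypothetical and the fingerprints are toy data.

```python
import numpy as np


def tanimoto_similarity(fingerprints):
    """Pairwise Tanimoto similarity between rows of a binary fingerprint matrix."""
    X = np.asarray(fingerprints, dtype=float)
    # Number of bits set in both fingerprints.
    intersection = X @ X.T
    counts = X.sum(axis=1)
    # Number of bits set in at least one of the two fingerprints.
    union = counts[:, None] + counts[None, :] - intersection
    with np.errstate(divide="ignore", invalid="ignore"):
        # Pairs of all-zero fingerprints have an empty union; define them as 0.
        return np.where(union > 0, intersection / union, 0.0)


# Three toy fingerprints of length 4; the first and third are identical.
fingerprints = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
S_d = tanimoto_similarity(fingerprints)
```

The result is symmetric with ones on the diagonal for any non-empty fingerprint, which matches the shape expected of a similarity matrix in the objective.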

predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)

Predicts the drug response based on the trained latent factors.

Parameters:
  • cell_line_ids (ndarray) – cell line identifiers

  • drug_ids (ndarray) – drug identifiers

  • cell_line_input (FeatureDataset) – not needed for prediction in SRMF

  • drug_input (FeatureDataset | None) – not needed for prediction in SRMF

Return type:

ndarray

Returns:

predicted response matrix
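Since prediction relies only on the learned latent factors, a response for a (drug, cell line) pair can be read off as the inner product of the two latent vectors. The sketch below illustrates this under several assumptions: the function `predict_from_factors`, the index dictionaries, and the `offset` argument (standing in for the stored training mean) are hypothetical, one value is returned per requested pair, and identifiers unseen during training are not handled.

```python
import numpy as np


def predict_from_factors(U, V, drug_index, cell_index, drug_ids, cell_line_ids, offset=0.0):
    """Return one predicted response per (drug, cell line) pair."""
    # Translate identifiers into row positions of U and V.
    rows = np.array([drug_index[d] for d in drug_ids])
    cols = np.array([cell_index[c] for c in cell_line_ids])
    # Row-wise inner product u_i . v_j for each requested pair, plus the offset.
    return np.einsum("ij,ij->i", U[rows], V[cols]) + offset


U = np.array([[1.0, 0.0], [0.0, 2.0]])  # latent vectors for drugs D1, D2
V = np.array([[1.0, 1.0], [3.0, 4.0]])  # latent vectors for cell lines C1, C2
drug_index = {"D1": 0, "D2": 1}
cell_index = {"C1": 0, "C2": 1}

predictions = predict_from_factors(
    U, V, drug_index, cell_index,
    drug_ids=["D1", "D2"], cell_line_ids=["C2", "C1"], offset=0.5,
)
```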

save(directory)

Save the SRMF model’s parameters and latent matrices to the specified directory.

Files saved:

  • best_u.pkl: latent factors for drugs

  • best_v.pkl: latent factors for cell lines

  • w_mask.pkl: response presence mask

  • config.json: model configuration (hyperparameters and training mean)

Parameters:

directory (str) – Target directory to store model artifacts

Return type:

None
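The file layout above can be mimicked with pickle and json from the standard library. The round trip below is a sketch under assumptions: the helper names, the use of NumPy arrays for the pickled objects, and the keys inside `config` are all hypothetical, and the actual SRMF implementation may serialize its state differently.

```python
import json
import pickle
import tempfile
from pathlib import Path

import numpy as np

PICKLED = ("best_u", "best_v", "w_mask")


def save_artifacts(directory, arrays, config):
    """Write the three pickled arrays and config.json into `directory`."""
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    for name in PICKLED:
        with open(root / f"{name}.pkl", "wb") as handle:
            pickle.dump(arrays[name], handle)
    with open(root / "config.json", "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)


def load_artifacts(directory):
    """Read the artifacts back; open() raises FileNotFoundError if one is missing."""
    root = Path(directory)
    arrays = {}
    for name in PICKLED:
        with open(root / f"{name}.pkl", "rb") as handle:
            arrays[name] = pickle.load(handle)
    with open(root / "config.json", encoding="utf-8") as handle:
        config = json.load(handle)
    return arrays, config


arrays = {
    "best_u": np.arange(6, dtype=float).reshape(3, 2),  # 3 drugs, K = 2
    "best_v": np.ones((4, 2)),                          # 4 cell lines, K = 2
    "w_mask": np.eye(3, 4),                             # toy presence mask
}
# Hypothetical config layout: hyperparameters plus the training mean.
config = {"hyperparameters": {"K": 2, "seed": 0}, "training_mean": 0.25}

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, arrays, config)
    saved_files = sorted(path.name for path in Path(tmp).iterdir())
    restored, restored_config = load_artifacts(tmp)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.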

train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')

Prepares data and trains the SRMF model.

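Parameters:
  • output – drug response data to train on

  • cell_line_input (FeatureDataset) – cell line features

  • drug_input (FeatureDataset | None) – drug features; must be provided even though the default is None

  • output_earlystopping – optional, defaults to None

  • model_checkpoint_dir – defaults to 'checkpoints'

Raises: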

ValueError – if drug_input is None

Return type:

None
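Part of preparing data for a matrix factorization model is turning a long list of (drug, cell line, response) measurements into the response matrix Y and the weight matrix W from the model description. The sketch below shows one plausible way to do this; the function `build_response_matrix` is hypothetical, identifiers and values are toy data, and the library's own preprocessing may differ (for example in how repeated measurements are combined; here the last one wins).

```python
import numpy as np


def build_response_matrix(drug_ids, cell_line_ids, responses):
    """Pivot long-format responses into Y (drugs x cell lines) and the mask W."""
    drugs = sorted(set(drug_ids))
    cells = sorted(set(cell_line_ids))
    drug_index = {d: i for i, d in enumerate(drugs)}
    cell_index = {c: j for j, c in enumerate(cells)}

    # Unknown responses stay NaN in Y.
    Y = np.full((len(drugs), len(cells)), np.nan)
    for d, c, r in zip(drug_ids, cell_line_ids, responses):
        Y[drug_index[d], cell_index[c]] = r

    # W_ij = 1 where a response is known, else 0.
    W = (~np.isnan(Y)).astype(float)
    return Y, W, drugs, cells


# Three measurements covering two drugs and two cell lines; (D2, C2) is unknown.
Y, W, drugs, cells = build_response_matrix(
    drug_ids=["D2", "D1", "D1"],
    cell_line_ids=["C1", "C1", "C2"],
    responses=[0.3, 0.1, 0.2],
)
```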