SRMF
SRMF Model
Contains the SRMF (Similarity Regularization Matrix Factorization) model.
Original publication: Wang, L., Li, X., Zhang, L. et al. Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization. BMC Cancer 17, 513 (2017). https://doi.org/10.1186/s12885-017-3500-5. The implementation is adapted from the original MATLAB code at https://github.com/linwang1982/SRMF.
- class drevalpy.models.SRMF.srmf.SRMF
Bases: DRPModel
SRMF model: Similarity Regularization Matrix Factorization.
The primary idea is to map m drugs and n cell lines into a shared latent space with a low dimensionality K, where \(K \ll \min(m, n)\). The properties of a drug \(d_i\) and a cell line \(c_j\) are described by the latent coordinates \(u_i\) and \(v_j\) (K-dimensional row vectors), respectively, so that the m × n drug response matrix Y is approximated by \(U V^T\). The factors U and V are obtained by solving:
\[\min_{U,V} \|W \odot (Y - U V^T)\|_F^2 + \lambda_l \left(\|U\|_F^2 + \|V\|_F^2\right) + \lambda_d \|S_d - U U^T\|_F^2 + \lambda_c \|S_c - V V^T\|_F^2\]
where W is a weight matrix (\(W_{ij} = 1\) if \(Y_{ij}\) is a known response value, else 0) and \(\odot\) denotes the element-wise product. U and V contain the \(u_i\) and \(v_j\) as row vectors, respectively, and \(\|\cdot\|_F\) is the Frobenius norm. The \(\lambda_l\) term is an L2 penalty that guards against overfitting. \(S_d\) and \(S_c\) are the drug and cell line similarity matrices; the \(\lambda_d\) and \(\lambda_c\) terms push the inner products of the latent vectors toward these similarities, so that similar drugs (or cell lines) receive similar latent vectors.
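The objective can be evaluated directly with NumPy. This minimal sketch assumes that Y stores unknown responses as NaN and follows the shapes defined above (U is m × K for drugs, V is n × K for cell lines); srmf_objective is a hypothetical helper for illustration, not part of drevalpy.

```python
import numpy as np


def srmf_objective(Y, W, U, V, S_d, S_c, lambda_l, lambda_d, lambda_c):
    """Hypothetical helper: evaluate the SRMF loss for given latent factors.

    Y, W: (m, n) response and mask matrices (W_ij = 1 where Y_ij is observed).
    U: (m, K) drug factors; V: (n, K) cell line factors.
    S_d: (m, m) drug similarity; S_c: (n, n) cell line similarity.
    """
    # Masked reconstruction error. NaNs are zeroed first because 0 * NaN is NaN.
    residual = W * (np.nan_to_num(Y) - U @ V.T)
    fit = np.sum(residual ** 2)
    # L2 (squared Frobenius) penalty on both factor matrices.
    l2 = lambda_l * (np.sum(U ** 2) + np.sum(V ** 2))
    # Similarity terms: U U^T should resemble S_d, V V^T should resemble S_c.
    sim_d = lambda_d * np.sum((S_d - U @ U.T) ** 2)
    sim_c = lambda_c * np.sum((S_c - V @ V.T) ** 2)
    return fit + l2 + sim_d + sim_c
```

If U V^T reproduces every observed response and the similarity matrices equal U U^T and V V^T, only the L2 term remains.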
- build_model(hyperparameters)
Initializes the hyperparameters of the SRMF model.
K is the latent dimensionality; lambda_l is the L2 regularization weight; lambda_d and lambda_c weight the drug and cell line similarity terms, respectively; max_iter is the number of iterations; seed is the random seed.
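To show how these hyperparameters interact, here is a small sketch that fits U and V by plain gradient descent on the objective. This swaps in gradient descent for illustration; it may not match the update scheme of the original MATLAB code or of drevalpy. The step size lr is a hypothetical extra parameter, and fit_srmf_gd is a hypothetical function.

```python
import numpy as np


def fit_srmf_gd(Y, S_d, S_c, K=2, lambda_l=0.01, lambda_d=0.01, lambda_c=0.01,
                max_iter=200, seed=0, lr=1e-3):
    """Hypothetical sketch: fit U, V by plain gradient descent on the SRMF loss.

    Not the original SRMF optimizer; `lr` is an assumed step size that is not
    among the documented hyperparameters.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    W = (~np.isnan(Y)).astype(float)  # 1 for observed responses, 0 otherwise
    Y0 = np.nan_to_num(Y)             # unknown entries are masked out by W
    U = 0.1 * rng.standard_normal((m, K))
    V = 0.1 * rng.standard_normal((n, K))

    def loss(U, V):
        return (np.sum((W * (Y0 - U @ V.T)) ** 2)
                + lambda_l * (np.sum(U ** 2) + np.sum(V ** 2))
                + lambda_d * np.sum((S_d - U @ U.T) ** 2)
                + lambda_c * np.sum((S_c - V @ V.T) ** 2))

    history = [loss(U, V)]
    for _ in range(max_iter):
        R = W * (Y0 - U @ V.T)  # masked residual (W is binary, so W * W == W)
        E_d = S_d - U @ U.T
        E_c = S_c - V @ V.T
        # Analytic gradients of each term of the objective.
        grad_U = -2 * R @ V + 2 * lambda_l * U - 2 * lambda_d * (E_d + E_d.T) @ U
        grad_V = -2 * R.T @ U + 2 * lambda_l * V - 2 * lambda_c * (E_c + E_c.T) @ V
        U, V = U - lr * grad_U, V - lr * grad_V
        history.append(loss(U, V))
    return U, V, history
```

A small lr keeps each step in the region where the loss is guaranteed to decrease; larger K gives more capacity, while larger lambda values pull the solution toward small factors or toward the similarity structure.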
- cell_line_views = ['gene_expression']
- drug_views = ['fingerprints']
- classmethod load(directory)
Load a trained SRMF model from the specified directory.
Expects the following files:
best_u.pkl: latent factors for drugs
best_v.pkl: latent factors for cell lines
w_mask.pkl: response presence mask
config.json: model configuration (hyperparameters and training mean)
- Parameters:
directory (str) – directory containing the saved model artifacts
- Return type:
SRMF
- Returns:
An instance of SRMF with restored parameters
- Raises:
FileNotFoundError – if any required file is missing
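A minimal sketch of the load-side behavior described above, assuming the .pkl files hold pickled objects (for example NumPy arrays) and config.json is a JSON object. load_srmf_artifacts is a hypothetical helper that returns the raw artifacts rather than an SRMF instance.

```python
import json
import os
import pickle

# File names as listed in the documentation above.
REQUIRED_FILES = ("best_u.pkl", "best_v.pkl", "w_mask.pkl", "config.json")


def load_srmf_artifacts(directory: str) -> dict:
    """Hypothetical helper: read the four SRMF artifacts from `directory`."""
    missing = [f for f in REQUIRED_FILES
               if not os.path.isfile(os.path.join(directory, f))]
    if missing:
        # Mirror the documented behavior: fail early if any file is absent.
        raise FileNotFoundError(f"Missing SRMF artifacts in {directory}: {missing}")

    def read_pickle(name):
        with open(os.path.join(directory, name), "rb") as fh:
            return pickle.load(fh)

    with open(os.path.join(directory, "config.json"), encoding="utf-8") as fh:
        config = json.load(fh)
    return {
        "u": read_pickle("best_u.pkl"),       # drug latent factors
        "v": read_pickle("best_v.pkl"),       # cell line latent factors
        "w_mask": read_pickle("w_mask.pkl"),  # response presence mask
        "config": config,                     # hyperparameters and training mean
    }
```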
- load_cell_line_features(data_path, dataset_name)
Loads the cell line features, in this case the gene expression features.
- Parameters:
- Return type:
FeatureDataset
- Returns:
FeatureDataset containing the cell line gene expression features, restricted to the landmark genes
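This page does not say how the cell line similarity matrix S_c is derived from these features. One common choice, assumed here, is the Pearson correlation between cell line expression profiles restricted to the landmark genes. cell_line_similarity is a hypothetical helper operating on a plain NumPy matrix rather than a FeatureDataset.

```python
import numpy as np


def cell_line_similarity(expr, gene_names, landmark_genes):
    """Hypothetical helper: Pearson-correlation similarity between cell lines.

    expr: (n_cell_lines, n_genes) expression matrix; gene_names labels its columns.
    """
    # Keep only the columns whose gene is in the landmark set.
    keep = np.isin(np.asarray(gene_names), list(landmark_genes))
    X = expr[:, keep]
    # np.corrcoef treats each row (cell line) as one variable.
    return np.corrcoef(X)
```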
- load_drug_features(data_path, dataset_name)
Loads the drug features, in this case the drug fingerprints.
- Parameters:
- Return type:
FeatureDataset
- Returns:
FeatureDataset containing the drug fingerprint features
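Likewise, the drug similarity matrix S_d has to come from the fingerprints somehow. A common choice for binary fingerprints, assumed here rather than confirmed for drevalpy, is the Tanimoto (Jaccard) coefficient. tanimoto_similarity is a hypothetical helper.

```python
import numpy as np


def tanimoto_similarity(fp):
    """Hypothetical helper: pairwise Tanimoto similarity of binary fingerprints.

    fp: (n_drugs, n_bits) array; any nonzero entry counts as a set bit.
    """
    fp = fp.astype(bool).astype(float)
    inter = fp @ fp.T                       # bits set in both drugs
    counts = fp.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    # All-zero fingerprint pairs have an empty union; define their similarity as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, 0.0)
```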
- predict(cell_line_ids, drug_ids, cell_line_input, drug_input=None)
Predicts the drug response based on the trained latent factors.
- Parameters:
cell_line_ids (ndarray) – cell line identifiers
drug_ids (ndarray) – drug identifiers
cell_line_input (FeatureDataset) – not needed for prediction in SRMF
drug_input (FeatureDataset | None) – not needed for prediction in SRMF
- Return type:
- Returns:
predicted response matrix
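A minimal sketch of pairwise prediction from the latent factors, assuming each (cell line, drug) pair is scored by the inner product of its latent vectors and that a training mean removed before factorization is added back (the stored training mean suggests this, but this page does not confirm it). predict_pairs and the index dictionaries are hypothetical.

```python
import numpy as np


def predict_pairs(cell_line_ids, drug_ids, U, V, cell_line_index, drug_index,
                  training_mean=0.0):
    """Hypothetical helper: score each (cell line, drug) pair as u_i . v_j + mean."""
    # Map identifiers to row positions in U (drugs) and V (cell lines).
    d = np.array([drug_index[x] for x in drug_ids], dtype=int)
    c = np.array([cell_line_index[x] for x in cell_line_ids], dtype=int)
    # Row-wise dot products: one prediction per requested pair.
    return np.einsum("ik,ik->i", U[d], V[c]) + training_mean
```

Unseen drugs or cell lines raise KeyError here, since plain matrix factorization learns no latent vector for entities absent from training.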
- save(directory)
Save the SRMF model’s parameters and latent matrices to the specified directory.
Files saved:
best_u.pkl: latent factors for drugs
best_v.pkl: latent factors for cell lines
w_mask.pkl: response presence mask
config.json: model configuration (hyperparameters and training mean)
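The matching save step can be sketched as follows, assuming the latent matrices and mask are pickled as-is and the configuration carries a "training_mean" key (a hypothetical key name). save_srmf_artifacts is a hypothetical helper, not drevalpy's implementation.

```python
import json
import os
import pickle


def save_srmf_artifacts(directory, U, V, W, config):
    """Hypothetical helper: write the four SRMF artifacts to `directory`."""
    os.makedirs(directory, exist_ok=True)
    for name, obj in (("best_u.pkl", U), ("best_v.pkl", V), ("w_mask.pkl", W)):
        with open(os.path.join(directory, name), "wb") as fh:
            pickle.dump(obj, fh)
    with open(os.path.join(directory, "config.json"), "w", encoding="utf-8") as fh:
        # Cast to float so that a NumPy scalar mean is JSON-serializable.
        json.dump({**config, "training_mean": float(config["training_mean"])}, fh, indent=2)
```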
- train(output, cell_line_input, drug_input=None, output_earlystopping=None, model_checkpoint_dir='checkpoints')
Prepares data and trains the SRMF model.
- Parameters:
output (DrugResponseDataset) – response data
cell_line_input (FeatureDataset) – feature data for cell lines
drug_input (FeatureDataset | None) – feature data for drugs
output_earlystopping (DrugResponseDataset | None) – optional early stopping dataset, not used in SRMF
model_checkpoint_dir (str) – directory to save the model checkpoints, not used in SRMF
- Raises:
ValueError – if drug_input is None
- Return type:
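A minimal sketch of the data-preparation step, assuming the response data arrive as parallel sequences of cell line IDs, drug IDs, and response values, and assuming responses are centered by their global mean (consistent with the training mean stored in config.json, but not confirmed by this page). build_response_matrix is hypothetical.

```python
import numpy as np


def build_response_matrix(cell_line_ids, drug_ids, responses):
    """Hypothetical helper: pivot long-format responses into Y (drugs x cell lines).

    Returns the mean-centered matrix, the mask W, the mean, and the index maps.
    Duplicate (cell line, drug) pairs keep the last value seen.
    """
    drugs = sorted(set(drug_ids))
    cells = sorted(set(cell_line_ids))
    d_idx = {d: i for i, d in enumerate(drugs)}
    c_idx = {c: j for j, c in enumerate(cells)}
    Y = np.full((len(drugs), len(cells)), np.nan)  # NaN marks unknown responses
    for c, d, r in zip(cell_line_ids, drug_ids, responses):
        Y[d_idx[d], c_idx[c]] = r
    W = (~np.isnan(Y)).astype(float)
    mean = np.nanmean(Y)
    # Center observed entries; unknown entries become 0 and are masked by W anyway.
    Y_centered = np.where(W > 0, Y - mean, 0.0)
    return Y_centered, W, mean, d_idx, c_idx
```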