Visualization

Outplot

Abstract wrapper class for all visualizations.

class drevalpy.visualization.outplot.OutPlot

Bases: ABC

Abstract wrapper class for all visualizations.

abstractmethod draw_and_save(out_prefix, out_suffix)

Draw and save the plot.

Parameters:
  • out_prefix (str) – path to output directory for python package

  • out_suffix (str) – custom suffix for output file

Return type:

None

abstractmethod static write_to_html(test_mode, f, *args, **kwargs)

Write the plot to the final report file.

Parameters:
  • test_mode (str) – LPO, LCO, LDO

  • f (TextIOWrapper) – the file to write to

  • args – additional arguments

  • kwargs – additional keyword arguments

Return type:

TextIOWrapper

Returns:

the file to write to
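
To illustrate the contract, the following minimal sketch defines a stand-in abstract base class with the same two methods and a toy subclass. OutPlotSketch and TextPlot are hypothetical names invented for this example; they are not part of drevalpy, and the file naming scheme and the iframe markup are assumptions.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod


class OutPlotSketch(ABC):
    """Stand-in mirroring the documented OutPlot interface (not the drevalpy class)."""

    @abstractmethod
    def draw_and_save(self, out_prefix: str, out_suffix: str) -> None:
        """Draw and save the plot."""

    @staticmethod
    @abstractmethod
    def write_to_html(test_mode: str, f: io.TextIOBase, *args, **kwargs) -> io.TextIOBase:
        """Write the plot to the report file and return the file."""


class TextPlot(OutPlotSketch):
    """Hypothetical subclass that 'draws' a plain HTML snippet."""

    def draw_and_save(self, out_prefix: str, out_suffix: str) -> None:
        # Assumed naming scheme: <out_prefix>/text_plot_<out_suffix>.html
        path = os.path.join(out_prefix, f"text_plot_{out_suffix}.html")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("<p>placeholder plot</p>")

    @staticmethod
    def write_to_html(test_mode, f, *args, **kwargs):
        # Reference the saved file from the report and hand the file back.
        f.write(f'<iframe src="text_plot_{test_mode}.html"></iframe>\n')
        return f


out_dir = tempfile.mkdtemp()
TextPlot().draw_and_save(out_prefix=out_dir, out_suffix="LPO")
report = TextPlot.write_to_html("LPO", io.StringIO())
```

Returning the file handle from write_to_html lets a caller chain several plot classes while writing one report.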

Comparison scatter plot

Contains the code needed to draw the correlation comparison scatter plot.

class drevalpy.visualization.comp_scatter.ComparisonScatter(df, color_by, test_mode, metric='R^2', algorithm='all')

Bases: OutPlot

Class to draw scatter plots for comparison of correlation metrics between models.

Produces two types of plots: an overall comparison plot and a dropdown plot for comparison between all models. If one model is consistently better than the other, the points deviate from the identity line (above it if the better model is on the y-axis, below it if it is on the x-axis). The dropdown plot lets the user select two models and compare their per-drug or per-cell-line Pearson correlation. The overall plot facets all models and visualizes the density of the points.

Parameters:
  • df (DataFrame)

  • color_by (str)

  • test_mode (str)

  • metric (str)

  • algorithm (str)

draw_and_save(out_prefix, out_suffix)

Draws and saves the scatter plots.

Parameters:
  • out_prefix (str) – e.g., results/my_run/comp_scatter/

  • out_suffix (str) – should be self.name

Raises:

AssertionError – if out_suffix does not match self.name

Return type:

None

static write_to_html(test_mode, f, *args, **kwargs)

Inserts the generated files into the result HTML file.

Parameters:
  • test_mode (str) – test_mode, e.g., LCO

  • f (TextIOWrapper) – file to write to

  • args – unused

  • kwargs – used to get all files generated by create_report / the pipeline

Return type:

TextIOWrapper

Returns:

the file f
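
The identity-line reading described for this class can be sketched without any plotting. The example below takes hypothetical per-drug Pearson correlations for two models (the values and the names model_x and model_y are invented for illustration) and counts how many points fall above the identity line, i.e., where the y-axis model scores higher.

```python
# Hypothetical per-drug Pearson correlations for two models (invented values).
model_x = {"drugA": 0.50, "drugB": 0.40, "drugC": 0.70, "drugD": 0.20}
model_y = {"drugA": 0.60, "drugB": 0.45, "drugC": 0.65, "drugD": 0.30}

# A point (x, y) lies above the identity line y = x when the y-axis model
# achieves the higher correlation for that drug.
shared = sorted(model_x.keys() & model_y.keys())
above = [d for d in shared if model_y[d] > model_x[d]]
below = [d for d in shared if model_y[d] < model_x[d]]

fraction_above = len(above) / len(shared)
print(f"{len(above)} of {len(shared)} drugs above the identity line")
```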

Critical difference plot

Draws the critical difference plot.

This method performs the following steps:

  1. Friedman Test: First, it performs the Friedman test, which is a non-parametric statistical test used to detect differences in treatments across multiple test attempts. It compares the ranks of multiple groups and is suitable when there are repeated measurements for each group (as is the case here with cross-validation splits). The p-value of this test is used to assess whether there are any significant differences in the performance of the models. We use Benjamini/Hochberg correction for multiple testing.

  2. Post-hoc Conover Test: If the Friedman test returns a significant result (p-value < 0.05), the post-hoc Conover test can be used to identify pairs of algorithms that perform significantly differently. This test is necessary because the Friedman test only tells whether there is a difference somewhere among the models, but not which ones differ. The scikit_posthocs library is used for this step.

  3. Rank Calculation: Next, the average ranks of each classifier across all cross-validation splits are computed. The models are ranked based on their performance (lower ranks indicate better performance) and the average rank across all splits is calculated for each model.

  4. Critical Difference Diagram: Finally, the method draws the critical difference diagram. This diagram visually displays the significant differences between the algorithms. A horizontal line groups a set of models that are not significantly different. The critical difference is determined based on the post-hoc test results.
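
Steps 1 and 3 above can be sketched with the standard library alone. The example ranks three hypothetical models within each cross-validation split by an error metric (lower is better, so the best model gets rank 1), averages the ranks, and computes the Friedman chi-square statistic from them. The scores are invented and ties are not handled. The p-value, the multiple-testing correction, and the post-hoc Conover test are left out; those parts would normally come from a statistics library.

```python
# Invented MSE values: one row per CV split, one column per model.
models = ["ModelA", "ModelB", "ModelC"]
mse_per_split = [
    [0.10, 0.20, 0.30],
    [0.12, 0.18, 0.35],
    [0.11, 0.25, 0.22],
    [0.09, 0.21, 0.31],
]


def rank_row(row):
    """Rank values within one split; the smallest value gets rank 1 (no ties assumed)."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0] * len(row)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


ranks = [rank_row(row) for row in mse_per_split]
n, k = len(ranks), len(models)

# Average rank of each model across all splits.
avg_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]

# Friedman chi-square statistic (formula without tie correction).
friedman_stat = 12 * n / (k * (k + 1)) * sum((r - (k + 1) / 2) ** 2 for r in avg_ranks)

for name, r in zip(models, avg_ranks):
    print(f"{name}: average rank {r:.2f}")
print(f"Friedman statistic: {friedman_stat:.2f}")
```

With only a handful of splits the test has little power, which is one reason a larger number of CV folds is recommended for this plot.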

class drevalpy.visualization.critical_difference_plot.CriticalDifferencePlot(eval_results_preds, metric='MSE')

Bases: OutPlot

Draws the critical difference diagram.

The critical difference diagram compares the performance of multiple classifiers and shows whether one model is significantly better than another. The comparison is based on the average ranks of the classifiers, which is why at least 3 classifiers are needed to draw the diagram. Because the ranks are calculated over the cross-validation splits and the significance threshold is set to 0.05, a sufficiently large number of splits is advisable, e.g., 10 CV folds.

Parameters:

eval_results_preds (DataFrame)

draw_and_save(out_prefix, out_suffix)

Draws the critical difference plot and saves it to a file.

Parameters:
  • out_prefix (str) – e.g., results/my_run/critical_difference_plots/

  • out_suffix (str) – e.g., LPO

Raises:

ValueError – if the figure is None or the test results are None

Return type:

None

static write_to_html(test_mode, f, *args, **kwargs)

Inserts the critical difference plot into the HTML report file.

Parameters:
  • test_mode (str) – test_mode, e.g., LPO

  • f (TextIOWrapper) – HTML report file

  • args – not needed

  • kwargs – not needed

Return type:

TextIOWrapper

Returns:

HTML report file

Cross study tables

Module for generating evaluation tables for cross-study drug response prediction.

class drevalpy.visualization.cross_study_tables.CrossStudyTables(evaluation_metrics, path_data)

Bases: object

Generate evaluation tables for cross-study drug response prediction.

Parameters:
  • evaluation_metrics (DataFrame)

  • path_data (Path)

draw()

Create and store Plotly table figures sorted by MSE.

draw_and_save(out_prefix, out_suffix)

Generate and save HTML tables for each cross-study dataset.

Parameters:
  • out_prefix (str) – Directory to save output files.

  • out_suffix (str) – Suffix to append to each output filename.

static write_to_html(test_mode, f, files, prefix)

Embed HTML table files into an open HTML file handle.

Parameters:
  • test_mode (str) – Substring to match filenames (e.g., ‘lpo’, ‘lco’).

  • f (TextIOWrapper) – Open writable file handle to insert HTML blocks.

  • files (list[str]) – List of filenames in the target directory.

  • prefix (str) – Path prefix to locate HTML table files.

Return type:

TextIOWrapper

Returns:

Updated file handle with HTML blocks written in.

Regression slider plot

Module for generating regression plots with a slider for Pearson correlation coefficient.

class drevalpy.visualization.regression_slider_plot.RegressionSliderPlot(df, test_mode, model, group_by='drug_name', normalize=False)

Bases: OutPlot

Generates regression plots with a slider for the Pearson correlation coefficient.

Parameters:
  • df (DataFrame)

  • test_mode (str)

  • model (str)

  • group_by (str)

draw_and_save(out_prefix, out_suffix)

Draw the regression plot and save it to a file.

Parameters:
  • out_prefix (str) – e.g., results/my_run/regression_plots/

  • out_suffix (str) – e.g., LPO_drug_SimpleNeuralNetwork

Return type:

None

static write_to_html(test_mode, f, *args, **kwargs)

Write the plot to the final report file.

Parameters:
  • test_mode (str) – test_mode, e.g., LPO

  • f (TextIOWrapper) – final report file

  • args – additional arguments

  • kwargs – additional keyword arguments, in this case all files

Return type:

TextIOWrapper

Returns:

the final report file

Violin and heatmap parent class

Parent class for Violin and Heatmap plots of performance measures over CV runs.

class drevalpy.visualization.vioheat.VioHeat(df, normalized_metrics=False, whole_name=False)

Bases: OutPlot

Parent class for Violin and Heatmap plots of performance measures over CV runs.

Parameters:

df (DataFrame)

draw_and_save(out_prefix, out_suffix)

Draw and save the plot.

Parameters:
  • out_prefix (str) – e.g., results/my_run/heatmaps/

  • out_suffix (str) – e.g., algorithms_normalized

Return type:

None

static write_to_html(test_mode, f, *args, **kwargs)

Write the Violin and Heatmap plots into the result HTML file.

Parameters:
  • test_mode (str) – test_mode, e.g., LPO

  • f (TextIOWrapper) – result HTML file

  • args – additional arguments

  • kwargs – additional keyword arguments, in this case, the plot type and the files

Return type:

TextIOWrapper

Returns:

the result HTML file

Heatmap

Plots a heatmap of the evaluation metrics.

class drevalpy.visualization.heatmap.Heatmap(df, normalized_metrics=False, whole_name=False)

Bases: VioHeat

Plots a heatmap of the evaluation metrics.

Parameters:

df (DataFrame)

draw_and_save(out_prefix, out_suffix)

Draw the heatmap and save it to a file.

Parameters:
  • out_prefix (str) – e.g., results/my_run/heatmaps/

  • out_suffix (str) – e.g., algorithms_normalized

Return type:

None

Violin plot

Plots a violin plot of the evaluation metrics.

class drevalpy.visualization.violin.Violin(df, normalized_metrics=False, whole_name=False)

Bases: VioHeat

Plots a violin plot of the evaluation metrics.

Parameters:

df (DataFrame)

draw_and_save(out_prefix, out_suffix)

Draw the violin plot and save it to a file.

Parameters:
  • out_prefix (str) – e.g., results/my_run/violin_plots/

  • out_suffix (str) – e.g., algorithms_normalized

Return type:

None

Utility functions

Utility functions for the visualization part of the package.

drevalpy.visualization.utils.compute_evaluation(df, return_df, group_by, model)

Compute the evaluation metrics per group.

Parameters:
  • df (DataFrame) – true vs. predicted values with mean_y_true_per_{group_by} column

  • return_df (DataFrame | None) – DataFrame to store the results

  • group_by (str) – either cell line or drug

  • model (str) – model name

Return type:

DataFrame

Returns:

dataframe with the evaluation results per group
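
A per-group evaluation can be sketched as follows. The example groups hypothetical true and predicted responses by drug and computes the MSE and Pearson correlation for each group. The record layout and the helper names pearson and evaluate_per_group are assumptions made for this illustration; they do not reflect the actual columns or implementation of compute_evaluation, which operates on DataFrames.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def evaluate_per_group(records, group_by):
    """Return {group: {"MSE": ..., "Pearson": ...}} for the given grouping key."""
    grouped = defaultdict(lambda: ([], []))
    for rec in records:
        y_true, y_pred = grouped[rec[group_by]]
        y_true.append(rec["y_true"])
        y_pred.append(rec["y_pred"])
    results = {}
    for group, (y_true, y_pred) in grouped.items():
        mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
        results[group] = {"MSE": mse, "Pearson": pearson(y_true, y_pred)}
    return results


# Invented true vs. predicted responses.
records = [
    {"drug": "drugA", "y_true": 1.0, "y_pred": 1.5},
    {"drug": "drugA", "y_true": 2.0, "y_pred": 2.5},
    {"drug": "drugA", "y_true": 3.0, "y_pred": 3.5},
    {"drug": "drugB", "y_true": 1.0, "y_pred": 3.0},
    {"drug": "drugB", "y_true": 2.0, "y_pred": 2.0},
    {"drug": "drugB", "y_true": 3.0, "y_pred": 1.0},
]
per_drug = evaluate_per_group(records, group_by="drug")
```

In this toy data, drugA is predicted with a constant offset (perfect correlation, nonzero MSE) while drugB is predicted in reverse order, which shows why the two metrics are reported side by side.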

drevalpy.visualization.utils.create_html(run_id, test_mode, files, prefix_results)

Create the html file for the given test mode, e.g., LPO.html.

Parameters:
  • run_id (str) – custom id for the results, e.g., my_run

  • test_mode (str) – test mode, e.g., LPO

  • files (list) – list of files in the results directory

  • prefix_results (str) – path to the results directory, e.g., results/my_run

Return type:

None

drevalpy.visualization.utils.create_index_html(custom_id, test_modes, prefix_results)

Create the index.html file.

Parameters:
  • custom_id (str) – custom id for the results, e.g., my_run

  • test_modes (list[str]) – list of test modes, e.g., ["LPO", "LCO", "LDO"]

  • prefix_results (str) – path to the results directory, e.g., results/my_run

Return type:

None
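
As a rough illustration of what an index page for several test modes might look like, the sketch below writes a bare-bones index.html that links to one page per test mode, following the LPO.html naming mentioned for create_html. The function name write_index_sketch and the HTML layout are hypothetical; the actual page produced by create_index_html will differ.

```python
import tempfile
from pathlib import Path


def write_index_sketch(custom_id, test_modes, prefix_results):
    """Write a bare-bones index.html that links to one page per test mode."""
    links = "\n".join(
        f'    <li><a href="{mode}.html">{mode}</a></li>' for mode in test_modes
    )
    html = (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"  <title>Results for {custom_id}</title>\n"
        "</head>\n<body>\n  <ul>\n"
        f"{links}\n"
        "  </ul>\n</body>\n</html>\n"
    )
    index_path = Path(prefix_results) / "index.html"
    index_path.write_text(html, encoding="utf-8")
    return index_path


# A temporary directory stands in for results/my_run.
results_dir = tempfile.mkdtemp()
index_file = write_index_sketch("my_run", ["LPO", "LCO", "LDO"], results_dir)
```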

drevalpy.visualization.utils.create_output_directories(result_path, custom_id)

If they do not exist yet, make directories for the visualization files.

Parameters:
  • result_path (Path) – path to the results

  • custom_id (str) – run id passed via command line

Return type:

None
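
The "create only if missing" behavior can be sketched with pathlib. The subdirectory names below are taken from the out_prefix examples elsewhere in this section; whether the actual function creates exactly this set is an assumption, and make_output_dirs is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# Subdirectory names as they appear in the out_prefix examples in this section.
SUBDIRS = [
    "comp_scatter",
    "critical_difference_plots",
    "regression_plots",
    "heatmaps",
    "violin_plots",
]


def make_output_dirs(result_path: Path, custom_id: str) -> list[Path]:
    """Create <result_path>/<custom_id>/<subdir> for each plot type if missing."""
    created = []
    for name in SUBDIRS:
        directory = result_path / custom_id / name
        # parents=True creates intermediate folders; exist_ok=True makes the
        # call a no-op when the directory already exists.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


base = Path(tempfile.mkdtemp())
dirs = make_output_dirs(base, "my_run")
dirs_again = make_output_dirs(base, "my_run")  # safe to call twice
```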

drevalpy.visualization.utils.draw_algorithm_plots(model, ev_res, ev_res_per_drug, ev_res_per_cell_line, t_vs_p, test_mode, custom_id, result_path)

Draw all plots for a specific algorithm.

Parameters:
  • model (str) – name of the model/algorithm

  • ev_res (DataFrame) – overall evaluation results

  • ev_res_per_drug (DataFrame | None) – evaluation results per drug

  • ev_res_per_cell_line (DataFrame | None) – evaluation results per cell line

  • t_vs_p (DataFrame) – true response values vs. predicted response values

  • test_mode (str) – test_mode

  • custom_id (str) – run id passed via command line

  • result_path (Path) – path to the results

Return type:

None

drevalpy.visualization.utils.draw_test_mode_plots(test_mode, ev_res, ev_res_per_drug, ev_res_per_cell_line, custom_id, path_data, result_path)

Draw all plots for a specific test_mode (LPO, LCO, LDO, LTO).

Parameters:
  • test_mode (str) – test_mode

  • ev_res (DataFrame) – overall evaluation results

  • ev_res_per_drug (DataFrame | None) – evaluation results per drug

  • ev_res_per_cell_line (DataFrame | None) – evaluation results per cell line

  • custom_id (str) – run id passed via command line

  • path_data (Path) – path to the data

  • result_path (Path) – path to the results

Return type:

ndarray

Returns:

array of unique algorithms

Raises:

ValueError – if no evaluation results are found for the given test_mode

drevalpy.visualization.utils.evaluate_file(pred_file, test_mode, model_name, dataset_name='NO_DATASET_NAME')

Evaluate the predictions from the final models.

Parameters:
  • pred_file (Path) – path to the prediction file

  • test_mode (str) – test mode, e.g., LPO

  • model_name (str) – model name, e.g., SimpleNeuralNetwork

  • dataset_name (str) – name of the dataset, e.g., GDSC2

Return type:

tuple[DataFrame, DataFrame | None, DataFrame | None, DataFrame, str]

Returns:

evaluation results, evaluation results per drug, evaluation results per cell line, true vs. predicted values, and model name

drevalpy.visualization.utils.parse_results(path_to_results, dataset)

Parse the results from the given directory.

Parameters:
  • path_to_results (str) – path to the results directory

  • dataset (str) – dataset name, e.g., GDSC2

Return type:

tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Returns:

evaluation results, evaluation results per drug, evaluation results per cell line, and true vs. predicted values

drevalpy.visualization.utils.prep_results(eval_results, eval_results_per_drug, eval_results_per_cell_line, t_vs_p, path_data)

Prepare the results by introducing new columns for algorithm, randomization, test_mode, split, CV_split.

Parameters:
  • eval_results (DataFrame) – evaluation results

  • eval_results_per_drug (DataFrame) – evaluation results per drug

  • eval_results_per_cell_line (DataFrame) – evaluation results per cell line

  • t_vs_p (DataFrame) – true vs. predicted values

  • path_data (Path) – path to the data

Return type:

tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Returns:

the same dataframes with new columns

Raises:

ValueError – if NaiveMeanEffectsPredictor is not found in the evaluation results

drevalpy.visualization.utils.write_results(path_out, eval_results, eval_results_per_drug, eval_results_per_cl, t_vs_p)

Write the results to csv files.

Parameters:
  • path_out (str) – path to the output directory, e.g., results/my_run/

  • eval_results (DataFrame) – evaluation results

  • eval_results_per_drug (DataFrame) – evaluation results per drug

  • eval_results_per_cl (DataFrame) – evaluation results per cell line

  • t_vs_p (DataFrame) – true vs. predicted values

Return type:

None