Visualization

Outplot

Abstract wrapper class for all visualizations.

class drevalpy.visualization.outplot.OutPlot

Bases: ABC

Abstract wrapper class for all visualizations.

abstractmethod draw_and_save(out_prefix, out_suffix)

Draw and save the plot.

Parameters:

out_prefix (str) – path to output directory for python package
out_suffix (str) – custom suffix for output file

Return type:

None

abstractmethod static write_to_html(test_mode, f, *args, **kwargs)

Write the plot to the final report file.

Parameters:

test_mode (str) – test mode, e.g., LPO, LCO, or LDO
f (TextIOWrapper) – the file to write to
args – additional arguments
kwargs – additional keyword arguments

Return type:

TextIOWrapper

Returns:

the file to write to
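A minimal sketch of how a subclass might satisfy this interface. It does not import drevalpy: OutPlotLike is a local stand-in that mirrors the two documented abstract methods, and TextPlot, its file naming scheme, and the files keyword argument are hypothetical names chosen for illustration only.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod


class OutPlotLike(ABC):
    """Local stand-in mirroring the documented OutPlot interface (assumption)."""

    @abstractmethod
    def draw_and_save(self, out_prefix: str, out_suffix: str) -> None:
        """Draw and save the plot."""

    @staticmethod
    @abstractmethod
    def write_to_html(test_mode, f, *args, **kwargs):
        """Write the plot to the report file and return the file."""


class TextPlot(OutPlotLike):
    """Hypothetical 'plot' that only writes a paragraph of text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def draw_and_save(self, out_prefix: str, out_suffix: str) -> None:
        # Create the output directory if needed, then write one HTML file.
        os.makedirs(out_prefix, exist_ok=True)
        path = os.path.join(out_prefix, f"text_{out_suffix}.html")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(f"<p>{self.text}</p>")

    @staticmethod
    def write_to_html(test_mode, f, *args, **kwargs):
        # Embed only the files whose name contains the test mode.
        f.write(f"<h2>Text plot ({test_mode})</h2>\n")
        for name in kwargs.get("files", []):
            if test_mode in name:
                f.write(f'<iframe src="{name}"></iframe>\n')
        return f


# Usage: save one file, then write a report fragment into an in-memory buffer.
with tempfile.TemporaryDirectory() as out_dir:
    TextPlot("hello").draw_and_save(out_dir, "LPO")

report = io.StringIO()
TextPlot.write_to_html("LPO", report, files=["text_LPO.html", "text_LCO.html"])
```

Returning the file handle from write_to_html lets a caller pass the same handle through several plot classes in sequence.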

Comparison scatter plot

Contains the code needed to draw the correlation comparison scatter plot.

class drevalpy.visualization.comp_scatter.ComparisonScatter(df, color_by, test_mode, metric='R^2', algorithm='all')

Bases: OutPlot

Class to draw scatter plots for comparison of correlation metrics between models.

Produces two types of plots: an overall comparison plot and a dropdown plot for comparison between all models. If one model is consistently better than the other, the points deviate from the identity line (higher if the better model is on the y-axis, lower if it is on the x-axis). The dropdown plot lets you select two models and compare their per-drug/per-cell-line Pearson correlation. The overall plot facets all models and visualizes the density of the points.

Parameters:

df (DataFrame)
color_by (str)
test_mode (str)
metric (str)
algorithm (str)
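The identity-line reading described above can be illustrated with a small sketch. It is not part of the class; fraction_above_identity and the example scores are hypothetical. Each point pairs the per-drug score of the model on the x-axis with that of the model on the y-axis, and the function reports how many points lie above the line y = x.

```python
def fraction_above_identity(x_scores, y_scores):
    """Return the share of points with y > x, i.e. above the identity line."""
    pairs = list(zip(x_scores, y_scores))
    above = sum(1 for x, y in pairs if y > x)
    return above / len(pairs)


# Hypothetical per-drug Pearson correlations for two models.
model_on_x = [0.5, 0.6, 0.7, 0.4]
model_on_y = [0.6, 0.7, 0.65, 0.5]

share = fraction_above_identity(model_on_x, model_on_y)
print(share)  # 0.75: the y-axis model scores higher for three of four drugs
```

A share well above 0.5 corresponds to a point cloud that sits mostly above the identity line, which suggests the model on the y-axis performs better for most groups.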

draw_and_save(out_prefix, out_suffix)

Draws and saves the scatter plots.

Parameters:

out_prefix (str) – e.g., results/my_run/comp_scatter/
out_suffix (str) – should be self.name

Raises:

AssertionError – if out_suffix does not match self.name

Return type:

None

static write_to_html(test_mode, f, *args, **kwargs)

Inserts the generated files into the result HTML file.

Parameters:

test_mode (str) – test_mode, e.g., LCO
f (TextIOWrapper) – file to write to
args – unused
kwargs – used to get all files generated by create_report / the pipeline

Return type:

TextIOWrapper

Returns:

the file f

Critical difference plot

Draws the critical difference plot.

This method performs the following steps:

Friedman Test: First, it performs the Friedman test, a non-parametric statistical test used to detect differences between treatments across multiple test attempts. It compares the ranks of multiple groups and is suitable when there are repeated measurements for each group (as is the case here with cross-validation splits). The p-value of this test is used to assess whether there are any significant differences in the performance of the models. We use Benjamini/Hochberg correction for multiple testing.
Post-hoc Conover Test: If the Friedman test returns a significant result (p-value < 0.05), the post-hoc Conover test can be used to identify pairs of algorithms whose performance differs significantly. This test is necessary because the Friedman test only tells whether there is a difference somewhere among the models, not which models differ. The scikit_posthocs library is used for this step.
Rank Calculation: Next, the average rank of each model across all cross-validation splits is computed. Within each split, the models are ranked by their performance (lower ranks indicate better performance), and the ranks are then averaged over all splits for each model.
Critical Difference Diagram: Finally, the method draws the critical difference diagram. This diagram visually displays the significant differences between the algorithms. A horizontal line groups a set of models that are not significantly different. The critical difference is determined based on the post-hoc test results.
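The rank calculation and the Friedman statistic from the steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's implementation: the function names and the MSE values are hypothetical, the statistic omits the tie correction, and no p-value is computed. In practice a library routine such as scipy.stats.friedmanchisquare would typically be used for the test itself.

```python
def rank_row(values):
    """Rank values ascending (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1  # mean of the 1-based positions pos..end
        for i in order[pos:end + 1]:
            ranks[i] = average
        pos = end + 1
    return ranks


def average_ranks(scores):
    """scores maps model name -> list of per-split MSE values (lower is better)."""
    models = list(scores)
    n_splits = len(next(iter(scores.values())))
    totals = {model: 0.0 for model in models}
    for split in range(n_splits):
        row = [scores[model][split] for model in models]
        for model, rank in zip(models, rank_row(row)):
            totals[model] += rank
    return {model: totals[model] / n_splits for model in models}


def friedman_statistic(scores):
    """Friedman chi-square statistic without tie correction."""
    k = len(scores)                          # number of models
    n = len(next(iter(scores.values())))     # number of CV splits
    avg = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * sum(r ** 2 for r in avg.values()) - 3 * n * (k + 1)


# Hypothetical MSE values for three models over four CV splits.
mse = {
    "A": [0.10, 0.20, 0.10, 0.15],
    "B": [0.20, 0.30, 0.20, 0.25],
    "C": [0.30, 0.40, 0.30, 0.35],
}
print(average_ranks(mse))       # {'A': 1.0, 'B': 2.0, 'C': 3.0}
print(friedman_statistic(mse))  # 8.0
```

With three models and four splits, 8.0 is the largest value the statistic can take, because the models are ordered identically in every split.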

class drevalpy.visualization.critical_difference_plot.CriticalDifferencePlot(eval_results_preds, metric='MSE')

Bases: OutPlot

Draws the critical difference diagram.

The critical difference diagram is used to compare the performance of multiple classifiers and to show whether one model is significantly better than another. It is calculated over the average ranks of the classifiers, which is why at least 3 classifiers are needed to draw the diagram. Because the ranks are calculated over the cross-validation splits and the significance threshold is set to 0.05, a sufficient number of CV folds (e.g., 10) is advisable.

Parameters:

eval_results_preds (DataFrame)

draw_and_save(out_prefix, out_suffix)

Draws the critical difference plot and saves it to a file.

Parameters:

out_prefix (str) – e.g., results/my_run/critical_difference_plots/
out_suffix (str) – e.g., LPO

Raises:

ValueError – if the figure is None or the test results are None

Return type:

None

static write_to_html(test_mode, f, *args, **kwargs)

Inserts the critical difference plot into the HTML report file.

Parameters:

test_mode (str) – test_mode, e.g., LPO
f (TextIOWrapper) – HTML report file
args – not needed
kwargs – not needed

Return type:

TextIOWrapper

Returns:

HTML report file

Cross study tables

Module for generating evaluation tables for cross-study drug response prediction.

class drevalpy.visualization.cross_study_tables.CrossStudyTables(evaluation_metrics, path_data)

Bases: object

Generate evaluation tables for cross-study drug response prediction.

Parameters:

evaluation_metrics (DataFrame)
path_data (Path)

draw(): Create and store Plotly table figures sorted by MSE.

draw_and_save(out_prefix, out_suffix)

Generate and save HTML tables for each cross-study dataset.

Parameters:

out_prefix (str) – Directory to save output files.
out_suffix (str) – Suffix to append to each output filename.

static write_to_html(test_mode, f, files, prefix)

Embed HTML table files into an open HTML file handle.

Parameters:

test_mode (str) – Substring to match filenames (e.g., ‘lpo’, ‘lco’).
f (TextIOWrapper) – Open writable file handle to insert HTML blocks.
files (list[str]) – List of filenames in the target directory.
prefix (str) – Path prefix to locate HTML table files.

Return type:

TextIOWrapper

Returns:

Updated file handle with HTML blocks written in.

Regression slider plot

Module for generating regression plots with a slider for Pearson correlation coefficient.

class drevalpy.visualization.regression_slider_plot.RegressionSliderPlot(df, test_mode, model, group_by='drug_name', normalize=False)

Bases: OutPlot

Generates regression plots with a slider for the Pearson correlation coefficient.

Parameters:

df (DataFrame)
test_mode (str)
model (str)
group_by (str)

draw_and_save(out_prefix, out_suffix)

Draw the regression plot and save it to a file.

Parameters:

out_prefix (str) – e.g., results/my_run/regression_plots/
out_suffix (str) – e.g., LPO_drug_SimpleNeuralNetwork

Return type:

None

static write_to_html(test_mode, f, *args, **kwargs)

Write the plot to the final report file.

Parameters:

test_mode (str) – test_mode, e.g., LPO
f (TextIOWrapper) – final report file
args – additional arguments
kwargs – additional keyword arguments, in this case all files

Return type:

TextIOWrapper

Returns:

the final report file

Violin and heatmap parent class

Parent class for Violin and Heatmap plots of performance measures over CV runs.

class drevalpy.visualization.vioheat.VioHeat(df, normalized_metrics=False, whole_name=False)

Bases: OutPlot

Parent class for Violin and Heatmap plots of performance measures over CV runs.

Parameters:

df (DataFrame)

draw_and_save(out_prefix, out_suffix)

Draw and save the plot.

Parameters:

out_prefix (str) – e.g., results/my_run/heatmaps/
out_suffix (str) – e.g., algorithms_normalized

Return type:

None

static write_to_html(test_mode, f, *args, **kwargs)

Write the Violin and Heatmap plots into the result HTML file.

Parameters:

test_mode (str) – test_mode, e.g., LPO
f (TextIOWrapper) – result HTML file
args – additional arguments
kwargs – additional keyword arguments, in this case, the plot type and the files

Return type:

TextIOWrapper

Returns:

the result HTML file

Heatmap

Plots a heatmap of the evaluation metrics.

class drevalpy.visualization.heatmap.Heatmap(df, normalized_metrics=False, whole_name=False)

Bases: VioHeat

Plots a heatmap of the evaluation metrics.

Parameters:

df (DataFrame)

draw_and_save(out_prefix, out_suffix)

Draw the heatmap and save it to a file.

Parameters:

out_prefix (str) – e.g., results/my_run/heatmaps/
out_suffix (str) – e.g., algorithms_normalized

Return type:

None

Violin plot

Plots a violin plot of the evaluation metrics.

class drevalpy.visualization.violin.Violin(df, normalized_metrics=False, whole_name=False)

Bases: VioHeat

Plots a violin plot of the evaluation metrics.

Parameters:

df (DataFrame)

draw_and_save(out_prefix, out_suffix)

Draw the violin plot and save it to a file.

Parameters:

out_prefix (str) – e.g., results/my_run/violin_plots/
out_suffix (str) – e.g., algorithms_normalized

Return type:

None

Utility functions

Utility functions for the visualization part of the package.

drevalpy.visualization.utils.compute_evaluation(df, return_df, group_by, model)

Compute the evaluation metrics per group.

Parameters:

df (DataFrame) – true vs. predicted values with mean_y_true_per_{group_by} column
return_df (DataFrame | None) – DataFrame to store the results
group_by (str) – either cell line or drug
model (str) – model name

Return type:

DataFrame

Returns:

dataframe with the evaluation results per group
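A minimal sketch of per-group evaluation in plain Python, to illustrate the idea of computing metrics separately for each drug or cell line. It does not reproduce the function above: evaluate_per_group, the row layout, and the column names y_true, y_pred, and drug are assumptions, and only MSE and the Pearson correlation are computed.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def evaluate_per_group(rows, group_by):
    """Group rows by the given key and compute MSE and Pearson per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append((row["y_true"], row["y_pred"]))
    results = {}
    for key, pairs in groups.items():
        y_true = [t for t, _ in pairs]
        y_pred = [p for _, p in pairs]
        mse = sum((t - p) ** 2 for t, p in pairs) / len(pairs)
        results[key] = {"MSE": mse, "Pearson": pearson(y_true, y_pred)}
    return results


# Hypothetical true vs. predicted responses for two drugs.
rows = [
    {"drug": "A", "y_true": 1.0, "y_pred": 1.0},
    {"drug": "A", "y_true": 2.0, "y_pred": 2.0},
    {"drug": "A", "y_true": 3.0, "y_pred": 3.0},
    {"drug": "B", "y_true": 1.0, "y_pred": 3.0},
    {"drug": "B", "y_true": 2.0, "y_pred": 2.0},
    {"drug": "B", "y_true": 3.0, "y_pred": 1.0},
]
per_drug = evaluate_per_group(rows, "drug")
```

Here drug A is predicted perfectly (MSE 0, correlation 1), while the predictions for drug B are reversed (correlation -1), which a single overall metric would average away.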

drevalpy.visualization.utils.create_html(run_id, test_mode, files, prefix_results)

Create the html file for the given test mode, e.g., LPO.html.

Parameters:

run_id (str) – custom id for the results, e.g., my_run
test_mode (str) – test mode, e.g., LPO
files (list) – list of files in the results directory
prefix_results (str) – path to the results directory, e.g., results/my_run

Return type:

None

drevalpy.visualization.utils.create_index_html(custom_id, test_modes, prefix_results)

Create the index.html file.

Parameters:

custom_id (str) – custom id for the results, e.g., my_run
test_modes (list[str]) – list of test modes, e.g., ["LPO", "LCO", "LDO"]
prefix_results (str) – path to the results directory, e.g., results/my_run

Return type:

None
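As an illustration of what such an index page might contain, the sketch below builds an HTML string that links to one page per test mode, following the LPO.html naming mentioned for create_html. The function build_index_html and the page layout are hypothetical; the real function writes a file and may produce different markup.

```python
from html import escape


def build_index_html(custom_id, test_modes):
    """Return a small index page that links to one report page per test mode."""
    links = "\n".join(
        f'    <li><a href="{escape(mode)}.html">{escape(mode)}</a></li>'
        for mode in test_modes
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>Results for {escape(custom_id)}</title></head>\n"
        "<body>\n"
        f"  <h1>Results for {escape(custom_id)}</h1>\n"
        "  <ul>\n"
        f"{links}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


page = build_index_html("my_run", ["LPO", "LCO", "LDO"])
```

Escaping the run id and test modes keeps the page well-formed even if an id contains characters such as < or &.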

drevalpy.visualization.utils.create_output_directories(result_path, custom_id)

If they do not exist yet, make directories for the visualization files.

Parameters:

result_path (Path) – path to the results
custom_id (str) – run id passed via command line

Return type:

None
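A sketch of idempotent directory creation with pathlib. The subdirectory names are taken from the out_prefix examples elsewhere on this page; whether the function creates exactly this set is an assumption, and make_plot_directories is a hypothetical name.

```python
import tempfile
from pathlib import Path

# Subdirectory names as they appear in the out_prefix examples (assumed set).
SUBDIRS = [
    "comp_scatter",
    "critical_difference_plots",
    "regression_plots",
    "heatmaps",
    "violin_plots",
]


def make_plot_directories(result_path: Path, custom_id: str) -> list[Path]:
    """Create <result_path>/<custom_id>/<subdir> for each plot type if missing."""
    created = []
    for name in SUBDIRS:
        directory = result_path / custom_id / name
        # exist_ok=True makes repeated calls safe; parents=True creates the run directory.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


# Usage: calling the function twice does not raise.
with tempfile.TemporaryDirectory() as tmp:
    make_plot_directories(Path(tmp), "my_run")
    paths = make_plot_directories(Path(tmp), "my_run")
    all_exist = all(p.is_dir() for p in paths)
```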

drevalpy.visualization.utils.draw_algorithm_plots(model, ev_res, ev_res_per_drug, ev_res_per_cell_line, t_vs_p, test_mode, custom_id, result_path)

Draw all plots for a specific algorithm.

Parameters:

model (str) – name of the model/algorithm
ev_res (DataFrame) – overall evaluation results
ev_res_per_drug (DataFrame | None) – evaluation results per drug
ev_res_per_cell_line (DataFrame | None) – evaluation results per cell line
t_vs_p (DataFrame) – true response values vs. predicted response values
test_mode (str) – test_mode
custom_id (str) – run id passed via command line
result_path (Path) – path to the results

Return type:

None

drevalpy.visualization.utils.draw_test_mode_plots(test_mode, ev_res, ev_res_per_drug, ev_res_per_cell_line, custom_id, path_data, result_path)

Draw all plots for a specific test_mode (LPO, LCO, LDO, LTO).

Parameters:

test_mode (str) – test_mode
ev_res (DataFrame) – overall evaluation results
ev_res_per_drug (DataFrame | None) – evaluation results per drug
ev_res_per_cell_line (DataFrame | None) – evaluation results per cell line
custom_id (str) – run id passed via command line
path_data (Path) – path to the data
result_path (Path) – path to the results

Return type:

ndarray

Returns:

array of unique algorithms

Raises:

ValueError – if no evaluation results are found for the given test_mode

drevalpy.visualization.utils.evaluate_file(pred_file, test_mode, model_name, dataset_name='NO_DATASET_NAME')

Evaluate the predictions from the final models.

Parameters:

pred_file (Path) – path to the prediction file
test_mode (str) – test mode, e.g., LPO
model_name (str) – model name, e.g., SimpleNeuralNetwork
dataset_name (str) – name of the dataset, e.g., GDSC2

Return type:

tuple[DataFrame, DataFrame | None, DataFrame | None, DataFrame, str]

Returns:

evaluation results, evaluation results per drug, evaluation results per cell line, true vs. predicted values, and model name

drevalpy.visualization.utils.parse_results(path_to_results, dataset)

Parse the results from the given directory.

Parameters:

path_to_results (str) – path to the results directory
dataset (str) – dataset name, e.g., GDSC2

Return type:

tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Returns:

evaluation results, evaluation results per drug, evaluation results per cell line, and true vs. predicted values

drevalpy.visualization.utils.prep_results(eval_results, eval_results_per_drug, eval_results_per_cell_line, t_vs_p, path_data)

Prepare the results by introducing new columns for algorithm, randomization, test_mode, split, CV_split.

Parameters:

eval_results (DataFrame) – evaluation results
eval_results_per_drug (DataFrame) – evaluation results per drug
eval_results_per_cell_line (DataFrame) – evaluation results per cell line
t_vs_p (DataFrame) – true vs. predicted values
path_data (Path) – path to the data

Return type:

tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Returns:

the same dataframes with new columns

Raises:

ValueError – if NaiveMeanEffectsPredictor is not found in the evaluation results

drevalpy.visualization.utils.write_results(path_out, eval_results, eval_results_per_drug, eval_results_per_cl, t_vs_p)

Write the results to csv files.

Parameters:

path_out (str) – path to the output directory, e.g., results/my_run/
eval_results (DataFrame) – evaluation results
eval_results_per_drug (DataFrame) – evaluation results per drug
eval_results_per_cl (DataFrame) – evaluation results per cell line
t_vs_p (DataFrame) – true vs. predicted values

Return type:

None