DrEvalPy: Python Cancer Cell Line Drug Response Prediction Suite

Documentation: https://drevalpy.readthedocs.io/

Overview of the DrEval framework. Through the input options, implemented state-of-the-art models can be compared against baselines of varying complexity. We address obstacles to progress in the field at each point in our pipeline: Our framework is available on PyPI and nf-core, and we follow FAIReR standards for optimal reproducibility. DrEval is easily extendable, as demonstrated here with an implementation of a proteomics-based random forest. Custom viability data can be preprocessed with CurveCurator, leading to more consistent data and metrics. DrEval supports five widely used datasets with application-aware train/test splits that make weak generalization detectable. Models can use the provided cell line and drug features or custom ones. The pipeline supports randomization-based ablation studies and performs robust hyperparameter tuning for all models. Evaluation uses meaningful, bias-resistant metrics to avoid results inflated by artifacts such as Simpson’s paradox. All results are compiled into an interactive HTML report.
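The remark on Simpson’s paradox can be made concrete with a few numbers. In the plain-Python sketch below (toy values and hypothetical helper names, not DrEval's metric code), the predictions only capture the difference between two drugs' mean responses and actually rank the cell lines backwards within each drug. Pooling all drug/cell line pairs still yields a high correlation.

```python
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy data (hypothetical): two drugs with very different mean responses.
# Within each drug, the predictions order the cell lines opposite to the truth.
observed = {
    "drug_A": [1.0, 2.0, 3.0],
    "drug_B": [7.0, 8.0, 9.0],
}
predicted = {
    "drug_A": [2.5, 2.0, 1.5],
    "drug_B": [8.5, 8.0, 7.5],
}

# Pooled correlation: all drug/cell line pairs thrown together.
pooled_obs = [v for d in observed for v in observed[d]]
pooled_pred = [v for d in predicted for v in predicted[d]]
pooled_r = pearson(pooled_obs, pooled_pred)

# Correlation computed separately for each drug, then averaged.
per_drug_r = {d: pearson(observed[d], predicted[d]) for d in observed}
mean_per_drug_r = mean(list(per_drug_r.values()))

print(f"pooled r = {pooled_r:.2f}")
print(f"mean per-drug r = {mean_per_drug_r:.2f}")
```

Pooled, the score is about 0.92; per drug, it is -1. The pooled value rewards separating drugs by their average response, which even a drug-mean baseline does, so per-drug (or per-cell-line) metrics give the more honest view of within-drug signal.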

Overview

Check out our preprint on bioRxiv!

Focus on Innovating Your Models — DrEval Handles the Rest!

  • DrEval is a toolkit that ensures drug response prediction evaluations are statistically sound, biologically meaningful, and reproducible.

  • Focus on model innovation while relying on our automated, standardized evaluation protocols and preprocessing workflows.

  • A flexible model interface supports all model types (e.g. machine learning, statistical models, network-based analyses).

Use DrEval to build drug response models that have an impact

  1. Maintained, up-to-date baseline catalog: no need to re-implement models from the literature

  2. Gold standard datasets for benchmarking

  3. Consistent application-driven evaluation

  4. Ablation studies with permutation tests

  5. Cross-study evaluation for generalization analysis

  6. Optimized Nextflow pipeline for fast experiments

  7. Easy-to-use hyperparameter tuning

  8. Paper-ready visualizations to display performance
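Item 4 above (ablation studies with permutation tests) rests on a simple idea: shuffle a feature across cell lines and check whether the score survives. The sketch below is a generic permutation test in plain Python, not DrEval's randomization procedure. The helper names and toy data are hypothetical, and the "score" is reduced to a single feature/response correlation, whereas a full ablation would retrain the model on the permuted features.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(feature, response, n_permutations=999, seed=0):
    """One-sided permutation test for |r(feature, response)|.

    Shuffling the feature across cell lines destroys any feature/response link
    while keeping the feature's marginal distribution, giving a null distribution.
    """
    rng = random.Random(seed)
    observed = abs(pearson(feature, response))
    shuffled = list(feature)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(shuffled, response)) >= observed:
            n_extreme += 1
    # Counting the observed arrangement itself keeps the p-value strictly above 0.
    return (n_extreme + 1) / (n_permutations + 1)


# Toy data (hypothetical): 20 cell lines, one feature that tracks the response closely.
feature = [float(x) for x in range(20)]
response = [2.0 * x + (-1) ** int(x) for x in feature]

p = permutation_p_value(feature, response)
print(f"p = {p:.3f}")
```

With 999 permutations the smallest attainable p-value is 1/1000; resolving smaller values requires more permutations.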

This project is a collaboration of the Technical University of Munich (TUM, Germany) and the Freie Universität Berlin (FU, Germany).

Leaderboard

DrEvalPy Leaderboard

Quickstart

Make sure you have installed DrEvalPy and its dependencies (see Installation).

To check that the pipeline runs, you can use the fast naive models NaiveTissueMeanPredictor, NaiveDrugMeanPredictor, and NaiveMeanEffectsPredictor on the TOYv1 (subset of CTRPv2) or TOYv2 (subset of GDSC2) dataset with the LCO test mode:

drevalpy --run_id my_first_run --models NaiveTissueMeanPredictor NaiveDrugMeanPredictor --baselines NaiveMeanEffectsPredictor --dataset TOYv1 --test_mode LCO

This trains the three baseline models to predict LN_IC50 values of the TOYv1 dataset, a subset of CTRPv2. Evaluation uses the “LCO” (leave-cell-line-out) splitting strategy, in which randomly selected cell lines are held out for testing, with 7-fold cross-validation. The results are stored in

results/my_first_run/TOYv1/LCO
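The LCO run above can be pictured with the small sketch below. It is plain Python, not DrEvalPy code: `lco_folds` and `fit_naive_baselines` are hypothetical names, the responses are invented, and the two baselines follow our reading of the model names (the drug's mean response; the overall mean plus additive drug and cell line effects), which may differ in detail from the packaged implementations. The tissue-mean baseline is left out because the toy data carry no tissue labels. The property that matters for LCO is that each cell line lands in exactly one test fold and never in that fold's training data.

```python
import random
from collections import defaultdict
from statistics import mean

# Toy long-format responses (hypothetical): (cell_line, drug, LN_IC50)
responses = [
    (f"CL{c}", f"D{d}", 0.5 * c - 1.0 * d + 0.1 * ((c * d) % 3))
    for c in range(14)
    for d in range(3)
]


def lco_folds(cell_lines, n_folds=7, seed=42):
    """Assign each unique cell line to exactly one test fold (leave-cell-line-out)."""
    unique = sorted(set(cell_lines))
    random.Random(seed).shuffle(unique)
    return [set(unique[i::n_folds]) for i in range(n_folds)]


def fit_naive_baselines(train):
    """Fit two naive predictors on (cell_line, drug, response) training triples."""
    overall = mean(y for _, _, y in train)
    by_drug, by_cell = defaultdict(list), defaultdict(list)
    for cell, drug, y in train:
        by_drug[drug].append(y)
        by_cell[cell].append(y)
    drug_mean = {d: mean(v) for d, v in by_drug.items()}
    cell_mean = {c: mean(v) for c, v in by_cell.items()}

    def drug_mean_predictor(cell, drug):
        # Ignores the cell line; falls back to the overall mean for unseen drugs.
        return drug_mean.get(drug, overall)

    def mean_effects_predictor(cell, drug):
        # Overall mean plus additive cell line and drug effects (unseen -> effect 0).
        cell_effect = cell_mean.get(cell, overall) - overall
        drug_effect = drug_mean.get(drug, overall) - overall
        return overall + cell_effect + drug_effect

    return drug_mean_predictor, mean_effects_predictor


folds = lco_folds([c for c, _, _ in responses])
for i, test_cells in enumerate(folds):
    train = [r for r in responses if r[0] not in test_cells]
    test = [r for r in responses if r[0] in test_cells]
    assert not {c for c, _, _ in train} & test_cells  # no cell line leaks into training
    drug_pred, effects_pred = fit_naive_baselines(train)
    mse = mean((y - drug_pred(c, d)) ** 2 for c, d, y in test)
    print(f"fold {i}: {len(test_cells)} test cell lines, drug-mean MSE = {mse:.3f}")
```

In this sketch the mean-effects baseline collapses to the drug mean under LCO, because a held-out cell line has no estimated effect.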

You can visualize them using

drevalpy-report --run_id my_first_run --dataset TOYv1

This creates an index.html file that you can open in your browser to see the results of your run.

We recommend our Nextflow pipeline for computationally demanding runs and for improved reproducibility. No knowledge of Nextflow is required to run it. The Nextflow pipeline is available on the nf-core GitHub; its documentation can be found here.

News

📄 Our preprint is out! 2025-05-29

Check out our preprint on bioRxiv!

Note

From Hype to Health Check: Critical Evaluation of Drug Response Prediction Models with DrEval. Judith Bernett, Pascal Iversen, Mario Picciani, Mathias Wilhelm, Katharina Baum, Markus List. bioRxiv 2025.05.26.655288; doi: https://doi.org/10.1101/2025.05.26.655288

🚀 We have launched the DrEval Challenge 🚀 2024-11-26

We think that it is possible to design a meaningful drug response prediction model that outperforms simple baseline models.

The DrEval challenge is simple:

1. Integrate your model into DrEval by following the Contributor Guide.
2. Compare your model to the baseline models provided in the DrEval package, using either the standalone package or the Nextflow pipeline.
3. Let us know your results!
3.1. 🎊 If you significantly outperform the RandomForest baseline model in the LCO setting or the GradientBoosting model in the LDO setting, we will personally send you chocolate or another snack of your choosing 🍫.
3.2. 🥺 If you perform significantly worse than the NaiveDrugMeanPredictor, you will have to send us chocolate.
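The challenge asks for a significant improvement, but this README does not name a test. One simple option for paired per-fold scores (one value per cross-validation fold for your model and for the baseline, computed on the same splits) is an exact sign-flip permutation test, sketched below in plain Python with made-up scores. This is an illustrative choice with a hypothetical helper name, not necessarily the test used to judge the challenge.

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(model_scores, baseline_scores):
    """Exact one-sided paired sign-flip test (H1: the model scores higher on average).

    Under H0 each per-fold difference is equally likely to be positive or negative,
    so all 2**n sign patterns are enumerated and we count how many produce a mean
    difference at least as large as the observed one.
    """
    if len(model_scores) != len(baseline_scores):
        raise ValueError("scores must be paired fold by fold")
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    observed = mean(diffs)
    patterns = list(product((1, -1), repeat=len(diffs)))
    n_extreme = sum(
        1
        for signs in patterns
        # small tolerance so ties with the observed mean are not lost to rounding
        if mean(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12
    )
    return n_extreme / len(patterns)


# Hypothetical per-fold scores (higher is better) on the same 7 LCO folds.
model = [0.42, 0.45, 0.40, 0.44, 0.43, 0.46, 0.41]
baseline = [0.38, 0.40, 0.39, 0.41, 0.40, 0.42, 0.37]

p = sign_flip_p_value(model, baseline)
print(f"p = {p:.4f}")
```

With 7 folds there are only 2^7 = 128 sign patterns, so the smallest attainable p-value is 1/128 ≈ 0.0078. For error metrics where lower is better (e.g. MSE), pass the baseline scores first.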

📜 Origin Story 💊 2023-11-20

Long ago, the people of science lived in harmony. Researchers collaborated, data flowed freely, and models were tested with integrity. But then, the H-Index Nation attacked. Suddenly, impact factors ruled all, flashy results overshadowed rigorous testing, and biased benchmarks spread like wildfire. Science, once a beacon of knowledge, became clouded by competition and questionable practices. Only a fair and unbiased framework could restore balance. And when the field needed it most—drevalpy was born. A framework designed to test drug response prediction models with fairness and transparency, cutting through bias and restoring the integrity of scientific evaluation. Though the fight against bad practices is long, with drevalpy, balance may yet be restored.