MELLODDY announces first demonstration that federated learning improves model performance in drug discovery

IMI-funded project leverages machine learning to mine proprietary data among pharma companies to increase efficiencies in drug discovery, while maintaining privacy 

September 27th, 2021 – MELLODDY (Machine Learning Ledger Orchestration for Drug Discovery), an Innovative Medicines Initiative 2 Joint Undertaking (IMI2 JU) consortium of 10 pharmaceutical companies (Amgen; Astellas; AstraZeneca; Bayer; Boehringer Ingelheim; GSK; Institut De Recherches Servier; Janssen Pharmaceutica NV; Merck KGaA; and Novartis), five technology companies (Iktos; Kubermatic; NVIDIA; Owkin; and Substra Foundation), and two academic partners (Budapesti Muszaki Es Gazdasagtudomanyi Egyetem; KU Leuven), today reported that it has achieved its mid-project objective of demonstrating improved model performance via federated learning-enabled collaboration.

Most, if not all, industries have come to understand the value of data and the potential of artificial intelligence (AI) to maximize that value. Following this trend, the pharmaceutical industry leverages machine learning (ML) to internally mine its proprietary data and is now eager to explore alternative data sources to further improve its ML models. Training ML models on competitive data sources presents a prime opportunity due to the quality and quantity of these data pools. However, cross-company collaboration in the core competitive space is only conceivable when the privacy of the companies’ highly proprietary data is guaranteed.

MELLODDY: Enabling collaboration among competitors – “coopetition”

A little over two years ago, the MELLODDY consortium launched a bold, privacy-preserving federated learning (FL) research experiment. Over the course of three years, its pharmaceutical partners are involving the largest part of their data warehouses and engaging in a series of three federated runs deploying the technology developed in collaboration with its technology and academic partners. MELLODDY tests the hypothesis that federated ML approaches can overcome data sharing challenges and privacy concerns for competitive partners with a mutual interest in building predictive models.

MELLODDY’s first-year results marked a milestone in privacy-preserving FL. Never before had federated learning been applied to drug discovery at this scale. The 10 pharma partners were able to simultaneously train a common predictive model and, in doing so, harness their collective knowledge without compromising privacy. The first project year yielded a technical demonstration of advancements in AI with the successful operation of a rigorously audited platform comprising three indispensable layers: cloud infrastructure, application, and algorithm.

Project Objectives: Increasing efficiencies in drug discovery

Following its year-one success, MELLODDY’s remaining objectives are two-fold: improving the predictive multi-partner ML models to support drug discovery and development opportunities, and exploring sustainable operations post-project. The consortium is targeting an improvement of at least one percent in multi-partner model performance over single-partner model performance, measured using the appropriate standard-practice metrics as an average across the 100,000+ ML tasks representing 40,000+ concentration-response assays. In addition, MELLODDY is aiming for a multiple-percentage improvement for subset data groups like ADME (absorption, distribution, metabolism, excretion). Equally important and more representative of real-world application is the domain of applicability (AD). An extension of the AD for a single partner, enabled by the federated effort, means that the model can support navigation of a broader chemical space previously unknown to that partner. In drug discovery, an increased AD is known to increase the quality of predictions. A multiple-percentage improvement in measurements of AD, therefore, poses another end-target for the MELLODDY consortium.
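To make the performance target concrete, the short sketch below shows one way such a delta could be computed. It is an illustration only, not the consortium's evaluation code. It assumes the improvement is a relative delta, expressed as a percentage of the single-partner score and averaged over tasks; the function names and the example scores are hypothetical.

```python
from statistics import mean


def mean_relative_delta_percent(single_partner, multi_partner):
    """Average per-task relative improvement of multi-partner over
    single-partner scores, in percent.

    Assumption: each list holds one score per task (for example AUC-PR),
    in the same task order. The relative-delta definition is an
    illustrative choice, not taken from the MELLODDY project.
    """
    if not single_partner or len(single_partner) != len(multi_partner):
        raise ValueError("score lists must be non-empty and of equal length")
    if any(s <= 0 for s in single_partner):
        raise ValueError("single-partner scores must be positive")
    deltas = [100.0 * (m - s) / s for s, m in zip(single_partner, multi_partner)]
    return mean(deltas)


def meets_target(single_partner, multi_partner, target_percent=1.0):
    """True if the averaged delta reaches the target (1% by default)."""
    return mean_relative_delta_percent(single_partner, multi_partner) >= target_percent


# Hypothetical per-task scores for one partner (not project data).
single_scores = [0.50, 0.80, 0.40]
multi_scores = [0.51, 0.80, 0.42]

average_delta = mean_relative_delta_percent(single_scores, multi_scores)
target_reached = meets_target(single_scores, multi_scores)
```

Averaging per-task deltas, as done here, weights every task equally regardless of its size; a different weighting would be an equally plausible reading of the target.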

Mid-term project achievements

MELLODDY’s mid-term results strongly support the project’s working hypothesis that the common predictive drug discovery model offers superior prediction quality and/or applicability domain compared with the single-partner modelling effort.

Having successfully completed a second federated run at scale using an improved and re-audited platform, the consortium has achieved significant improvement of multi-partner models over single-partner models and is midway to achieving the objective by project end (Figure 1). Promising observations were also made for the domain of applicability. In-depth analyses will follow to gain a better understanding, though single-partner data and associated neural network complexity are believed to be factors. The platform enabling this scientific leap guaranteed the privacy and security of the highly proprietary drug discovery data of the 10 pharmaceutical partners on AWS cloud infrastructure over the three months of activity, which involved transferring 713,796 GB of data and consuming 912,778 EC2 hours.

“In demonstrating federated multi-task learning across more than 100,000 machine learning tasks representing more than 40,000 concentration-response assays, we are excited to see early evidence that it indeed boosts the predictive performance and chemical applicability of models used to inform drug discovery programs,” says Hugo Ceulemans, Scientific Director, Janssen Pharmaceutica NV, and MELLODDY Project Leader. The results announced today will be further elaborated in pending scientific publications and conferences and mark another milestone achievement for the MELLODDY project.


 
Figure 1. MELLODDY achieved its mid-project objective of demonstrating improved model performance via federated learning-enabled collaboration in drug discovery. For the 10 pharmaceutical partners, the multi-partner over single-partner delta improvement is shown as a percentage for (i) the predictive performance measured as AUC-PR for classification (dark blue), (ii) the extension of the domain of applicability as the delta median conformal prediction efficiency for classification (light blue), and (iii) the correlation coefficient for regression (green). The target delta improvement is shown as a solid line (1% AUC-PR).


What is next for MELLODDY?

MELLODDY’s year-two achievement, demonstrating the early benefits of modelling at scale across tasks, data types and partners, puts the consortium on track to achieve its ultimate project objective of reaching the one percent delta improvement for each partner. To this end, several strategies are being explored in anticipation of the consortium’s final federated run in 2022.

On post-project opportunities, Mathieu Galtier, Chief Product Officer at Owkin and MELLODDY coordinator, said, “We are working to turn the solution developed under this public-private partnership into a commercial service so that customer consortia can collectively train foundation models not only for small molecules but also for other applications in drug discovery and development. As we near the end of the project, the consortium is open to accepting new partners for post-project collaboration.”

Learn more: 

Contact

Janssen Pharmaceutica NV, Kim Rotondo (krotondo@its.jnj.com)                            

Owkin, Darius Meadon (darius.meadon@owkin.com)

Acknowledgement

This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 831472. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA Companies. 

   

Disclaimer

This communication reflects the views of the authors and neither IMI nor the European Union, EFPIA or any Associated Partners are liable for any use that may be made of the information contained herein.