Open Source Code Bases

Some of our code libraries have been made open source over the course of the project. An overview is given below.

MELLODDY GitHub

MELLODDY-TUNER

ChEMBL25

MELLODDY-TUNER is of interest to the general scientific community working on large-scale privacy-preserving machine learning. MELLODDY-TUNER is a Python script that uses RDKit functionality, encompassing structure processing and LSH-based train-test fold splitting code. Small data files (ChEMBL25) are included for unit testing.
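To make the idea of LSH-based fold splitting concrete, here is a minimal sketch in plain Python. It is not the MELLODDY-TUNER implementation: the function name `lsh_fold`, the representation of a fingerprint as a set of on-bit indices, the choice of bit positions, and the use of SHA-256 are all assumptions made for illustration. It only shows the general principle that compounds sharing the same values on a chosen subset of fingerprint bits land in the same fold.

```python
import hashlib

def lsh_fold(fingerprint_bits, selected_bits, n_folds=5):
    """Assign a compound to a fold from a subset of its fingerprint bits.

    fingerprint_bits: set of on-bit indices (hypothetical input format).
    selected_bits: bit positions that form the LSH signature (assumed).
    """
    # Signature: which of the selected positions are set in this fingerprint.
    signature = "".join("1" if b in fingerprint_bits else "0" for b in selected_bits)
    # Hash the signature deterministically, then map the bucket to a fold.
    digest = hashlib.sha256(signature.encode("ascii")).hexdigest()
    return int(digest, 16) % n_folds

selected = [1, 4, 7, 9]          # illustrative bit positions
fp_a = {1, 4, 20, 33}
fp_b = {1, 4, 21, 40}            # same signature as fp_a on the selected bits
fp_c = {7, 9, 20}

fold_a = lsh_fold(fp_a, selected)
fold_b = lsh_fold(fp_b, selected)
fold_c = lsh_fold(fp_c, selected)
```

Because the fold depends only on the hashed signature, similar structures tend to stay together, and the same compound receives the same fold wherever the same bit positions are used.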

Find out more here


SparseChem Library

This package provides fast and accurate machine learning models for biochemical applications. In particular, it supports very high-dimensional models with sparse inputs, e.g., millions of features and millions of compounds.
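As a rough illustration of why sparse inputs matter at this scale, the sketch below stores one compound as a mapping from feature index to value and scores it with a simple logistic model. This is not the SparseChem API: `predict_proba`, the dictionary layout, and the example weights are hypothetical and chosen only to show that the cost depends on the number of non-zero features, not on the total feature count.

```python
import math

def predict_proba(sparse_row, weights, bias=0.0):
    """Logistic prediction for one compound stored as {feature_index: value}."""
    # Only non-zero features are visited, so millions of absent features cost nothing.
    z = bias + sum(value * weights.get(idx, 0.0) for idx, value in sparse_row.items())
    return 1.0 / (1.0 + math.exp(-z))

# Three active features out of a (hypothetical) space of millions.
compound = {12: 1.0, 48213: 1.0, 1999872: 1.0}
weights = {12: 0.5, 48213: -0.5}   # illustrative weights; missing entries count as 0

p = predict_proba(compound, weights)
```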

Find out more here


ChemFold provides several methods for computing train-validation-test splits, designed for both ordinary ML and federated ML tasks involving small molecules. The following methods are included:

  • Random split

  • Sphere exclusion clustering based split

  • Locality sensitive hashing (LSH) based split

  • Scaffold trees
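As an example of a clustering-based split from the list above, the following is a minimal sketch of a greedy sphere exclusion variant in plain Python. It is not ChemFold's code: the function names, the set-of-bits fingerprint format, the 0.5 Tanimoto threshold, and the round-robin assignment of clusters to two folds are all assumptions for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def sphere_exclusion(fingerprints, threshold=0.5):
    """Greedy clustering: the first unassigned compound becomes a centre and
    claims every remaining compound within the similarity threshold."""
    clusters = []
    unassigned = list(range(len(fingerprints)))
    while unassigned:
        centre = unassigned[0]
        members = [i for i in unassigned
                   if tanimoto(fingerprints[centre], fingerprints[i]) >= threshold]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters

# Toy fingerprints (hypothetical data).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
clusters = sphere_exclusion(fps)

# Whole clusters are assigned to folds, so near-duplicates never straddle a split.
folds = {i: c % 2 for c, members in enumerate(clusters) for i in members}
```

Assigning whole clusters rather than single compounds is what distinguishes this from a random split: structurally similar molecules end up on the same side of the train-test boundary.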

Find out more here

ChemFold


Federated performance evaluation workflow for classification and regression models.

Find out more here

Federated performance evaluation workflow


Work files for the preparation of public data used to test the federated learning pipelines.

Find out more here

Public data extraction


MELLODDY datasets

A collection of public datasets can be found here (they are also linked on the MELLODDY GitHub).

MELLODDY Dataset


Substra Foundation GitHub

Substra framework

The Substra framework is a low-layer tool offering secure, traceable, distributed orchestration of machine learning tasks among partners. It aims to be compatible with privacy-enhancing technologies and to complement them, providing efficient and transparent privacy-preserving workflows for data science. Its ambition is to make new scientific and economic data science collaborations possible.

Find out more here