Kexin Huang (Harvard, kexinhuang@hsph.harvard.edu)
Tianfan Fu (Georgia Tech, tfu42@gatech.edu)
Wenhao Gao (MIT, whgao@mit.edu)
Yue Zhao (CMU, zhaoy@cmu.edu)
Marinka Zitnik (Harvard, marinka@hms.harvard.edu)
Retrieving, curating, and processing ML-ready datasets is time-consuming and requires extensive domain expertise. Datasets are scattered across bio repositories, and there is no centralized repository covering the variety of therapeutics tasks. Many tasks are under-explored in the AI/ML community because of the lack of data access.
https://github.com/mims-harvard/TDC
Machine Learning Datasets for Therapeutics
Open-Source ML Datasets for Therapeutics:
• Wide range of tasks: target discovery, activity screening, efficacy, safety, manufacturing
• Wide range of products: small molecules, antibodies, vaccine, miRNA
Numerous Data Functions:
• Extensive data functions and model evaluators
• Data processing and splits, molecule generation oracles, and much more
3 Lines of Code:
• Minimum package dependency, lightweight loaders
Our Vision for TDC
• Domain scientists: identify meaningful therapeutics tasks
• ML scientists: design powerful ML models
Advancing algorithms for key therapeutics problems
Modular Structure of TDC
TDC “Central Dogma”: three problem types
• Single-instance prediction: one instance → Y
• Multi-instance prediction: multiple instances → Y
• Generation
Diverse Coverage of Tasks
[Figure: task overview. Tasks shown: GDA, Tox, DTI, DrugRes, Reaction, HTS, DrugSyn, MolGen, QM, Peptide MHC, ADME, PairMolGen, AntibodyAff, Paratope, RetroSyn, MTI, Epitope, PPI, Catalyst, Develop, Yields, DDI]
3 Lines of Code
The core TDC library depends on a minimal set of packages, so installation is hassle-free. Data loaders are simplified so that you can access ML-ready datasets in only 3 lines of code.
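The three lines might look like the sketch below. It assumes the loader interface shown in the PyTDC README (`tdc.single_pred.ADME`, `get_split`); the dataset name `Caco2_Wang` and the wrapper function `load_adme_split` are chosen here for illustration only. Running it requires the PyTDC package and network access for the first download.

```python
def load_adme_split(name: str = "Caco2_Wang"):
    """Illustrative wrapper (hypothetical name) around the three loader lines.

    The import is kept inside the function so that defining it does not
    require PyTDC to be installed; calling it does.
    """
    from tdc.single_pred import ADME  # line 1: pick the task loader
    data = ADME(name=name)            # line 2: pick the dataset (downloaded on first use)
    return data.get_split()           # line 3: train/valid/test split, per the README

# Usage (needs PyTDC and a network connection):
#   split = load_adme_split()
#   train_df = split["train"]
```

The wrapper only exists to keep the sketch importable without the library; in a notebook the three commented lines would normally be written directly.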
Highlight: Data sources
Highlight: Drug Response Prediction
[Figure: DrugRes maps a drug to a high or low response.]
Drug Synergy Prediction
[Figure: DrugSyn maps a drug combination (+) to a high or low response.]
Highlight: 10 Biologics Datasets
Tasks: Paratope, Develop, Epitope, Peptide MHC, MTI, AntibodyAff
Data Functions to Support your Research
Molecule Generation Oracles
[Figure: generated molecules are passed to a molecule generation oracle, which returns a score; the score is used to optimize the generated molecules. Oracle sources: GuacaMol, MOSES, literature. Available in 3 lines of code.]
GuacaMol: Benchmarking Models for de Novo Molecular Design, J. Chem. Inf. Model., 2019
MOSES: A Benchmarking Platform for Molecular Generation Models, Frontiers in Pharmacology, 2020
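The generate, score, optimize loop can be sketched in plain Python. This is a minimal illustration, not TDC's implementation: `toy_oracle` and `rank_by_oracle` are hypothetical names, and the toy score (fraction of carbon symbols in a SMILES string) has no chemical meaning. In practice the oracle would be a real scoring function such as those benchmarked in GuacaMol or MOSES.

```python
from typing import Callable, List, Tuple


def toy_oracle(smiles: str) -> float:
    """Hypothetical stand-in oracle: fraction of characters that are 'C' or 'c'.

    Purely illustrative; a real oracle would score a molecular property.
    """
    if not smiles:
        return 0.0
    return sum(ch in "Cc" for ch in smiles) / len(smiles)


def rank_by_oracle(
    candidates: List[str],
    oracle: Callable[[str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every candidate with the oracle and keep the top_k highest."""
    scored = [(s, oracle(s)) for s in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# One selection step: a generator would propose the candidates,
# and the survivors would seed the next round of generation.
candidates = ["CCO", "c1ccccc1", "O=C=O", "CCCC"]
best = rank_by_oracle(candidates, toy_oracle, top_k=2)
print(best)
```

Keeping the oracle as a plain callable means any scoring function with the same signature can be swapped in without changing the selection step.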
You Are Invited to Join TDC!
TDC is an Open-Source, Community Effort
Contribute:
• Tasks: Clinical Trials, CRISPR, Phenotypic Screening, Protein Contact, Crystal Structure, …
• Datasets: HTS, ADME, Drug Response, Drug Synergy, Reactions, Antibody affinity, …
• Data Functions: Data Wrangling, Data Visualization, Realistic Splits, Molecule Generation Oracles, …
Fill in this form: rb.gy/ytbyfl
Website: zitniklab.hms.harvard.edu/TDC
GitHub: github.com/mims-harvard/TDC
groups.io/g/tdc
Kexin Huang, Harvard, @KexinHuang5, kexinhuang@hsph.harvard.edu
Tianfan Fu, Georgia Tech, @TianfanFu, tfu42@gatech.edu
Wenhao Gao, MIT, @WenhaoGao1, whgao@mit.edu
Yue Zhao, CMU, @yzhao062, zhaoy@cmu.edu
Marinka Zitnik, Harvard, @marinkazitnik, marinka@hms.harvard.edu