CONSIDERATIONS FOR CANCER RESEARCH DATA IN THE CLOUD Christopher R. Kinsinger, National Cancer Institute
Outline ▪ Introduction to Cancer Research Data Commons ▪ Considerations for shifting the proteomics research community to the cloud
Integrating across data types
Clark, DJ et al. Cell, 2019, 179, 964-983
The Beau Biden Cancer Moonshot℠: Overarching goals
• Accelerate progress in cancer, including prevention & screening
  • From cutting edge basic research to wider uptake of standard of care
• Encourage greater cooperation and collaboration
  • Within and between academia, government, and private sector
• Enhance data sharing
The Beau Biden Cancer Moonshot℠ Blue Ribbon Panel, October 2016
Recommendations include:
• Build a National Cancer Data Ecosystem
  • Enhanced cloud-computing platforms
  • Services that link disparate information, including clinical, image, and molecular data
  • Essential underlying data science infrastructure, standards, methods, and portals for the Cancer Data Ecosystem
Components:
• Data Nodes
• Data Commons Framework
• Data Aggregators
• Cloud Resources
• APIs
• Elastic compute resources
• Portals
• Workspaces
• Analytic Tools
• Tool repositories
Data Sources:
• Canine Immuno-oncology studies
• Clinical Proteomics Tumor Analysis Consortium*
• The Cancer Imaging Archive (TCIA)*
[Diagram label: Data Scientists]
Data Commons Framework powered by Gen3 (U of Chicago)
• Set of modular components that can be leveraged across Data Commons
• Reusable, expandable framework for a Data Commons
• Core principles and structures for a Data Commons
• IndexD: Centralized Indexing
• Fence: Centralized Authentication & Authorization
NCI is developing the Framework and will use it to stand up several example Data Commons the community can leverage or use as a model to build their own commons.
https://dcf.gen3.org/
NCI Cloud Resources
The Cloud Resources provide:
• Access to large cancer data sets without need to download
• Access to workspaces, analysis tools, and pipelines
• Ability for researchers to bring their own data and tools
http://isb-cgc.org
http://cgc.sbgenomics.com
http://firecloud.terra.bio/#
NCI Genomic Data Commons (GDC)
• Launched in 2016 with over 4 PB of data.
• Joint project with OICR.
• Used by 1000-2000+ users per day.
• Based upon an open source software stack that can be used to build other data commons.
*See: NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112.
PDC - https://pdc.esacinc.com/pdc/pdc
State of proteomics data ▪ Research has been focused on data production ▪ Standard formats ▪ Data shared through a small number of interconnected repositories ▪ Varied analyses ▪ Difficult to reproduce an analysis ▪ Stable research community ▪ Slowly permeating biomedical research space
Cultural challenges ▪ Data shared through downloading ▪ Analyses performed on servers at local institutions ▪ Popular workflows not yet available in cloud ▪ CLOUD Act (outside US) ▪ Disparity of felt cost
Carrots and sticks
Carrots
• Supported analysis pipelines
• Reproducible analyses
• Cost effective data solution in cloud
• Data security
• Easy collaboration
• Recognition from data science community
Potential Sticks
• Reduce NIH support for local computation
• Journals require data deposition
• Datasets not on cloud become irrelevant
Acknowledgments NCI NCI NCI - Tony Kerlavage - Lyubov Remennik - Henry Rodriguez - Vivian Ota Wang - David Patton - Emily Boja - Juli Klemm - Nina Ghanem - Tara Hiltke - Tanja Davidsen - Matthew Byers - Mehdi Mesri - Craig Hayn - Melissa Cook - Ana Robles - Ian Fore - Mark Jensen - Anna Roberts-Pilgrim - Elizabeth Hsu - Eric Scott - Annette Marrero - Sherri De Coronado - Dale Lamb - Dawn Hayward - Sima Pandya - Melissa Cook - Todd Pihl - Keyvan Farahani PDC - Jaime Guidry Auvil - Sylvia Gale - Anand Basu - Freddie Pruitt - Johanna Goderre Jones - Ratna Thangudu - Zhining Wang - Cathy Rowe - Michael Holck - Eve Shalley - Smita Hastak - Michael MacCoss - John Otridge - Denise Warzel - Paul Rudnick - Allen Dearry - Anna Mencarelli - Kanakadurga Addepalli - Barbara Vann - Erika Kim - Resham Kulkarni