
DATA IN THE CLOUD Christopher R. Kinsinger, National Cancer - PowerPoint PPT Presentation



  1. CONSIDERATIONS FOR CANCER RESEARCH DATA IN THE CLOUD Christopher R. Kinsinger, National Cancer Institute

  2. Outline ▪ Introduction to Cancer Research Data Commons ▪ Considerations for shifting the proteomics research community to the cloud

  3. Integrating across data types. Clark, DJ et al. Cell, 2019, 179, 964-983

  4. The Beau Biden Cancer Moonshot℠
     Overarching goals
     • Accelerate progress in cancer, including prevention & screening
       • From cutting edge basic research to wider uptake of standard of care
     • Encourage greater cooperation and collaboration
       • Within and between academia, government, and private sector
     • Enhance data sharing
     Blue Ribbon Panel – October, 2016. Recommendations include:
     • Build a National Cancer Data Ecosystem
       • Enhanced cloud-computing platforms
       • Services that link disparate information, including clinical, image, and molecular data
       • Essential underlying data science infrastructure, standards, methods, and portals for the Cancer Data Ecosystem

  5. Components:
     • Data Nodes
     • Data Commons Framework
     • Data Aggregators
     • Cloud Resources
     • APIs
     • Elastic compute resources
     • Portals
     • Workspaces
     • Analytic Tools
     • Tool repositories
     Data Sources:
     • Canine Immuno-oncology studies
     • Clinical Proteomic Tumor Analysis Consortium*
     • TCIA: The Cancer Imaging Archive*
     Data Scientists

  6. Data Commons Framework powered by Gen3 (U of Chicago)
     • Set of modular components that can be leveraged across Data Commons
     • Reusable, expandable framework for a Data Commons
     • Core principles and structures for a Data Commons
     • IndexD: Centralized Indexing
     • Fence: Centralized Authentication & Authorization
     NCI is developing the Framework and will use it to stand up several example Data Commons the community can leverage or use as a model to build their own commons. https://dcf.gen3.org/

  7. NCI Cloud Resources
     The Cloud Resources provide:
     • Access to large cancer data sets without need to download
     • Access to workspaces, analysis tools, and pipelines
     • Ability for researchers to bring their own data and tools
     http://isb-cgc.org
     http://cgc.sbgenomics.com
     http://firecloud.terra.bio/#

  8. NCI Genomic Data Commons (GDC)
     • Launched in 2016 with over 4 PB of data.
     • Joint project with OICR.
     • Used by 1,000-2,000+ users per day.
     • Based upon an open source software stack that can be used to build other data commons.
     *See: NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112.
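
     The slides do not show how data in a commons is reached without downloading it first. As a rough illustration only, the sketch below builds a metadata query against the public GDC REST endpoint for files. The helper name build_gdc_files_query, the example project ID, and the chosen fields are assumptions made for this example and are not taken from the presentation. The code only constructs the request URL and does not send it.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_query(project_id,
                          fields=("file_id", "file_name", "data_type"),
                          size=5):
    """Return a URL that asks the GDC for file metadata in one project.

    Hypothetical helper for illustration; it builds the URL only and
    performs no network access.
    """
    # GDC filters are JSON objects with an operator and a content block.
    filters = {
        "op": "in",
        "content": {
            "field": "cases.project.project_id",
            "value": [project_id],
        },
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # "TCGA-BRCA" is an example project ID chosen for this sketch.
    print(build_gdc_files_query("TCGA-BRCA"))
```

     The resulting URL can be opened with any HTTP client. Only a small JSON description of matching files comes back, so a researcher can inspect what is available before deciding where to run an analysis.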

  9. PDC - https://pdc.esacinc.com/pdc/pdc

  10. State of proteomics data ▪ Research has been focused on data production ▪ Standard formats ▪ Data shared through a small number of interconnected repositories ▪ Varied analyses ▪ Difficult to reproduce an analysis ▪ Stable research community ▪ Slowly permeating biomedical research space

  11. Cultural challenges ▪ Data shared through downloading ▪ Analyses performed on servers at local institutions ▪ Popular workflows not yet available in cloud ▪ CLOUD Act (outside US) ▪ Disparity of felt cost

  12. Carrots and sticks
      Carrots
      • Supported analysis pipelines
      • Reproducible analyses
      • Cost effective data solution in cloud
      • Data security
      • Easy collaboration
      • Recognition from data science community
      Potential Sticks
      • Reduce NIH support for local computation
      • Journals require data deposition
      • Datasets not on cloud become irrelevant

  13. Acknowledgments NCI NCI NCI - Tony Kerlavage - Lyubov Remennik - Henry Rodriguez - Vivian Ota Wang - David Patton - Emily Boja - Juli Klemm - Nina Ghanem - Tara Hiltke - Tanja Davidsen - Matthew Byers - Mehdi Mesri - Craig Hayn - Melissa Cook - Ana Robles - Ian Fore - Mark Jensen - Anna Roberts-Pilgrim - Elizabeth Hsu - Eric Scott - Annette Marrero - Sherri De Coronado - Dale Lamb - Dawn Hayward - Sima Pandya - Melissa Cook - Todd Pihl - Keyvan Farahani PDC - Jaime Guidry Auvil - Sylvia Gale - Anand Basu - Freddie Pruitt - Johanna Goderre Jones - Ratna Thangudu - Zhining Wang - Cathy Rowe - Michael Holck - Eve Shalley - Smita Hastak - Michael MacCoss - John Otridge - Denise Warzel - Paul Rudnick - Allen Dearry - Anna Mencarelli - Kanakadurga Addepalli - Barbara Vann - Erika Kim - Resham Kulkarni
