
Problem: Genome Data Held in Silos, Unshared, not Standardized for Exchange



  1. Problem: Genome Data Held in Silos, Unshared, not Standardized for Exchange. No one institute has enough on its own to make progress. Every researcher and clinician should be able to compare their genome data to others.

  2. We need a public ledger for sharing GATTTATCTGCTCTCGTTG GAAGTACAAAATTCATTAAT GCTATGCACAAAATCTGTAG TAGTGTCCCATCTATTT C

  3. Alliance for data sharing UC Santa Cruz Genomics Institute

  4. Enabling Responsible Sharing of Genomic and Clinical Data
     • GA4GH founded on June 5, 2013
       – More than 400 institutional members
       – From more than 40 countries
       – Approx. 1/3 are companies
     • Mission: to enable rapid progress in biomedicine
     • Strategy:
       – support major driver projects
       – create and maintain interoperability of technology platform standards
       – develop guidelines and harmonizing procedures for privacy and ethics in the international regulatory context
       – engage stakeholders across sectors to encourage the responsible and voluntary sharing of data and of methods

  5. Cancer Global Data Sharing
     Cancer is driven by mutations in DNA. Precision treatment of cancer depends on knowledge of these mutations.
     Start with a pilot:
     • Build a mechanism for recording cancer DNA mutations and clinical information from millions of cancer patient participants across the world
     • Initially called the Actionable Cancer Genome Initiative
     • Cancer is the right place to start. Once this is working, similar technology could be used to share DNA information for other diseases
     genomicsandhealth.org

  6. Example proposed public record
     gene: BRAF
     variant: V600E
     Patient ID: 163a0083-26fa-4705-bc\-d264c4cff796
     Gender: Male
     Ethnicity: White Caucasian
     Age at Diagnosis: 57
     Tumor Classification: non-small-cell lung carcinoma (MeSH D002289)
     Tissue or organ of origin: Lung
     Tumor morphology: Squamous (epidermoid)
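A record like the one on this slide could be represented as a small typed structure. The following is only a sketch: the class name `PublicRecord` and its field names are hypothetical, and the participant ID is freshly generated with `uuid4` rather than copied from the slide.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PublicRecord:
    """Hypothetical layout for one public ledger record (field names are assumptions)."""
    gene: str
    variant: str
    participant_id: str  # random ID, carries no personal information
    gender: str
    ethnicity: str
    age_at_diagnosis: int
    tumor_classification: str
    mesh_id: str
    tissue_of_origin: str
    tumor_morphology: str


record = PublicRecord(
    gene="BRAF",
    variant="V600E",
    participant_id=str(uuid.uuid4()),  # illustrative random ID
    gender="Male",
    ethnicity="White Caucasian",
    age_at_diagnosis=57,
    tumor_classification="non-small-cell lung carcinoma",
    mesh_id="D002289",
    tissue_of_origin="Lung",
    tumor_morphology="Squamous (epidermoid)",
)

# Serialize to JSON, one plausible exchange format for third-party apps.
serialized = json.dumps(asdict(record), indent=2)
print(serialized)
```

Freezing the dataclass mirrors the idea that a published record is not edited in place.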

  7. Data Sharing Specifications
     • Data open and available to all
     • Ubiquitously accessible on the Internet
     • Can scale to accept donations from 1000s of sources
     • Not maintained by any central authority or tied to any single country, location or institution
     • Not corruptible
     • Protects participant privacy
     • Stable design so that it may be used by many 3rd-party application programs (“apps”)

  8. This is accomplished with a Shared Public Ledger
     • Ethereum: https://www.ethereum.org/foundation
     • Ripple: https://ripple.com/
     • Hyperledger: https://github.com/hyperledger/hyperledger
     • IBM Open BlockChain: www.ibm.com/blockchain/
     • MIT Enigma project: enigma.media.mit.edu
     • AirBnB (proposed): www.coindesk.com/airbnb-exec-use-blockchain/

  9. Simplest Shared Public Ledger
     • Record of transactions over time
     • Transaction is adding information to the database
       – Special case: New information marks previous information as out-of-date
     • It is only possible to add more transactions; no transaction is ever erased or altered
     • 1000s of copies of the ledger all over the world are kept in sync by miners while additional transactions come in from multiple sources
     • A shared public ledger keeps track of data provenance, i.e. when and how data was entered and updated, so users have the reputation/reliability information they need to filter out data they don’t want
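The append-only property can be illustrated with a hash chain, where each entry stores the hash of the one before it. This is a minimal single-machine sketch under stated assumptions: the `Ledger` class and its field names are hypothetical, and it leaves out replication, mining, and consensus, which a real shared ledger such as Ethereum provides.

```python
import hashlib
import json
import time


class Ledger:
    """Toy append-only ledger: entries are chained by SHA-256 hashes."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry):
        # Hash every field except the stored hash itself.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def append(self, source, data, supersedes=None):
        """Add a transaction; `supersedes` marks an earlier entry as out-of-date."""
        prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        entry = {
            "index": len(self._entries),
            "timestamp": time.time(),  # provenance: when the data was entered
            "source": source,          # provenance: who entered it
            "data": data,
            "supersedes": supersedes,
            "prev_hash": prev_hash,
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)
        return entry["index"]

    def verify(self):
        """Return True if no entry has been altered or removed."""
        prev_hash = "0" * 64
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True

    def current(self):
        """Entries that have not been marked out-of-date by a later one."""
        outdated = {e["supersedes"] for e in self._entries if e["supersedes"] is not None}
        return [e for e in self._entries if e["index"] not in outdated]


ledger = Ledger()
first = ledger.append("steward-A", {"gene": "BRAF", "variant": "V600E"})
ledger.append("steward-A", {"gene": "BRAF", "variant": "V600K"}, supersedes=first)
```

A correction never deletes the old entry. It adds a new one that points at it, so the full history stays auditable and any in-place change makes `verify()` fail.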

  10. Ethereum Cancer KnowLedger Pilot. Website at findpubs.org. See: https://github.com/maximilianh/acgi

  11. Who will use the shared public ledger? Data users are:
      • Professional Researchers
      • Citizen Scientists
      • Clinicians
      • Developers of molecular dx, drugs, decision analysis tools
      • Payers
      • Patients/participants
      • Regulatory agencies and treatment guideline organizations

  12. Where do the data come from?
      • Ultimately, all data comes from individual participants who wish to share their genetic and clinical information for research or improvement of medicine

  13. How do data enter the ledger?
      [Diagram labels: participants, ?, Shared Public Ledger, Researchers, Clinicians, Individuals, Developers]

  14. A participant works with a Trusted Steward to add their information to the ledger

  15. Possible Trusted Stewards
      • Medical research institutions (e.g. GENIE institutions)
      • Hospitals and clinics
      • Patient registry services and clinical trial recruitment organizations
      • Patient agency advocate groups – possibly providing service to allow patients to maintain agency over data AND participate in clinical trials (e.g. Sage Trust and Genetic Alliance)
      • Genetic testing companies
      All trusted stewards use the same software (provided by GA4GH) to add information to the ledger.
      System is designed to support thousands of stewards globally.

  16. [Diagram] Trusted Stewards: Medical, Clinics, Patient Advocacy Groups, Patient Registries, Testing Companies. Center: participants, Shared Public Ledger. Data Users: Researchers, Clinicians, Developers, Individuals.

  17. What goes into the public ledger and what stays with the steward?
      Steward:
      • Participant personal identifying information and staged consent
      • Participant’s extended clinical and genetic info
        – Possibly identifying
      • Participant’s instructions for sharing addl info with qualified researchers w/o recontact
      • Participant’s instructions for recontact
      Public Ledger:
      • Participant’s genetic variants in selected genes
      • ~1 dozen broad, non-identifying clinical features
      • Steward’s identity and contact info
      • Random numerical ID for participant
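The division between steward-held and public data can be sketched as a function that splits one intake record in two. The function `split_record`, the field names, and the sample values below are all hypothetical, and the random ID is drawn with Python's `secrets` module as one plausible choice.

```python
import secrets

# Assumed names of the intake fields that are safe to publish.
PUBLIC_FIELDS = ("variants", "clinical_features")


def split_record(intake, steward_name, steward_contact):
    """Return (private_part, public_part) for one participant intake record."""
    public_id = secrets.token_hex(16)  # random ID, not derived from personal data
    public_part = {
        "participant_id": public_id,
        "steward": steward_name,
        "steward_contact": steward_contact,
        "variants": intake["variants"],
        "clinical_features": intake["clinical_features"],
    }
    # Everything else stays with the steward, together with the ID mapping.
    private_part = {k: v for k, v in intake.items() if k not in PUBLIC_FIELDS}
    private_part["public_id"] = public_id
    return private_part, public_part


intake = {
    "name": "Jane Example",  # made-up participant
    "date_of_birth": "1958-03-14",
    "consent": {"share_without_recontact": True, "allow_recontact": False},
    "variants": [{"gene": "BRAF", "variant": "V600E"}],
    "clinical_features": {"age_at_diagnosis": 57, "tissue": "Lung"},
}
private_part, public_part = split_record(intake, "Example Clinic", "steward@example.org")
```

Only the steward holds the link between `public_id` and the participant's identity, which is what allows recontact without exposing personal data on the ledger.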

  18. Example: participant -> public ledger
      • Participant visits doctor at a medical clinic
      • Doctor orders genetic test
      • Doctor suggests data donation through steward (possibly her own institution)
      • Test results come back

  19. Test Results

  20. • Participant visits steward
      • Steward records from the participant:
        – personal information
        – test results
        – additional genetic and clinical data
        – consent to donate to database
        – additional sharing and recontact preferences
      • Steward appends participant’s publicly sharable genetic/clinical data to public ledger

  21. [Diagram] Trusted Stewards: Medical, Clinics, Patient Advocacy Groups, Patient Registries, Testing Companies. Center: participants, Shared Public Ledger. Data Users: Researchers, Clinicians, Developers, Individuals.

  22. Recontact
      • Data user discovers mutations of interest by using a 3rd-party app on the public ledger and wants more information about the participant than is provided in the ledger
      • Data user contacts participant’s steward
      • Steward has info about the circumstances under which the participant will share additional data or agrees to be recontacted
      • As appropriate, steward will
        – supply additional information to data user, or
        – set up contact between user and participant
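The steward's decision in the recontact flow could look roughly like the following. This is a sketch only: the `Action` enum, `handle_request`, and the preference keys `share_without_recontact` and `allow_recontact` are hypothetical names, and how a steward decides that a requester is "qualified" is left outside the example.

```python
from enum import Enum


class Action(Enum):
    SHARE_DATA = "supply additional information to the data user"
    ARRANGE_CONTACT = "set up contact between user and participant"
    DECLINE = "decline the request"


def handle_request(preferences, public_id, requester_is_qualified):
    """Decide what the steward does when a data user asks about a ledger ID."""
    prefs = preferences.get(public_id)
    if prefs is None:
        return Action.DECLINE  # this steward does not hold that participant
    if requester_is_qualified and prefs.get("share_without_recontact", False):
        return Action.SHARE_DATA
    if prefs.get("allow_recontact", False):
        return Action.ARRANGE_CONTACT
    return Action.DECLINE


# Preferences stored privately by the steward, keyed by the public random ID.
preferences = {
    "id-001": {"share_without_recontact": True, "allow_recontact": False},
    "id-002": {"share_without_recontact": False, "allow_recontact": True},
    "id-003": {"share_without_recontact": False, "allow_recontact": False},
}

decision = handle_request(preferences, "id-002", requester_is_qualified=True)
```

Unknown IDs and missing preferences both fall through to `DECLINE`, so the default is not to share.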

  23. [Diagram] Trusted Stewards: Medical, Clinics, Patient Advocacy Groups, Patient Registries, Testing Companies. Center: participants, Shared Public Ledger, Third-Party App. Data Users: Researchers, Clinicians, Developers, Individuals.

  24. How do two stewards know if they have a participant in common?
      • On the public ledger, a participant is identified with a random number
      • Each steward securely stores personally identifiable information for their participants; only they can associate a participant with a public random number
      • Personally identifiable information is compared between the two stewards using a cryptographic technique (secure multiparty computation) that reveals nothing except which participants they have in common
      To make it comparable, personal information could be collected according to the NIH NDAR GUID standard.

  25. Demo: secure multiparty computation to compute private set intersection
      [Diagram: Steward A and Steward B communicate over the Internet using homomorphic encryption. Steward A: 100,000 participants, 10 overlap with B. Steward B: 100,000 participants, 10 overlap with A.]
      Neither server sees the personal information on the other. Only the checksum identifying the participants in common is visible, nothing else.
      Runtime: 10 seconds over a transatlantic link, single CPU.
      Implementation and experiments by Max Haeussler
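To show what private set intersection does, here is a toy version in which both stewards run in one process. It is not the demo's implementation: the slide names homomorphic encryption, whereas this sketch swaps in a different classic construction, Diffie-Hellman-style commutative blinding. The `Steward` class, `hash_to_group`, and the identifier strings are hypothetical, and the modulus is an illustrative parameter that has not been vetted for real-world security.

```python
import hashlib
import secrets

# Toy modulus: the Mersenne prime 2**521 - 1. Illustrative only, not a vetted group.
P = 2**521 - 1


def hash_to_group(identifier):
    """Map a normalized personal identifier to a nonzero integer below P."""
    digest = hashlib.sha512(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P or 1


class Steward:
    def __init__(self, identifiers):
        self._ids = list(identifiers)
        self._key = secrets.randbelow(P - 3) + 2  # secret exponent in [2, P - 2]

    def blind_own(self):
        """Blind this steward's identifiers with its own secret key."""
        return [pow(hash_to_group(i), self._key, P) for i in self._ids]

    def blind_other(self, values):
        """Apply this steward's key to values already blinded by the other side."""
        return [pow(v, self._key, P) for v in values]

    def intersect(self, own_double, other_double):
        """Identifiers whose doubly blinded value also appears on the other side."""
        other = set(other_double)
        return [i for i, v in zip(self._ids, own_double) if v in other]


steward_a = Steward(["alice|1970-01-01", "bob|1980-02-02", "carol|1990-03-03"])
steward_b = Steward(["carol|1990-03-03", "dave|1965-04-04", "bob|1980-02-02"])

# Each list crosses the network only in blinded form.
a_double = steward_b.blind_other(steward_a.blind_own())  # H(x)^(a*b), in A's order
b_double = steward_a.blind_other(steward_b.blind_own())  # H(y)^(b*a)

common = steward_a.intersect(a_double, b_double)
print(sorted(common))
```

Because exponentiation commutes, an identifier held by both stewards yields the same doubly blinded value on each side, while identifiers held by only one steward stay hidden behind the other's secret key. A real deployment would also need authenticated channels and a standardized way to normalize identifiers.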

  26. Who can be a steward?
      • Any entity that:
        – Has a legitimate permanent contact
        – Follows the rules
        – Has enough participants to prevent “participant reidentification by steward” (small stewards can be anonymously pooled)
      • All stewards will have Internet ratings; these can be available on a ledger (e.g. like AirBnB); users can filter out unreliable steward data

  27. Who pays for all this?
      • System can be designed and implemented for a few million dollars; long-term maintenance is the only issue
      • Governments or philanthropies (possibly associated with hospitals, patient advocacy groups, etc.) could supply general funding
      • “Taxes” on genetic tests could provide revenue
      • Stewards can be motivated to secure data donations either by altruism or commissions from data users
      • 3rd-party app developers can charge for use of their tools or sell advertising to support their efforts and to support the public ledger

  28. Summary
      • A completely decentralized, public database is possible, while still protecting privacy
      • We need this because “trust is local”
      • No single state government or private organization can/should own or control all the world’s genetic data
      • Once launched, a shared public ledger grows and is maintained organically by the global community because it benefits them, much like the Internet itself
