
Homomorphic Encryption for Genomic Analysis: Hoon Cho (MIT) and David Wu (Stanford), March 2015

  1. Homomorphic Encryption for Genomic Analysis
     Hoon Cho (MIT) and David Wu (Stanford), March 2015

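  2. Homomorphic Encryption
     Homomorphic encryption (HE): encryption schemes that support computation on ciphertexts.
     Consists of three functions. The diagram shows two of them: Enc, under the public key pk, maps a message m to a ciphertext c; Dec, under the secret key sk, maps c back to m.
     Must satisfy the usual notion of semantic security.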

  3. Homomorphic Encryption
     Homomorphic encryption: encryption schemes that support computation on ciphertexts.
     Consists of three functions:
     • Enc: $c_1 = \mathrm{Enc}_{pk}(m_1)$, $c_2 = \mathrm{Enc}_{pk}(m_2)$
     • Eval: $c_3 = \mathrm{Eval}_f(ek, c_1, c_2)$
     • Dec: $\mathrm{Dec}_{sk}(\mathrm{Eval}_f(ek, c_1, c_2)) = f(m_1, m_2)$

  4. Fully Homomorphic Encryption (FHE)
     Many homomorphic encryption schemes:
     • ElGamal: $f(m_0, m_1) = m_0 m_1$
     • Paillier: $f(m_0, m_1) = m_0 + m_1$
     Fully homomorphic encryption: homomorphic with respect to two operations, addition and multiplication
     • [BGN05]: one multiplication, many additions (SWHE)
     • [Gen09]: first FHE construction from lattices
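
The additive homomorphism attributed to Paillier on this slide can be illustrated with a toy sketch. The primes, the choice g = n + 1, and all function names below are assumptions made for illustration; the parameters are far too small to be secure and this is not the presenters' code.

```python
import math
import secrets

# Toy Paillier parameters: tiny primes for illustration only (NOT secure).
p, q = 1789, 1861
n = p * q
n2 = n * n
g = n + 1                                        # common simple choice of generator
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # secret key: lcm(p-1, q-1)
mu = pow(lam, -1, n)                             # inverse of lam mod n (valid for g = n + 1)


def encrypt(m):
    """Enc_pk(m) = g^m * r^n mod n^2 for a random r coprime to n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(c):
    """Dec_sk(c) = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    u = pow(c, lam, n2)
    return ((u - 1) // n * mu) % n


def add(c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % n2


c_sum = add(encrypt(20), encrypt(22))
print(decrypt(c_sum))
```

Multiplying two ciphertexts modulo n^2 yields an encryption of the sum of the plaintexts, which is the sense in which Paillier supports f(m_0, m_1) = m_0 + m_1.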

  5. Task 1: Computing GWAS
     Genotypes for different individuals at a fixed location in the genome:
     Case:    AA AG AA AG GG
     Control: AG AG GA GG GG
     Minor Allele Frequency (from allele counts): $\mathrm{MAF} = \frac{\min(n_A, n_G)}{n_A + n_G}$
     $\chi^2$-statistic: $\chi^2 = \sum \frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Exp}}$
     Observed (Obs) and expected (Exp) are functions of the different allele counts in the case and control groups.
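
As a plaintext illustration of the two statistics on this slide, the sketch below computes the minor allele frequency and a chi-square statistic from allele counts. The function names are hypothetical, and computing the expected counts from the row and column totals of a 2x2 case/control table is an assumption (the standard Pearson construction); the slide does not spell out how Exp is derived.

```python
def minor_allele_frequency(n_a, n_g):
    """MAF = min(n_A, n_G) / (n_A + n_G), from total allele counts."""
    return min(n_a, n_g) / (n_a + n_g)


def chi_square(case_counts, control_counts):
    """Pearson chi-square over a case/control by allele table.

    Assumption: expected counts are row_total * column_total / grand_total.
    """
    observed = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


# Allele counts (A, G) tallied from the genotypes listed on the slide.
case = (6, 4)      # AA AG AA AG GG
control = (3, 7)   # AG AG GA GG GG
print(minor_allele_frequency(case[0] + control[0], case[1] + control[1]))
print(chi_square(case, control))
```

Both functions take only allele counts as input, which is what makes it possible to outsource the counting and keep the statistics on the client.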

  6. Limitations of FHE
     In theory: SWHE/FHE can evaluate arbitrary functions
     But many limitations in practice:
     • Computation must be expressed as an arithmetic circuit; thus, division is hard
     • Performance degrades rapidly in the multiplicative depth of the circuit

  7. Striking a Balance
     Observation: allele counts are sufficient for computing MAF and $\chi^2$
     • Minor Allele Frequency: $\mathrm{MAF} = \frac{\min(n_A, n_G)}{n_A + n_G}$
     • $\chi^2$-statistic: $\chi^2 = \sum \frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Exp}}$
     Solution: delegate aggregation to the cloud; the client computes the statistical quantities of interest

  8. Practical Outsourcing
     Solution: delegate aggregation to the cloud; the client computes the statistical quantities of interest
     • The solution enables the use of symmetric primitives (e.g., AES)
     • Symmetric primitives plus arithmetic are faster than public-key decryption

  9. Symmetric Encryption
     Each genotype is represented as a vector of counts $(n_A, n_C, n_G, n_T)$.
     • encode: AA becomes $(2, 0, 0, 0)$
     • blind: $(2 + r_A, 0 + r_C, 0 + r_G, 0 + r_T)$
     Encrypt the entries by adding independent blinding factors from $\mathbb{Z}_n$.

  10. Symmetric Encryption
      AA:  $(2 + r_A, 0 + r_C, 0 + r_G, 0 + r_T)$
      AG:  $(1 + r'_A, 0 + r'_C, 1 + r'_G, 0 + r'_T)$
      Sum: $(3 + r_A + r'_A, 0 + r_C + r'_C, 1 + r_G + r'_G, 0 + r_T + r'_T)$
      Decryption: compute the blinding factors and subtract.

  11. Symmetric Encryption
      Generate the blinding factors using $\mathrm{PRF}(k, \text{tag})$
      tag: SNP id ‖ group id ‖ subject id
      AA: $(2 + r_A, 0 + r_C, 0 + r_G, 0 + r_T)$
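
The blinding scheme on these slides can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the presenters' implementation: HMAC-SHA256 stands in for the PRF, the modulus is taken to be 2^32, the tag fields are joined with "|", and an allele letter is appended to the tag to obtain one independent factor per vector entry. All function names are hypothetical.

```python
import hashlib
import hmac

MODULUS = 2 ** 32   # assumed modulus for the blinding factors
ALLELES = "ACGT"    # vector order (n_A, n_C, n_G, n_T)


def prf(key, tag):
    """PRF(k, tag): HMAC-SHA256 used as a stand-in PRF, reduced mod MODULUS."""
    digest = hmac.new(key, tag.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % MODULUS


def blinding_factors(key, snp_id, group_id, subject_id):
    """One factor per allele; the allele suffix in the tag is an assumption."""
    return [prf(key, f"{snp_id}|{group_id}|{subject_id}|{allele}")
            for allele in ALLELES]


def encrypt(key, snp_id, group_id, subject_id, genotype):
    """Encode the genotype as allele counts, then add the blinding factors."""
    counts = [genotype.count(allele) for allele in ALLELES]
    blinds = blinding_factors(key, snp_id, group_id, subject_id)
    return [(c + r) % MODULUS for c, r in zip(counts, blinds)]


def aggregate(ciphertexts):
    """Cloud side: entrywise sum of blinded vectors, no key needed."""
    return [sum(column) % MODULUS for column in zip(*ciphertexts)]


def decrypt(key, snp_id, group_id, subject_ids, aggregated):
    """Client side: recompute every blinding factor and subtract the total."""
    total_blind = [0, 0, 0, 0]
    for subject_id in subject_ids:
        blinds = blinding_factors(key, snp_id, group_id, subject_id)
        total_blind = [(t + r) % MODULUS for t, r in zip(total_blind, blinds)]
    return [(c - t) % MODULUS for c, t in zip(aggregated, total_blind)]


key = b"0123456789abcdef"
blinded = [encrypt(key, "rs1", "case", 1, "AA"),
           encrypt(key, "rs1", "case", 2, "AG")]
print(decrypt(key, "rs1", "case", [1, 2], aggregate(blinded)))
```

Decryption recomputes one PRF output per subject and entry, so the client's work is linear in the number of subjects.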

  12. Symmetric Encryption
      • Homomorphic operations consist of only additions
      • Encryption and decryption are symmetric primitives

  13. Further Improvements
      Client must do linear work to decrypt
      • Alternative: if the data comes in batches, the client can precompute the counts per batch during encryption
      • Decryption time is then proportional to the number of batches

  14. Performance
      Timing (in seconds) for computing MAF + $\chi^2$ statistics (500 subjects):

      # SNPs     Encryption   Aggregation   Decryption
      100              0.17          0.02         0.15
      1,000            1.68          0.17         1.42
      10,000          17.47          1.59        15.06
      100,000        179.53         17.72       145.52

      Only a few hundred lines to implement!

  15. Task 2: Hamming Distance Computation
      Each entry gives the location of an edit and the edit itself.
      First sequence:  chr1:101088593: (C → T), chr1:101265309: (C → T), chr1:10165300: (T → G), and so on…
      Second sequence: chr1:100011666: (T → C), chr1:101265309: (C → T), chr1:10165300: (T → C), and so on…
      Compute the Hamming distance between two sequences (represented as edits with respect to a reference genome).

  16. Task 2: Hamming Distance Computation
      First sequence:  chr1:101088593: (C → T), chr1:101265309: (C → T), chr1:10165300: (T → G), and so on… expands to ATGCTTA GTGGC…
      Second sequence: chr1:100011666: (T → C), chr1:101265309: (C → T), chr1:10165300: (T → C), and so on… expands to ACGCTTG GTGGC…
      Naïve method: expand the sequences, then run a pairwise equality test.

  17. Task 2: Hamming Distance Computation
      chr1:101088593: (C → T), chr1:101265309: (C → T), chr1:10165300: (T → G), and so on… expands to ATGCTTAGTGGC…
      Sequences are too long: over 3 billion base pairs in the human genome.
      Desire: a protocol with performance proportional to the number of edits.

  18. Task 2: Hamming Distance Computation
      Genome A: chr1:101088593: (C → T), chr1:101265309: (C → T), chr1:10165300: (T → G), and so on…
      Genome B: chr1:100011666: (T → C), chr1:101265309: (C → T), chr1:10165300: (T → C), and so on…
      View genomes as sets of edits from the reference: $d_H(A, B) = |A| + |B| - 2 \cdot |A \cap B|$

  19. Task 2: Hamming Distance Computation
      Problem reduces to set intersection: $d_H(A, B) = |A| + |B| - 2 \cdot |A \cap B|$
      Slight caveat: same location, different edit, e.g. chr1:10165300: (T → G) versus chr1:10165300: (T → C). The contribution to the Hamming distance should be 1.

  20. Task 2: Hamming Distance Computation
      Formulate as two set intersection problems:
      $d_H(A, B) = |A| + |B| - |A \cap B| - |A_{\mathrm{loc}} \cap B_{\mathrm{loc}}|$
      • $A \cap B$: (location, edit) pairs
      • $A_{\mathrm{loc}} \cap B_{\mathrm{loc}}$: locations only
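
The two-intersection formula can be checked in the clear on the example edits shown in the slides. The sketch below is plaintext only (no encryption); the function name and the tuple representation of an edit are assumptions, as is the premise that each genome has at most one edit per location.

```python
def hamming_distance(edits_a, edits_b):
    """Hamming distance from edit sets via two set intersections.

    Each edit is a (location, change) tuple. Assumes at most one edit per
    location within a genome.
    """
    set_a, set_b = set(edits_a), set(edits_b)
    locations_a = {location for location, _ in set_a}
    locations_b = {location for location, _ in set_b}
    return (len(set_a) + len(set_b)
            - len(set_a & set_b)               # same location, same edit
            - len(locations_a & locations_b))  # same location, any edit


genome_a = [("chr1:101088593", "C>T"),
            ("chr1:101265309", "C>T"),
            ("chr1:10165300", "T>G")]
genome_b = [("chr1:100011666", "T>C"),
            ("chr1:101265309", "C>T"),
            ("chr1:10165300", "T>C")]
print(hamming_distance(genome_a, genome_b))  # prints 3
```

An identical edit is subtracted twice (once per intersection) and so contributes 0, while a shared location with different edits is subtracted once and contributes 1.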

  21. Homomorphic Set Intersection
      Genome A: chr1:101088593: (C → T), chr1:101265309: (C → T), chr1:10165300: (T → G), and so on…
      Genome B: chr1:100011666: (T → C), chr1:101265309: (C → T), chr1:10165300: (T → C), and so on…
      Equality function: $f(x, y) = \mathbf{1}\{x = y\}$
      Simple solution: sum over pairwise equality tests

  22. Homomorphic Set Intersection
      Homomorphic evaluation of the equality function:
      If $x, y \in \{0, 1\}$, then $f(x, y) = \mathbf{1}\{x = y\} = 1 - (x - y)^2$
      Easy to generalize to $n$-bit integers, but requires a degree-$2n$ homomorphism
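
The arithmetic form of the equality test can be tried on plain integers. The single-bit polynomial is the one on the slide; the multi-bit version below, a product of per-bit tests, is an assumed generalization that is consistent with the stated degree of 2n (n factors of degree 2). Function names are hypothetical.

```python
def bit_equal(x, y):
    """Equality of two bits as a degree-2 polynomial: 1 - (x - y)^2."""
    return 1 - (x - y) ** 2


def int_equal(x, y, n_bits):
    """Equality of two n-bit integers as a product of per-bit tests.

    Uses only subtraction and multiplication on the bits, so the result is a
    polynomial of degree 2 * n_bits in the input bits.
    """
    result = 1
    for i in range(n_bits):
        result *= bit_equal((x >> i) & 1, (y >> i) & 1)
    return result


print(int_equal(5, 5, 3), int_equal(5, 4, 3))  # prints 1 0
```

In a homomorphic setting the same additions and multiplications would be applied to encrypted bits, which is why the number of bits directly drives the required multiplicative depth.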

  23. Homomorphic Set Intersection
      Hashing to decrease the number of pairwise comparisons
      [Diagram: the edits of both genomes (chr1:101088593: (C → T), chr1:101265309: (C → T), chr1:10165300: (T → G), … and chr1:100011666: (T → C), chr1:101265309: (C → T), chr1:10165300: (T → C), …) pass through a hashing step, followed by an equality test.]
      Hash elements into buckets, then run pairwise equality tests on the hashed values within each bucket.

  24. Homomorphic Set Intersection: Tradeoffs
      Tunable parameters:
      • number of buckets: more buckets → lower collision rate, possibly more ciphertexts
      • bits used to represent each element in a bucket: more bits → lower collision rate, more homomorphism for the equality test
      • bucket size: larger buckets → less likely that a bucket overflows
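
The bucketing idea and its three tunable parameters can be sketched in the clear. Everything below is an illustrative assumption rather than the presenters' construction: SHA-256 is used as the hash, the function names are hypothetical, and an overflowing bucket raises an error. In an encrypted version the buckets would presumably also need padding to a fixed size, which is omitted here.

```python
import hashlib


def hash_to_bucket(element, num_buckets, num_bits):
    """Map an element to (bucket index, num_bits-bit fingerprint)."""
    digest = hashlib.sha256(element.encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return value % num_buckets, (value // num_buckets) % (2 ** num_bits)


def build_buckets(elements, num_buckets, num_bits, bucket_size):
    """Place each element's fingerprint into its bucket; fail on overflow."""
    buckets = [[] for _ in range(num_buckets)]
    for element in set(elements):
        index, fingerprint = hash_to_bucket(element, num_buckets, num_bits)
        if len(buckets[index]) >= bucket_size:
            raise OverflowError("bucket overflow: raise bucket_size or num_buckets")
        buckets[index].append(fingerprint)
    return buckets


def intersection_size(buckets_a, buckets_b):
    """Sum of pairwise equality tests, restricted to matching buckets."""
    total = 0
    for bucket_a, bucket_b in zip(buckets_a, buckets_b):
        for x in bucket_a:
            for y in bucket_b:
                total += 1 if x == y else 0
    return total


edits_a = ["chr1:101088593:C>T", "chr1:101265309:C>T", "chr1:10165300:T>G"]
edits_b = ["chr1:100011666:T>C", "chr1:101265309:C>T", "chr1:10165300:T>C"]
params = dict(num_buckets=4, num_bits=32, bucket_size=3)
print(intersection_size(build_buckets(edits_a, **params),
                        build_buckets(edits_b, **params)))
```

Comparisons are only made between elements that land in the same bucket, so the number of equality tests is bounded by the number of buckets times the square of the bucket size. Fewer fingerprint bits make each test cheaper but raise the chance that two different elements are counted as equal.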

  25. Performance
      Timing (in seconds) for homomorphic set intersection using HElib:

      Size of sets   Key generation   Hashing   Encryption   Computation   Decryption
      1,000                   23.80     0.007        31.97        104.16         1.78
      5,000                   23.36     0.025        95.38        475.37         1.78
      10,000                  27.14     0.093       176.50        936.64         1.91

      (The source labels the last column "Encryption" a second time; it is read here as decryption.)
      Primary drawback: key sizes and ciphertext sizes are very large (several hundred MB to just over 1 GB)

  26. Conclusions
      Task 1: The most efficient solution is to compute counts; symmetric primitives suffice.
      Task 2: Hashing-based homomorphic set intersection can handle edit sets with up to ten thousand elements, but with large parameter sizes.
