
Secure Multiparty Computation Introduction to Privacy Preserving - PowerPoint PPT Presentation



  1. CS573 Data Privacy and Security Secure Multiparty Computation – Introduction to Privacy Preserving Distributed Data Mining Li Xiong Slides credit: Chris Clifton, Purdue University; Murat Kantarcioglu, UT Dallas

  2. Outline • Overview • Data partition – Horizontally partitioned – Vertically partitioned • Privacy preserving Distributed Data Mining • Approaches to preserve privacy • Privacy preserving data mining toolkit

  3. Overview • What is Data Mining? – Extracting implicit, non-obvious patterns and relationships from warehoused data sets • This information can help an organization increase its efficiency and plan for the future • Can be done at an organizational level – By establishing a data warehouse

  4. Motivation • Huge databases exist in various applications – Medical data – Consumer purchase data – Census data – Communication and media-related data – Data gathered by government agencies • Can these data be utilized? – For medical research – For improving customer service – For homeland security

  5. Motivation • Data sharing is necessary for full utilization • Pooling medical data can improve the quality of medical research • The huge amount of data available means that it is possible to learn a lot of information about individuals from public data – Purchasing patterns – Family history – Medical data – …

  6. Horizontally Partitioned Data • Data can be unioned to create the complete set [Figure: a complete table with columns key, X1…Xd and rows k1…kn is split by rows: Site 1 holds k1…ki, Site 2 holds ki+1…kj, …, Site r holds km+1…kn; every site stores all attributes X1…Xd]

  7. Vertically Partitioned Data • Data can be joined to create the complete set [Figure: a complete table with columns key, X1…Xd is split by columns: Site 1 holds key, X1…Xi; Site 2 holds key, Xi+1…Xj; …; Site r holds key, Xm+1…Xd; every site stores all keys]
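The two layouts can be illustrated with a small sketch. The keys, attribute names, values, and helper functions below are made up for illustration and are not taken from the slides:

```python
# Toy illustration only: keys, attribute names, and values are hypothetical.

# Horizontal partitioning: every site stores the same attributes (x1, x2)
# but a disjoint set of keys (rows).
site1_rows = {"k1": {"x1": 5, "x2": 1}, "k2": {"x1": 7, "x2": 0}}
site2_rows = {"k3": {"x1": 2, "x2": 1}}


def union_rows(*sites):
    """Union the row sets of all sites into one complete table."""
    complete = {}
    for site in sites:
        complete.update(site)
    return complete


# Vertical partitioning: every site stores the same keys
# but a disjoint set of attributes (columns).
site1_cols = {"k1": {"x1": 5}, "k2": {"x1": 7}}
site2_cols = {"k1": {"x2": 1}, "k2": {"x2": 0}}


def join_on_key(*sites):
    """Join the column sets of all sites on the shared key."""
    complete = {}
    for site in sites:
        for key, attrs in site.items():
            complete.setdefault(key, {}).update(attrs)
    return complete


horizontal = union_rows(site1_rows, site2_rows)
vertical = join_on_key(site1_cols, site2_cols)
```

In the privacy-preserving setting the sites would not actually materialize this union or join; the sketch only shows what "the complete set" means in each case.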

  8. Distributed Data Mining • The setting: – Data is distributed at different sites – These sites may be third parties (e.g., hospitals, government bodies) or may be the individual him or herself [Figure: sites holding inputs x1, x2, x3, …, xn jointly compute f(x1, x2, …, xn)]

  9. Distributed Data Mining • Government / public agencies. Example: – The Centers for Disease Control want to identify disease outbreaks – Insurance companies have data on disease incidents, seriousness, patient background, etc. – But can/should they release this information? • Industry Collaborations / Trade Groups. Example: – An industry trade group may want to identify best practices to help members – But some practices are trade secrets – How do we provide “commodity” results to all (manufacturing using chemical supplies from supplier X has high failure rates), while still preserving secrets (manufacturing process Y gives low failure rates)?

  10. Privacy and Security Restrictions • Individual Privacy – Nobody should know more about any entity after the data mining than they did before • Organization Privacy – Protect knowledge about a collection of entities • Individual entity values may be known to all parties • Which entities are at which site may be secret

  11. Privacy-Preserving Distributed Data Mining: Why? • Data needed for data mining may be distributed among parties – Credit card fraud data • Inability to share data due to privacy reasons – HIPAA • Even partial results may need to be kept private

  12. Approaches to preserve privacy • Restrict access to data (protect individual records) • Protect both the data and its source: – Secure Multi-party computation (SMC) – Input Data Randomization • No single solution fits all purposes

  13. Secure computation and privacy • Secure computation – Assume that there is a function that all parties wish to compute – Secure computation shows how to compute that function in the safest way possible – In particular, it guarantees minimal information leakage (the output only) • Privacy – Does the function output itself reveal “sensitive information”, or – Should the parties agree to compute this function?

  14. Secure Multi-Party Computation (SMC) • The goal is computing a function $f(x_1, x_2, \dots, x_n)$ without revealing $x_i$ • Semi-Honest Model – Parties follow the protocol • Malicious Model – Parties may or may not follow the protocol • We cannot do better than the ideal situation in which a trusted third party exists • Generic SMC is too inefficient for PPDDM

  15. Secure Multiparty Computation • Basic cryptographic tools – Oblivious transfer – Random shares – Oblivious circuit evaluation • Yao’s Millionaire’s problem (Yao ’86) – Secure computation possible if function can be represented as a circuit • Works for multiple parties as well (Goldreich, Micali, and Wigderson ’87)

  16. But we aren’t done yet • Circuit evaluation: Build a circuit that represents the computation – For all possible inputs – Impossibly large for typical data mining tasks • Next step: – Efficient techniques for specialized tasks and computations – Tradeoff between security, efficiency, and accuracy

  17. Secure computation tasks • Examples: – Authentication protocols – Online payments – Auctions – Elections – Privacy preserving data mining – Essentially any task…

  18. Application of SMC to Private Data Mining • Setting – Data is distributed at different sites – These sites may be third parties (e.g., hospitals, government bodies) or individuals • Aim – Compute the data mining algorithm on the data so that nothing but the output is learned – That is, carry out a secure computation

  19. Privacy preserving data mining toolkit (Clifton ’02) • Many different data mining techniques often perform similar computations at various stages (e.g., computing sum, counting the number of items) • Toolkit – simple computations – sum, union, intersection … – assemble them to solve specific mining tasks – association rule mining, Bayes classifier, … • The protocols may not be truly secure, but they are more efficient than traditional SMC methods • Reference: Tools for Privacy Preserving Data Mining, Clifton, 2002

  20. Primitive protocols • Secure functions – Secure sum – Secure union – …

  21. Secure Sum • Distributed data mining algorithms frequently calculate the sum of values from individual sites • Suppose we have $s$ sites $1, \dots, s$ • Site $l$ has an integer $v_l$ • The sites want to know the value of $f(v_1, \dots, v_s) = \sum_{l=1}^{s} v_l$ • Easy: – One site is designated the master site, numbered 1 – Site $l$ sends $v_l$ to site 1 ($2 \le l \le s$) – Site 1 computes $f(v_1, \dots, v_s) = \sum_{l=1}^{s} v_l$ and broadcasts it

  22. Secure Sum • What the sites don’t like about this: – Site 1 now knows everyone’s values • Privacy constraint: – Site $l$ does not wish to reveal $v_l$

  23. Secure Sum II • Suppose we have $s$ sites $1, \dots, s$ • Site $l$ has an integer $v_l$ • The sites want to know the value of $f(v_1, \dots, v_s) = v_1 + \dots + v_s$ • Assume that the value $v = \sum_{l=1}^{s} v_l$ to be computed is known to lie in the range $[0, n)$, so that reducing it modulo $n$ does not change it

  24. Secure Sum II • Site 1: – generates a random number $R$, uniformly chosen from $[0, n)$ – adds $R$ to its local value $v_1$, and sends $(R + v_1) \bmod n$ to site 2 • For $l = 2, \dots, s-1$: – Site $l$ receives $V = (R + \sum_{j=1}^{l-1} v_j) \bmod n$ – Site $l$ then updates $V \leftarrow (v_l + V) \bmod n = (R + \sum_{j=1}^{l} v_j) \bmod n$ – and passes it to site $l+1$ • Site $s$ performs the above step, and sends the result to site 1 • Site 1, knowing $R$, can subtract $R$ to get the actual result: $(V - R) \bmod n$

  25. Secure Sum II
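The ring protocol of Secure Sum II can be simulated in a few lines. This is a minimal single-process sketch, not a networked implementation: the function name `secure_sum` and the returned message list are illustrative choices, and the code assumes the true sum is strictly less than the modulus `n`.

```python
import secrets


def secure_sum(values, n):
    """Simulate the ring-based secure sum for the local values of sites 1..s.

    Assumes 0 <= sum(values) < n, so reducing modulo n loses nothing.
    Returns the recovered sum and the list of messages passed along the ring.
    """
    if len(values) < 2:
        raise ValueError("need at least two sites")
    if not 0 <= sum(values) < n:
        # Checked centrally only because this is a simulation.
        raise ValueError("the true sum must lie in [0, n)")

    # Site 1 picks the mask R uniformly from [0, n) and hides its own value.
    r = secrets.randbelow(n)
    v = (r + values[0]) % n
    messages = [v]  # message from site 1 to site 2

    # Sites 2..s each add their local value modulo n and pass the result on.
    for local_value in values[1:]:
        v = (local_value + v) % n
        messages.append(v)

    # Site s sends its message back to site 1, which removes the mask R.
    return (v - r) % n, messages


total, messages = secure_sum([3, 10, 7, 25], n=100)
```

Because `R` is uniform on `[0, n)`, each intermediate message on its own is uniformly distributed and says nothing about the partial sum it hides.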

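  26. Secure Sum - security • Does not reveal the real number • Is it secure? – Sites can collude! The two neighbors of a site can compare the value it received with the value it sent and recover its input – Each site can divide its number into shares, and run the algorithm multiple times with permuted nodes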
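Both points can be sketched in code. The helper names (`ring_messages`, `collusion_attack`, `split_into_shares`, `shared_secure_sum`) and the example values are hypothetical, and the simulation glosses over which site acts as master in each permuted round:

```python
import secrets


def ring_messages(values, n, r):
    """Return the value each site sends onward in one secure-sum pass."""
    sent, v = [], r
    for local_value in values:
        v = (v + local_value) % n
        sent.append(v)
    return sent


def collusion_attack(sent, target, n):
    """Recover the target's input from its neighbors' views.

    target is a 0-based position >= 1 in the ring: the predecessor knows
    sent[target - 1], and the successor knows sent[target].
    """
    return (sent[target] - sent[target - 1]) % n


def split_into_shares(value, parts, n):
    """Split value into `parts` random shares that add up to value mod n."""
    shares = [secrets.randbelow(n) for _ in range(parts - 1)]
    shares.append((value - sum(shares)) % n)
    return shares


def shared_secure_sum(values, parts, n):
    """Run one secure-sum pass per share, permuting the ring each time."""
    shares = [split_into_shares(v, parts, n) for v in values]
    order = list(range(len(values)))
    rng = secrets.SystemRandom()
    total = 0
    for round_no in range(parts):
        rng.shuffle(order)  # each round gives every site different neighbors
        r = secrets.randbelow(n)
        round_values = [shares[site][round_no] for site in order]
        masked = ring_messages(round_values, n, r)[-1]
        total = (total + masked - r) % n
    return total


n = 1000
values = [12, 40, 7, 99]
sent = ring_messages(values, n, secrets.randbelow(n))
leaked = collusion_attack(sent, target=2, n=n)  # equals values[2]
protected_total = shared_secure_sum(values, parts=3, n=n)
```

With shares, a pair of colluding neighbors in one round learns only a single random-looking share; recovering a site's value would require colluding with its neighbors in every round.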

  27. Secure Union • Useful in data mining where each party needs to give rules, frequent itemsets, etc., without revealing the owner • Can be evaluated using SMC methods if the domain of the items is small • Each party creates a binary vector where a 1 in the $i$-th entry represents that the party has the $i$-th item • After this point, a simple circuit that ORs the corresponding vectors can be built, and it can be securely evaluated using general secure multiparty circuit evaluation protocols • However, in data mining the domain of the items is usually large
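The bit-vector encoding can be sketched as follows. The item domain and party sets are invented for illustration, and the OR is computed in the clear here; it only stands in for the function that a secure circuit evaluation would compute without exposing the individual vectors.

```python
# Hypothetical small item domain, shared and ordered identically by all parties.
DOMAIN = ["bread", "milk", "eggs", "beer", "diapers"]


def to_bit_vector(local_items, domain=DOMAIN):
    """Entry i is 1 exactly when the party holds the i-th item of the domain."""
    return [1 if item in local_items else 0 for item in domain]


def or_vectors(vectors):
    """Plain (non-secure) entry-wise OR of the parties' vectors."""
    return [int(any(bits)) for bits in zip(*vectors)]


def from_bit_vector(bits, domain=DOMAIN):
    """Decode a bit vector back into the set of items it represents."""
    return {item for item, bit in zip(domain, bits) if bit}


parties = [{"bread", "milk"}, {"milk", "beer"}, set()]
vectors = [to_bit_vector(items) for items in parties]
union_bits = or_vectors(vectors)
union_items = from_bit_vector(union_bits)
```

Every vector has one entry per domain item, which is why this approach only suits small domains.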

  28. Secure Union • Consider $k$ parties $P_1, \dots, P_k$ having local sets $S_1, \dots, S_k$; we wish to securely compute $U = S_1 \cup S_2 \cup \dots \cup S_k$ such that each party only knows $U$ and nothing else • Key: commutative encryption, $E_a(E_b(x)) = E_b(E_a(x))$ – (the decryption function has the same property) • Multiple encryption and decryption operations can be performed over a value without any restriction about the order of these operations
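The commutativity property can be checked with a small sketch. The slide does not say which cipher is intended; the code below assumes exponentiation modulo a shared prime (a Pohlig-Hellman style scheme), which is one well-known commutative construction. The prime, key generation, and function names are illustrative, and the parameters are toy-sized rather than a secure choice.

```python
import math
import secrets

# Assumption: all parties share this prime. 2**127 - 1 is a Mersenne prime,
# used here only as a convenient toy modulus.
P = 2**127 - 1


def keygen(p=P):
    """Pick an exponent e coprime to p - 1, and its inverse d modulo p - 1."""
    while True:
        e = secrets.randbelow(p - 3) + 2  # 2 <= e <= p - 2
        if math.gcd(e, p - 1) == 1:
            return e, pow(e, -1, p - 1)  # modular inverse needs Python 3.8+


def encrypt(x, e, p=P):
    """E_e(x) = x^e mod p, for 1 <= x < p."""
    return pow(x, e, p)


def decrypt(c, d, p=P):
    """D_d(c) = c^d mod p, undoing encryption under the matching exponent."""
    return pow(c, d, p)


e_a, d_a = keygen()
e_b, d_b = keygen()
x = 123456789

ab = encrypt(encrypt(x, e_b), e_a)  # E_a(E_b(x))
ba = encrypt(encrypt(x, e_a), e_b)  # E_b(E_a(x))
recovered_1 = decrypt(decrypt(ab, d_a), d_b)
recovered_2 = decrypt(decrypt(ab, d_b), d_a)
```

Since both layers are exponentiations modulo the same prime, the exponents simply multiply, so the order of encryption and decryption does not matter. In a union protocol this lets every party encrypt every item, so that equal items from different parties end up with equal ciphertexts without revealing who contributed them.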
