Privacy Preserving Distributed ID3 Algorithm Nan Meng University of Hong Kong u3003637@connect.hku.hk April 29, 2016 Nan Meng (Imaging Systems Laboratory) Two-party Jointly Decision Tree April 29, 2016 1 / 29
Overview
∗ Introduction
∗ Problem Definition
∗ Solution
∗ Result
∗ Conclusion
Privacy Preserving Data Mining
• Mining data while protecting its privacy.
Figure: Lindell’s definition
Figure: Agrawal’s definition
ID3 Algorithm
• ID3 is an algorithm that generates a decision tree from a dataset; it is commonly used in data mining.
1. For every remaining attribute, calculate the entropy of the data set S after splitting on that attribute.
2. Split S into subsets using the attribute for which this entropy is minimum (equivalently, for which the information gain is maximum).
3. Make a decision tree node containing that attribute.
4. Recurse on the subsets using the remaining attributes.
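The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the implementation behind these slides: the helper names `entropy`, `split_entropy` and `id3`, the row-as-dict layout, and the first-listed tie-breaking rule are all assumptions. The rows are the nine Play Golf records used throughout the deck.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def split_entropy(rows, attr, target):
    """Weighted entropy of `target` after splitting `rows` on `attr` (step 1)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def id3(rows, attrs, target):
    """Return a nested-dict decision tree; leaves are class labels."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the attribute with minimum entropy after the split
    # (ties go to the attribute listed first; an assumption of this sketch).
    best = min(attrs, key=lambda a: split_entropy(rows, a, target))
    rest = [a for a in attrs if a != best]
    tree = {best: {}}  # step 3: node for the chosen attribute
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, rest, target)  # step 4: recurse
    return tree

COLUMNS = ["Outlook", "Temp", "Humidity", "Windy", "Play Golf"]
RAW = """Rainy Hot High FALSE No
Rainy Hot High TRUE No
Overcast Hot High FALSE Yes
Sunny Mild High FALSE Yes
Rainy Mild Normal TRUE Yes
Overcast Cool Normal TRUE Yes
Rainy Mild High FALSE No
Rainy Cool Normal FALSE Yes
Sunny Mild Normal FALSE Yes"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in RAW.splitlines()]

tree = id3(ROWS, COLUMNS[:-1], "Play Golf")
print(tree)
```

On this dataset Outlook and Humidity give the same entropy after the split, so the root depends on the tie-breaking rule.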
Distributed ID3 Algorithm
Table: Play Golf Dataset
Party  Outlook   Temp  Humidity  Windy  Play Golf
Alice  Rainy     Hot   High      FALSE  No
Alice  Rainy     Hot   High      TRUE   No
Alice  Overcast  Hot   High      FALSE  Yes
Alice  Sunny     Mild  High      FALSE  Yes
Alice  Rainy     Mild  Normal    TRUE   Yes
Bob    Overcast  Cool  Normal    TRUE   Yes
Bob    Rainy     Mild  High      FALSE  No
Bob    Rainy     Cool  Normal    FALSE  Yes
Bob    Sunny     Mild  Normal    FALSE  Yes
Distributed ID3 Algorithm
• Data is distributed among two or more parties.
Table: Bob
Outlook   Temp  Humidity  Windy  Play Golf
Overcast  Cool  Normal    TRUE   Yes
Rainy     Mild  High      FALSE  No
Rainy     Cool  Normal    FALSE  Yes
Sunny     Mild  Normal    FALSE  Yes
Table: Alice
Outlook   Temp  Humidity  Windy  Play Golf
Rainy     Hot   High      FALSE  No
Rainy     Hot   High      TRUE   No
Overcast  Hot   High      FALSE  Yes
Sunny     Mild  High      FALSE  Yes
Rainy     Mild  Normal    TRUE   Yes
• Combine the data and build a decision tree from it.
Problem Definition
• However, each party's data is private (Table: Bob and Table: Alice, as above).
• How can the data be shared safely in the distributed ID3 algorithm?
An Example of Distributed ID3 Algorithm
• Here we use an example of the distributed ID3 algorithm to define the problem clearly: compute the entropy of Rainy.
Table: Bob has 2 Rainy records: 1 No, 1 Yes
Table: Alice has 3 Rainy records: 2 No, 1 Yes
\[
\mathrm{Entropy}(\mathit{Rainy})
= \underbrace{-\frac{2+1}{3+2}\log_2\left(\frac{2+1}{3+2}\right)}_{\mathit{PlayGolf}=\mathit{No}}
\; \underbrace{-\,\frac{1+1}{3+2}\log_2\left(\frac{1+1}{3+2}\right)}_{\mathit{PlayGolf}=\mathit{Yes}}
= -\frac{3}{5}\log_2\left(\frac{3}{5}\right) - \frac{2}{5}\log_2\left(\frac{2}{5}\right)
\]
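As a quick numerical check of the Rainy example, the sketch below pools the two parties' per-class counts in the clear and evaluates the entropy. The function name `pooled_entropy` and the dict layout are illustrative assumptions; this is the non-private baseline, not a privacy-preserving computation.

```python
import math

def pooled_entropy(counts_a, counts_b):
    """Entropy (bits) of the class distribution obtained by adding two
    parties' per-class counts, e.g. {"No": 2, "Yes": 1}."""
    classes = set(counts_a) | set(counts_b)
    pooled = {c: counts_a.get(c, 0) + counts_b.get(c, 0) for c in classes}
    total = sum(pooled.values())
    return -sum((n / total) * math.log2(n / total)
                for n in pooled.values() if n > 0)

alice_rainy = {"No": 2, "Yes": 1}  # Alice: 3 Rainy records
bob_rainy = {"No": 1, "Yes": 1}    # Bob: 2 Rainy records

h = pooled_entropy(alice_rainy, bob_rainy)
print(round(h, 3))  # 0.971
```

Only the sums of the counts enter the result, which is why the rest of the example concentrates on a single pooled fraction.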
• Consider one term of the entropy of Rainy, \(-\frac{2+1}{3+2}\log_2\left(\frac{2+1}{3+2}\right)\). It depends on Bob's and Alice's records only through the fraction \(\frac{2+1}{3+2}\).
• In general, if Bob holds b Rainy records of which y are No, and Alice holds a Rainy records of which x are No, the fraction to compute is \(\frac{x+y}{a+b}\).
Problem Definition
• Compute \(\frac{x+y}{a+b}\) without revealing a, x, b, y.
• Realize a privacy-preserving distributed ID3 algorithm.
Alice: a records, x No
Bob: b records, y No
Figure: Alice and Bob exchange Enc(a), Enc(x), Enc(b), Enc(y) to compute \(\frac{x+y}{a+b}\).
Enc(·): encryption algorithm
Solution: PPWAP
• PPWAP: Privacy Preserving Weighted Average Protocol
• In this project, we build PPWAP on Paillier encryption.
Paillier Encryption
• KeyGeneration(): generate a public key PK and a secret key SK.
• Encryption(m, PK): use PK to encrypt message m; output Enc(m).
• Decryption(Enc(m), SK): use SK to decrypt Enc(m); output m.
Paillier Encryption
• Property: additive homomorphism
• Given two messages \(m_1\) and \(m_2\), \(Enc(m_1 + m_2) = Enc(m_1) \cdot Enc(m_2)\); that is, the product of the two ciphertexts decrypts to \(m_1 + m_2\).
• The encryption of \(m_1 + m_2\) can be computed from \(Enc(m_1)\) and \(Enc(m_2)\) alone, without decrypting either.
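The three operations and the additive property can be tried out with a toy Paillier implementation. The sketch below is for illustration only and makes several assumptions not taken from the slides: the primes 1789 and 1861 are hard-coded and far too small to be secure, the generator is fixed to g = n + 1, and the names `keygen`, `encrypt` and `decrypt` are hypothetical. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy key generation from two small fixed primes (NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid because g = n + 1
    return n, (lam, mu)   # PK = n, SK = (lam, mu)

def encrypt(m, n):
    """Enc(m) = g^m * r^n mod n^2 with g = n + 1 and random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c, n, sk):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

pk, sk = keygen()
c1, c2 = encrypt(2, pk), encrypt(3, pk)

# Multiplying ciphertexts (mod n^2) adds the underlying plaintexts.
c_sum = (c1 * c2) % (pk * pk)
print(decrypt(c_sum, pk, sk))  # 5
```

Because encryption is randomized, `c_sum` will generally not be bit-for-bit equal to a fresh `encrypt(5, pk)`; the equality on the slide holds at the level of decrypted values.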
PPWAP based on Paillier Encryption
Privacy Preserving Weighted Average Protocol
• With the help of Paillier encryption, we build the PPWAP scheme.
Figure: Alice and Bob exchange Enc(a), Enc(x), Enc(b), Enc(y) to compute \(\frac{x+y}{a+b}\).
PPWAP
1. Alice: KeyGeneration(): SK, PK
   Alice: Encryption(a, PK): Enc(a)
   Alice: Encryption(x, PK): Enc(x)
   Alice ⇒ Bob: Enc(a), Enc(x)
2. Bob: choose a random integer z
   Bob: compute \(Enc(a)^z\), \(Enc(x)^z\)
• Step 2 relies on the additive homomorphism: \(Enc(a)^z = \underbrace{Enc(a) \cdots Enc(a)}_{z \text{ factors}} = Enc(a + a + \dots + a) = Enc(za)\).
• After step 2, Bob holds \(Enc(za)\) and \(Enc(zx)\).
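Steps 1 and 2 can be exercised end to end with a toy Paillier instance. The sketch below covers only these two steps and does not guess at the remainder of the protocol. Everything outside the slides is an assumption: the tiny hard-coded primes, g = n + 1, the helper names `enc` and `dec`, the range of z, and the final decryption, which is there only to check the identity \(Enc(a)^z = Enc(za)\). Python 3.8 or later is assumed.

```python
import math
import secrets

# Toy Paillier parameters (assumption: tiny fixed primes, NOT secure).
P, Q = 1789, 1861
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # secret: lcm(p-1, q-1)
MU = pow(LAM, -1, N)                               # secret: LAM^-1 mod N

def enc(m):
    """Encrypt m under the public key N with generator g = N + 1."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def dec(c):
    """Decrypt c with the secret key (LAM, MU)."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

# Step 1 (Alice): encrypt her counts and send the ciphertexts to Bob.
a, x = 3, 2  # Alice's Rainy example: 3 records, 2 No
enc_a, enc_x = enc(a), enc(x)

# Step 2 (Bob): pick a random integer z and exponentiate both ciphertexts.
z = secrets.randbelow(1000) + 1  # z in 1..1000 (range is an assumption)
enc_za, enc_zx = pow(enc_a, z, N2), pow(enc_x, z, N2)

# Check only: the exponentiated ciphertexts decrypt to z*a and z*x.
print(dec(enc_za) == z * a, dec(enc_zx) == z * x)  # True True
```

Both plaintexts are scaled by the same z, so their ratio x/a is unchanged while the individual values are masked from anyone who does not know z.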