SecEQP: A Secure and Efficient Scheme for SkNN Query Problem over Encrypted Geodata on Cloud
Alex X. Liu, Professor, IEEE Fellow
Dept. of Computer Science & Engineering, Michigan State University, East Lansing, Michigan
Co-authors: Xinyu Lei, Rui Li, and Guan-Hua Tu
Alex X. Liu 2/48
Privacy Matters
Facebook–Cambridge Analytica data scandal in 2018
"Outsourced data storage on remote clouds is practical and relatively safe if only the data owner, not the cloud service, holds the decryption keys."
─ The General Data Protection Regulation (GDPR) is a regulation in EU law on data protection and privacy for all individuals within the European Union (EU) and the European Economic Area (EEA).
─ In effect since May 2018.
Location Based Services vs. Location Privacy
[Figure: Location Based Services on one side, Location Privacy on the other]
Why Cloud Cannot Be Fully Trusted
─ The cloud may give your personal data to the government or another company.
─ A corrupted cloud employee may peek at your data.
─ The cloud may be hacked.
System and Threat Model
[Figure: Data Owner, Public Cloud, and Data Users, with flows labeled Data, Query, and Results]
Threat model: semi-honest (i.e., honest-but-curious)
Problem Statement
Problem: Searchable Symmetric Encryption for kNN Geolocation Queries
Data: 2-D geospatial data
Query: kNN query for a given location
Requirements
─ Security: provable
─ Practicality: usable
● Efficiency: milliseconds for querying millions of data points
● Scalability: sub-linear
Search over Encrypted Data
Encrypted data themselves are not searchable.
─ Data → Enc(Data, k): encrypted, not searchable
─ Data → MetaData(Data, k): searchable
─ Query → MetaQuery(Query, k)
MetaData is called the Secure Index.
MetaQuery is called the Trapdoor.
kNN Query Processing in 1-D Data
Distance between 1-D data points p and q = |p − q|.
─ If p and q are plaintext: trivial.
─ If p and q are encrypted: requires homomorphic encryption (extremely slow).
How to avoid computation over encrypted data?
─ Idea 1: segmentation with controllable granularity
● For each data point p, convert p to $\lfloor p/g \rfloor$, where g is the granularity.
─ Idea 2: checking whether two numbers are equal is easy to do in a privacy-preserving fashion using secure hash functions
● Given data $p_1, \dots, p_n$, compute $\mathrm{HMAC}(\lfloor p_1/g \rfloor, k), \dots, \mathrm{HMAC}(\lfloor p_n/g \rfloor, k)$.
● Given query q, compute $\mathrm{HMAC}(\lfloor q/g \rfloor, k)$.
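Ideas 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key, the granularity, the sample values, and the helper name `segment_tag` are all hypothetical, and SHA-256 is an assumed choice of hash for the HMAC.

```python
import hashlib
import hmac
import math


def segment_tag(value: float, g: float, key: bytes) -> bytes:
    """Return HMAC(floor(value / g)) under the given secret key."""
    segment = math.floor(value / g)
    return hmac.new(key, str(segment).encode(), hashlib.sha256).digest()


key = b"demo-secret-key"          # hypothetical secret key k
g = 10.0                          # hypothetical granularity
data = [3.0, 12.5, 17.9, 42.0]    # hypothetical 1-D data points

# Secure index: one tag per data point; the server never sees the values.
index = [segment_tag(p, g, key) for p in data]

# Trapdoor for query q = 15.0; the server only compares tags for equality.
trapdoor = segment_tag(15.0, g, key)
matches = [p for p, tag in zip(data, index) if hmac.compare_digest(tag, trapdoor)]
print(matches)  # [12.5, 17.9]
```

Points 12.5 and 17.9 fall in the same segment as the query (segment 1), so their tags equal the trapdoor. No distance is ever computed over encrypted values.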
kNN Query Processing in 2-D Data
Distance between two 2-D points $(x_1, y_1)$ and $(x_2, y_2)$: $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
Granularity is in terms of circles, not segments.
How to check whether two points are in the same circle?
─ Idea 3: Multi-vector Based Segmented Projection
Multi-vector Based Segmented Projection
1-Vector based segmented projection:
─ Given data p, segment length g, and a unit vector $\vec{a}$ (i.e., $|\vec{a}| = 1$), $h_{\vec{a},g}(p) = \lfloor (p \cdot \vec{a}) / g \rfloor$
─ Equivalence: $p_1 \equiv p_2$ iff $h_{\vec{a},g}(p_1) = h_{\vec{a},g}(p_2)$.
─ Geometrically, an equivalence class is a bar.
Multi-vector Based Segmented Projection
2-Vector based segmented projection:
─ Angle between two vectors: 360°/(2·2) = 90°
─ Equivalence: $p_1 \equiv p_2$ iff $h_{\vec{a}_1,g}(p_1) = h_{\vec{a}_1,g}(p_2)$ and $h_{\vec{a}_2,g}(p_1) = h_{\vec{a}_2,g}(p_2)$
─ Geometrically, an equivalence class is a square.
Multi-vector Based Segmented Projection
3-Vector based segmented projection:
─ Angle between two vectors: 360°/(2·3) = 60°
─ Equivalence: $p_1 \equiv p_2$ iff
● $h_{\vec{a}_1,g}(p_1) = h_{\vec{a}_1,g}(p_2)$ and
● $h_{\vec{a}_2,g}(p_1) = h_{\vec{a}_2,g}(p_2)$ and
● $h_{\vec{a}_3,g}(p_1) = h_{\vec{a}_3,g}(p_2)$
─ Geometrically, an equivalence class is a regular hexagon.
Multi-vector Based Segmented Projection
d-Vector based segmented projection:
─ Angle between two vectors: 360°/(2d)
─ Equivalence: $p_1 \equiv p_2$ iff
● $h_{\vec{a}_1,g}(p_1) = h_{\vec{a}_1,g}(p_2)$ and
● $h_{\vec{a}_2,g}(p_1) = h_{\vec{a}_2,g}(p_2)$ and
● ……
● $h_{\vec{a}_d,g}(p_1) = h_{\vec{a}_d,g}(p_2)$
─ Geometrically, an equivalence class is a regular polygon with 2d edges.
─ The larger d is, the closer the equivalence class is to a circle.
[Figure: equivalence classes for d = 4, 5, 6, 7]
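A minimal Python sketch of d-vector segmented projection follows. It assumes the first unit vector points along the x-axis and the rest are spaced 180/d degrees apart, matching the 360/(2d) angle on the slides. The function names `unit_vectors`, `h`, and `cell_code` and the sample points are illustrative, not taken from the talk.

```python
import math


def unit_vectors(d):
    """d unit vectors, consecutive ones 180/d degrees apart (assumed layout)."""
    return [(math.cos(math.pi * t / d), math.sin(math.pi * t / d)) for t in range(d)]


def h(a, g, p):
    """Segmented projection: floor of (p . a) / g for unit vector a."""
    return math.floor((p[0] * a[0] + p[1] * a[1]) / g)


def cell_code(p, g, vectors):
    """Tuple of d projection values; equal tuples mean the same equivalence class."""
    return tuple(h(a, g, p) for a in vectors)


vectors = unit_vectors(3)   # 3 vectors, 60 degrees apart
near_1, near_2, far = (0.1, 0.3), (0.2, 0.3), (5.0, 5.0)

print(cell_code(near_1, 1.0, vectors) == cell_code(near_2, 1.0, vectors))  # True
print(cell_code(near_1, 1.0, vectors) == cell_code(far, 1.0, vectors))     # False
```

Two points share an equivalence class exactly when all d projection values agree, so the check reduces to comparing tuples.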
Data Processing with d Vectors and m Granularities
For each data point $p_i$:
─ for granularity $g_1$, compute: $h_{\vec{a}_1,g_1}(p_i), h_{\vec{a}_2,g_1}(p_i), \dots, h_{\vec{a}_d,g_1}(p_i)$
─ for granularity $g_2$, compute: $h_{\vec{a}_1,g_2}(p_i), h_{\vec{a}_2,g_2}(p_i), \dots, h_{\vec{a}_d,g_2}(p_i)$
─ ……
─ for granularity $g_m$, compute: $h_{\vec{a}_1,g_m}(p_i), h_{\vec{a}_2,g_m}(p_i), \dots, h_{\vec{a}_d,g_m}(p_i)$
Basic Linear kNN Query Processing Algorithm
Linear algorithm for finding the k nearest neighbors of query q:
─ result = ∅;
─ for j = 1 to m
● for each data point $p_i$, if $(h_{\vec{a}_1,g_j}(q) = h_{\vec{a}_1,g_j}(p_i)) \wedge (h_{\vec{a}_2,g_j}(q) = h_{\vec{a}_2,g_j}(p_i)) \wedge \dots \wedge (h_{\vec{a}_d,g_j}(q) = h_{\vec{a}_d,g_j}(p_i))$ then add $p_i$ to result.
● if |result| ≥ k, then exit.
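The linear algorithm can be sketched in plaintext Python as below, before any HMAC or Bloom filter is applied. Assumptions: the granularities are tried from finest to coarsest, and the d unit vectors are spaced 180/d degrees apart. The names `linear_knn`, `cell_code`, and `unit_vectors` and the sample data are hypothetical.

```python
import math


def unit_vectors(d):
    """d unit vectors, consecutive ones 180/d degrees apart (assumed layout)."""
    return [(math.cos(math.pi * t / d), math.sin(math.pi * t / d)) for t in range(d)]


def cell_code(p, g, vectors):
    """Tuple of floor((p . a) / g) over all vectors a."""
    return tuple(math.floor((p[0] * ax + p[1] * ay) / g) for ax, ay in vectors)


def linear_knn(points, q, k, granularities, vectors):
    """Collect points sharing the query's equivalence class, coarsening until >= k."""
    result = []
    for g in granularities:                    # j = 1 .. m
        q_code = cell_code(q, g, vectors)
        for p in points:                       # linear scan over all data points
            if p not in result and cell_code(p, g, vectors) == q_code:
                result.append(p)
        if len(result) >= k:                   # enough candidates: stop early
            break
    return result


points = [(1.2, 1.3), (2.6, 2.2), (8.5, 8.5), (30.5, 30.5)]
found = linear_knn(points, q=(1.5, 1.5), k=2,
                   granularities=[1.0, 4.0, 16.0], vectors=unit_vectors(2))
print(found)  # [(1.2, 1.3), (2.6, 2.2)]
```

At granularity 1.0 only one point shares the query's cell, so the scan repeats at granularity 4.0, where a second point joins and the loop exits. The result is approximate and may hold more than k points.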
Convert Equality Comparison to Membership Query
Convert d equality comparisons to one equality comparison:
─ $(h_{\vec{a}_1,g_j}(q) = h_{\vec{a}_1,g_j}(p_i)) \wedge (h_{\vec{a}_2,g_j}(q) = h_{\vec{a}_2,g_j}(p_i)) \wedge \dots \wedge (h_{\vec{a}_d,g_j}(q) = h_{\vec{a}_d,g_j}(p_i))$
─ $h_{\vec{a}_1,g_j}(q) \,|\, h_{\vec{a}_2,g_j}(q) \,|\, \dots \,|\, h_{\vec{a}_d,g_j}(q) = h_{\vec{a}_1,g_j}(p_i) \,|\, h_{\vec{a}_2,g_j}(p_i) \,|\, \dots \,|\, h_{\vec{a}_d,g_j}(p_i)$
─ $\mathrm{HMAC}(h_{\vec{a}_1,g_j}(q) \,|\, \dots \,|\, h_{\vec{a}_d,g_j}(q),\ K) = \mathrm{HMAC}(h_{\vec{a}_1,g_j}(p_i) \,|\, \dots \,|\, h_{\vec{a}_d,g_j}(p_i),\ K)$
Further convert one comparison to membership queries:
─ Is $\mathrm{HMAC}(j \,|\, h_{\vec{a}_1,g_j}(q) \,|\, h_{\vec{a}_2,g_j}(q) \,|\, \dots \,|\, h_{\vec{a}_d,g_j}(q),\ K)$ in the set
● $\{\ \mathrm{HMAC}(1 \,|\, h_{\vec{a}_1,g_1}(p_i) \,|\, h_{\vec{a}_2,g_1}(p_i) \,|\, \dots \,|\, h_{\vec{a}_d,g_1}(p_i),\ K),$
● $\ \ \mathrm{HMAC}(2 \,|\, h_{\vec{a}_1,g_2}(p_i) \,|\, h_{\vec{a}_2,g_2}(p_i) \,|\, \dots \,|\, h_{\vec{a}_d,g_2}(p_i),\ K),$
● $\ \ \dots,$
● $\ \ \mathrm{HMAC}(m \,|\, h_{\vec{a}_1,g_m}(p_i) \,|\, h_{\vec{a}_2,g_m}(p_i) \,|\, \dots \,|\, h_{\vec{a}_d,g_m}(p_i),\ K)\ \}$ ?
For each data point, use an Indistinguishable Bloom Filter (IBF) to store its m HMAC values.
Construct a structurally indistinguishable tree from n IBFs.
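The conversion to membership queries can be sketched as follows. A plain Python `set` stands in for the per-point IBF, which is an assumption made only to keep the example short. The key, the `"|"`-joined string encoding, SHA-256 as the HMAC hash, and the names `token`, `build_index`, `make_trapdoor`, and `first_match` are likewise hypothetical.

```python
import hashlib
import hmac
import math


def unit_vectors(d):
    """d unit vectors, consecutive ones 180/d degrees apart (assumed layout)."""
    return [(math.cos(math.pi * t / d), math.sin(math.pi * t / d)) for t in range(d)]


def token(j, p, g, vectors, key):
    """HMAC over 'j|h_1|h_2|...|h_d' for point p at the j-th granularity g."""
    parts = [str(j)] + [str(math.floor((p[0] * ax + p[1] * ay) / g)) for ax, ay in vectors]
    return hmac.new(key, "|".join(parts).encode(), hashlib.sha256).digest()


def build_index(p, granularities, vectors, key):
    """The m tokens of one data point (a set standing in for its IBF)."""
    return {token(j, p, g, vectors, key) for j, g in enumerate(granularities, start=1)}


def make_trapdoor(q, granularities, vectors, key):
    """One query token per granularity, in order j = 1 .. m."""
    return [token(j, q, g, vectors, key) for j, g in enumerate(granularities, start=1)]


def first_match(index, trapdoor_tokens):
    """Smallest j whose query token is a member of the point's index, else None."""
    for j, t in enumerate(trapdoor_tokens, start=1):
        if t in index:
            return j
    return None


key = b"demo-secret-key"                # hypothetical key K
vectors = unit_vectors(2)
granularities = [1.0, 4.0]

index = build_index((2.6, 2.2), granularities, vectors, key)
trapdoor_tokens = make_trapdoor((1.5, 1.5), granularities, vectors, key)
print(first_match(index, trapdoor_tokens))  # 2
```

Prefixing each message with the granularity index j keeps tokens from different granularities apart, so a single membership test replaces the d equality comparisons.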
Indistinguishable Bloom Filter (IBF)
Bloom Filter: [figure]
Indistinguishable Bloom Filter (IBF)
─ Twin cell: each location holds a pair of cells storing 0 and 1, or 1 and 0.
─ To insert an element e into an IBF, hash it r times using r secret keys $k_1, \dots, k_r$: $\mathrm{HMAC}(k_1, e), \dots, \mathrm{HMAC}(k_r, e)$.
─ For the i-th location, which cell stores 1 is determined by another secret key $k_{r+1}$ and a random number for IBF B.
● The other cell stores 0.
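A rough Python sketch of the twin-cell idea follows. The slide does not give the exact cell-selection rule, so the rule here is an assumption: the lowest bit of SHA-256 over an HMAC of the location under the extra key, concatenated with the per-filter random number. The class name `IBF`, the sizes, and the keys are all hypothetical.

```python
import hashlib
import hmac


class IBF:
    """Bloom filter whose every location is a twin cell holding one 0 and one 1."""

    def __init__(self, size, keys, cell_key, nonce):
        self.size = size
        self.keys = keys            # r secret keys for the r hash locations
        self.cell_key = cell_key    # the extra secret key
        self.nonce = nonce          # random number specific to this filter
        # Empty filter: the chosen cell holds 0 and its twin holds 1.
        self.cells = [[0, 0] for _ in range(size)]
        for loc in range(size):
            self.cells[loc][1 - self._chosen(loc)] = 1

    def _locations(self, element):
        """Yield the r locations HMAC(k_1, e), ..., HMAC(k_r, e), reduced mod size."""
        for k in self.keys:
            digest = hmac.new(k, element, hashlib.sha256).digest()
            yield int.from_bytes(digest, "big") % self.size

    def _chosen(self, loc):
        """Assumed rule: which twin cell carries the bit at this location."""
        inner = hmac.new(self.cell_key, str(loc).encode(), hashlib.sha256).digest()
        return hashlib.sha256(inner + self.nonce).digest()[0] & 1

    def add(self, element):
        for loc in self._locations(element):
            c = self._chosen(loc)
            self.cells[loc][c] = 1        # chosen cell stores 1
            self.cells[loc][1 - c] = 0    # the other cell stores 0

    def __contains__(self, element):
        return all(self.cells[loc][self._chosen(loc)] == 1
                   for loc in self._locations(element))


ibf = IBF(size=64, keys=[b"k1", b"k2", b"k3"], cell_key=b"k4", nonce=b"nonce-B")
ibf.add(b"token-1")
print(b"token-1" in ibf)  # True
```

Every location always holds exactly one 0 and one 1, so without the keys and the random number an observer cannot tell which locations are set. In practice the nonce would be drawn at random for each filter.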
IBTree – Structural Indistinguishability
[Figure: a binary IBTree over data points $p_1, \dots, p_{10}$. The root covers $p_1$ to $p_{10}$; its children cover $p_1$ to $p_5$ and $p_6$ to $p_{10}$; these split into {$p_1, p_2, p_3$}, {$p_4, p_5$}, {$p_6, p_7, p_8$}, {$p_9, p_{10}$}, and so on down to single-point leaves.]
─ Binary
─ 0 ≤ |left| − |right| ≤ 1
─ Each node is an IBF
─ All IBFs in an IBTree have the same length
─ Leaves are chained
─ Construction is bottom up by logical OR
IBTree Constructed Bottom Up
IBTree construction is bottom up by logical OR.
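The bottom-up OR construction can be sketched with ordinary bit-list Bloom filters standing in for IBFs. This is a simplification: twin cells and leaf chaining are omitted. The filter size, keys, token strings, and the names `bloom`, `build`, and `search` are hypothetical.

```python
import hashlib
import hmac

SIZE = 64                         # all filters share one length
KEYS = [b"k1", b"k2", b"k3"]      # hypothetical hash keys


def locations(token):
    """Bit positions of a token under each key."""
    return [int.from_bytes(hmac.new(k, token, hashlib.sha256).digest(), "big") % SIZE
            for k in KEYS]


def bloom(tokens):
    """Plain Bloom filter (list of bits) holding the given tokens."""
    bits = [0] * SIZE
    for t in tokens:
        for loc in locations(t):
            bits[loc] = 1
    return bits


def build(leaves):
    """Balanced binary tree; each parent filter is the OR of its two children."""
    if len(leaves) == 1:
        label, bits = leaves[0]
        return {"bits": bits, "label": label, "left": None, "right": None, "size": 1}
    mid = (len(leaves) + 1) // 2              # keeps 0 <= |left| - |right| <= 1
    left, right = build(leaves[:mid]), build(leaves[mid:])
    bits = [a | b for a, b in zip(left["bits"], right["bits"])]
    return {"bits": bits, "label": None, "left": left, "right": right,
            "size": left["size"] + right["size"]}


def search(node, token):
    """Descend only into subtrees whose filter may contain the token."""
    if not all(node["bits"][loc] for loc in locations(token)):
        return []
    if node["left"] is None:
        return [node["label"]]
    return search(node["left"], token) + search(node["right"], token)


leaves = [(f"p{i}", bloom([f"token-{i}".encode()])) for i in range(1, 11)]
root = build(leaves)
print("p3" in search(root, b"token-3"))  # True
```

Because a parent is the OR of its children, a token missing from a node's filter is missing from its whole subtree, which is what lets a query skip subtrees and run in sub-linear time.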
Security Model
Adaptive IND-CKA: indistinguishability against chosen keyword attack
─ The cloud chooses two distinct data sets $D_0$ and $D_1$ and sends them to the data owner.
● $D_0$ and $D_1$ contain an equal number of records.
─ The data owner randomly chooses $D_0$ or $D_1$,
● builds metadata $I_b$ for the chosen $D_b$,
● sends $I_b$ to the cloud.
─ The following steps repeat a polynomial number of times:
● The cloud chooses a query q and sends it to the data owner.
– The query has the same number of satisfying elements in $D_0$ and $D_1$.
● The data owner generates trapdoor $t_q$ and sends $t_q$ to the cloud.
● The cloud uses $t_q$ to query $I_b$, then chooses a new query based on all previous queries and query results.
─ In the end, the cloud guesses whether b is 0 or 1, and its guess is still correct with only about 50% probability, i.e., no better than random guessing.
We do not hide query patterns and access patterns.
─ Privacy is already expensive. We do not want absolute privacy.