
SGX IR: Secure Information Retrieval with Trusted Processors



  1. SGX IR: Secure Information Retrieval with Trusted Processors
     Fahad Shaon, Murat Kantarcioglu
     The University of Texas at Dallas (UT Dallas), Erik Jonsson School of Engineering & Computer Science
     FEARLESS engineering

  2. Problem - Secure Cloud-Based Information Retrieval
     [Figure labels: Encrypted Data, Encrypted Search Query, Encrypted Result]
     Build a secure information retrieval system
     ◮ User stores encrypted files in a cloud server
     ◮ Perform selective retrieval

  3. Building Block - Intel SGX
     ◮ We use Intel SGX - Software Guard Extensions
     ◮ SGX is a new Intel instruction set extension
     ◮ It allows us to create a secure compartment inside the processor, called an enclave
     ◮ Privileged software, such as the OS and hypervisor, cannot directly observe data and computation inside the enclave

  4. Threat Model - Intel SGX
     [Figure labels: Server, CPU, Enclave, Memory, Disk, Encrypted Data & Code, Encrypted Result]
     The adversary can control the hypervisor, OS, memory, and disk of the server

  5. State of the Art
     ◮ Relevant search or indexing systems that use SGX - HardIDX (Fuhry et al., 2017), Rearguard (Sun et al., 2018), Oblix (Mishra et al., 2018), Hardware-supported ORAM (Hoang et al., 2019)
     ◮ These works mainly focus on building efficient data structures for searching using SGX
     ◮ They assume the inverted index is already built and/or build the index on the client
     ◮ They did not look into ranked retrieval

  6. Challenge - Access Pattern Leakage
     ◮ The adversary can observe memory accesses in SGX
     ◮ Memory accesses reveal information about the encrypted data (Islam, Kuzu, and Kantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)

  7. Challenge - Access Pattern Leakage
     ◮ The adversary can observe memory accesses in SGX
     ◮ Memory accesses reveal information about the encrypted data (Islam, Kuzu, and Kantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)
     Solution
     ◮ Data obliviousness - we build custom data-oblivious indexing algorithms

  8. Data Obliviousness - Oblivious Select
     ◮ Data obliviousness: a program executes the same path for all inputs of the same size

  9. Data Obliviousness - Oblivious Select
     ◮ Data obliviousness: a program executes the same path for all inputs of the same size
     ◮ Example: x == y ? a : b

  10. Data Obliviousness - Oblivious Select
     ◮ Data obliviousness: a program executes the same path for all inputs of the same size
     ◮ Example: x == y ? a : b
     obliviousSelect(a, b, x, y):
         ...
         mov %[x], %%eax
         mov %[y], %%ebx
         xor %%eax, %%ebx
         ...
         mov %[a], %%ecx
         mov %[b], %%edx
         cmovz %%ecx, %%edx
         ...
         mov %%edx, %[out]
     The xor sets the zero flag exactly when x equals y, and cmovz then copies a over b without taking a branch.
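
The select above can be modelled without a data-dependent branch. The following is a minimal Python sketch, not the enclave code: `oblivious_select` is a hypothetical name mirroring the slide's `obliviousSelect`, and it swaps the `xor`/`cmovz` pair for integer arithmetic on a 0/1 flag. Python gives no constant-time guarantee, so this only illustrates the logic.

```python
def oblivious_select(a, b, x, y):
    """Return a if x == y, else b, without branching on the comparison."""
    eq = int(x == y)              # 1 when equal, 0 otherwise (plays the role of ZF)
    return eq * a + (1 - eq) * b  # both a and b are always read and used


print(oblivious_select(10, 20, 3, 3))  # prints 10
print(oblivious_select(10, 20, 3, 4))  # prints 20
```

Both operands take part in the arithmetic on every call, which is the property the `cmovz` version relies on as well.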

  11. Challenge - Memory Constraint
     ◮ SGX (v1) offers only 90MB of enclave memory

  12. Challenge - Memory Constraint
     ◮ SGX (v1) offers only 90MB of enclave memory
     Solution
     ◮ Blocking - break large data into small blocks
     ◮ We utilize SGX-BigMatrix (Shaon et al., 2017) primitives
     ◮ BigMatrix handles the complexity of data blocking

  13. Objectives - Summary
     [Figure labels: Pre-Processing, Encrypted Intermediate Data, Encrypted Search Query, Encrypted Result, Final Processing]
     ◮ Very low client-side processing
     ◮ Build the index securely in the cloud using SGX
     ◮ Build data-oblivious algorithms
     ◮ Support ranked retrieval

  14. SGX IR - Document and Query Types
     ◮ Text data
        ◮ Ranked document retrieval using TF-IDF (Term Frequency and Inverse Document Frequency)
     ◮ Image data
        ◮ Face recognition using Eigenface

  15. Text Pre-Processing - Client
     [Figure: the example text "Cryptography is the practice and study of techniques for secure communication ..." passes through Tokenization, Stemming, TokenID Generation, and Encrypted BigMatrix Generation. Stems such as cryptographi, practic, studi, techniqu, secur, commun receive token ids, and the output is a matrix with columns tok-id, doc-id, freq.]
     ◮ We tokenize and stem the input text files
     ◮ We build a matrix I with token id, document id, and frequency columns
     ◮ Finally, we encrypt I and upload it
     ◮ Only a single round of read and write is required
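
The client-side steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `build_matrix` and `stem` are hypothetical names, the regular-expression tokenizer is a stand-in, `stem` is an identity placeholder where a real stemmer (for example Porter) would go, and the encryption and upload of I are omitted.

```python
import re
from collections import Counter


def stem(token):
    # Placeholder (assumption): a real pipeline would apply a proper stemmer here.
    return token


def build_matrix(docs):
    """Return (token_ids, I) where I holds (token_id, doc_id, frequency) rows."""
    token_ids = {}
    rows = []
    for doc_id, text in enumerate(docs, start=1):
        tokens = [stem(t) for t in re.findall(r"[a-z]+", text.lower())]
        for tok, freq in sorted(Counter(tokens).items()):
            # Assign the next free id the first time a token is seen.
            tid = token_ids.setdefault(tok, len(token_ids) + 1)
            rows.append((tid, doc_id, freq))
    return token_ids, rows


ids, I = build_matrix(["secure secure search", "search index"])
print(ids)
print(I)
```

Each document is read once and each row is written once, matching the single round of read and write mentioned on the slide.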

  16. Text Indexing - Server
     [Figure: Indexing takes the (tok-id, doc-id, freq) matrix and produces a (tok-id, count, sum) matrix and a blocked (tok-id, doc-id, freq) matrix whose token ids carry bucket numbers such as 1,0 and 1,1, with # marking dummy entries.]
     ◮ Given input I, we output two matrices
     ◮ U′, containing the total frequencies of the tokens, for IDF calculation
     ◮ T, containing equal-length blocks of the token-to-document frequency mapping, for TF calculation

  17. Text Indexing - IDF - Server
     [Figure: Sort, then Count & Sum, then Sort and Adjust, turning (tok-id, doc-id, freq) rows into (tok-id, count, sum) rows, with # marking dummy entries.]
     ◮ I′ ← obliviously sort I on the token id column
     ◮ We generate U to keep the count and sum of frequencies
        ◮ c ← I′[i].token_id ≠ I′[i − 1].token_id
        ◮ U[i].sum ← obliviousSelect(sum, #, 1, c)
        ◮ sum ← obliviousSelect(sum, 0, 1, c) + I′[i].frequency
     ◮ Finally, we sort this matrix so that the dummy entries go to the bottom
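
One way to picture the count-and-sum pass is the Python sketch below. It rests on several assumptions that are not in the slides: the names `count_and_sum` and `DUMMY` are hypothetical, `-1` stands in for the `#` dummy value, token ids are positive, and a group's totals are emitted on its last row rather than using the slide's exact update order. The final `sort` is an ordinary sort used only to show the resulting layout; the real system would use an oblivious sort.

```python
DUMMY = -1  # hypothetical sentinel standing in for the '#' dummy entries


def oblivious_select(a, b, x, y):
    eq = int(x == y)
    return eq * a + (1 - eq) * b


def count_and_sum(rows):
    """rows: (token_id, doc_id, freq) tuples already sorted by token_id."""
    out, count, total, n = [], 0, 0, len(rows)
    for i, (tok, _doc, freq) in enumerate(rows):
        count += 1
        total += freq
        # Index-only lookahead: the bounds check does not depend on the data.
        next_tok = rows[i + 1][0] if i + 1 < n else DUMMY
        last = int(tok != next_tok)  # 1 on the last row of a token group
        # Every row writes one output row: real totals or a dummy.
        out.append((oblivious_select(tok, DUMMY, 1, last),
                    oblivious_select(count, DUMMY, 1, last),
                    oblivious_select(total, DUMMY, 1, last)))
        # Reset the running values when a group ends.
        count = oblivious_select(0, count, 1, last)
        total = oblivious_select(0, total, 1, last)
    out.sort(key=lambda r: (r[0] == DUMMY, r[0]))  # dummies go to the bottom
    return out


rows = [(1, 1, 2), (1, 2, 5), (1, 4, 1), (2, 1, 3), (2, 5, 10), (3, 6, 4)]
print(count_and_sum(rows))
```

The output has exactly as many rows as the input, so its length reveals nothing about how many distinct tokens exist.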

  18. Text Indexing - TF - Block Size Optimization
     ◮ We can read the document frequency of tokens from matrix I′
     ◮ This would reveal the number of documents containing a specific token
     ◮ So, we split I′ into equal-length blocks
     ◮ We optimize the block size b from the count column of U′ using the technique outlined in (Shaon and Kantarcioglu, 2016)
     ◮ We assume the frequencies follow a Pareto distribution
     ◮ We mathematically find the value of b that minimizes the padding

  19. Text Indexing - TF - Padding Generation
     We regenerate the token id with the bucket number function σ
     [Figure: Regenerate TokenId maps (tok-id, doc-id, freq) rows with token ids 1, 2, 3 to rows whose token ids carry bucket numbers, such as 1,0, 1,1, 2,0, 2,1, 3,0.]
     We generate padding
     [Figure: Generate Padding Rows uses the (tok-id, count, sum) matrix to produce dummy rows (doc-id and freq set to #) for buckets such as 1,1, 2,1, 3,1.]
     Finally, we merge and sort X and J to get the output matrix T.
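
The layout produced by these steps can be sketched in Python. This is an assumption-laden illustration: `build_blocks` is a hypothetical name, X is taken to be the relabelled real rows and J the padding rows (the slide does not define them), the bucket number is simply the row's position within its token divided by b, and nothing here is oblivious.

```python
from collections import Counter

DUMMY = "#"


def build_blocks(rows, b):
    """rows: (token_id, doc_id, freq) sorted by token_id; b: block size."""
    seen = Counter()
    X = []
    for tok, doc, freq in rows:
        # Relabel the token id as (token, bucket number).
        X.append(((tok, seen[tok] // b), doc, freq))
        seen[tok] += 1
    J = []
    for tok, count in seen.items():
        pad = (-count) % b                 # rows missing from the last bucket
        last_bucket = (count - 1) // b
        J.extend([((tok, last_bucket), DUMMY, DUMMY)] * pad)
    # Merge and sort: by (token, bucket), real rows before padding rows.
    return sorted(X + J, key=lambda r: (r[0], r[1] == DUMMY))


T = build_blocks([(1, 1, 2), (1, 2, 5), (1, 4, 1), (2, 1, 3)], b=2)
for row in T:
    print(row)
```

Every (token, bucket) key ends up with exactly b rows, so reading one bucket does not show how many documents actually contain the token.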

  20. TF-IDF Calculation
     ◮ On T we run term frequency functions, such as log normalization: $1 + \log(\mathrm{tf}_{t,d})$
     ◮ On U′ we run document frequency functions, such as IDF: $\log \frac{N}{\mathrm{df}_t}$
     ◮ To compute a query result, we use T for TF and U′ for IDF
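
As a small worked example of the two weighting functions, the Python sketch below combines them into a single score. The function names are hypothetical, the natural logarithm is an assumption (the slides do not state the base), and multiplying the two weights is the usual TF-IDF convention rather than something the slide spells out.

```python
import math


def tf_weight(tf):
    """Log-normalized term frequency: 1 + log(tf), for tf >= 1."""
    return 1.0 + math.log(tf)


def idf_weight(n_docs, df):
    """Inverse document frequency: log(N / df)."""
    return math.log(n_docs / df)


def tf_idf(tf, n_docs, df):
    return tf_weight(tf) * idf_weight(n_docs, df)


# A token appearing 3 times in a document and in 10 of 1000 documents.
print(tf_idf(3, 1000, 10))
```

A token that occurs in every document gets an IDF of zero, so it contributes nothing to the ranking.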

  21. Bitonic Sorting of Arbitrary Input Size
     ◮ Sorting is one of the most frequently used operations
     ◮ We use the arbitrary-length version of bitonic sort (Lang, 1998)
     ◮ However, the existing definition is recursive
     ◮ It is not suitable for memory-constrained environments like SGX
     ◮ So, we propose a non-recursive algorithm that does not use a stack

  22. Bitonic Sort Non-Recursive Algorithm - Concept
     ◮ We can express a number as $N = 2^{x_m} + \dots + 2^{x_3} + 2^{x_2} + 2^{x_1}$
     ◮ A merge network can combine a descending block and an ascending block into a single ascending block
     ◮ We sort, then merge, from the smallest to the biggest block
     [Figure labels: Sort, Merge]

  23. Bitonic Sort Non-Recursive Algorithm
     for d = 0 to ⌈log2(N)⌉ do
         if ((N >> d) & 1) ≠ 0 then
             start ← (−1 << (d + 1)) & N
             size ← 1 << d
             dir ← (size & N & −N) ≠ 0
             bitonicSort2K(start, size, dir)
             if !dir then
                 bitonicMerge(start, N − start, 1)
             end if
         end if
     end for
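
The loop above can be exercised with the Python sketch below. The outer function follows the pseudocode line by line; the two helpers are assumptions, since the slides do not show `bitonicSort2K` or `bitonicMerge`: here they are the textbook iterative bitonic network for power-of-two blocks and an ascending merge that skips partners beyond the end of the range (equivalent to padding with +infinity).

```python
def compare_exchange(a, i, j, ascending):
    # Always reads and writes both slots, whatever the values are.
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = (lo, hi) if ascending else (hi, lo)


def bitonic_sort_2k(a, start, size, ascending):
    """Iteratively sort a[start:start+size]; size must be a power of two."""
    k = 2
    while k <= size:
        j = k >> 1
        while j >= 1:
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    up = ((i & k) == 0) == ascending
                    compare_exchange(a, start + i, start + partner, up)
            j >>= 1
        k <<= 1


def bitonic_merge(a, start, n, ascending=True):
    """Ascending merge of a descending-then-ascending run of any length n."""
    p = 1
    while p < n:
        p <<= 1
    j = p >> 1
    while j >= 1:
        for i in range(n):
            if (i & j) == 0 and i + j < n:  # partners past the end act as +inf
                compare_exchange(a, start + i, start + i + j, ascending)
        j >>= 1


def bitonic_sort(a):
    n = len(a)
    for d in range(n.bit_length()):
        if (n >> d) & 1:
            start = (-1 << (d + 1)) & n        # bits of n above position d
            size = 1 << d
            ascending = (size & n & -n) != 0   # only the smallest block
            bitonic_sort_2k(a, start, size, ascending)
            if not ascending:
                bitonic_merge(a, start, n - start, True)
    return a


print(bitonic_sort([5, 3, 9, 1, 4]))  # prints [1, 3, 4, 5, 9]
```

The sequence of compared index pairs depends only on the input length, never on the values, which is what makes the network suitable for data-oblivious use.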

  24. Face Recognition Indexing
     ◮ We adopt Eigenface
     ◮ Pre-processing and face matching are simple matrix operations
     ◮ The core problem to solve obliviously is eigenvector calculation
     ◮ We adopt the Jacobi method of eigenvector calculation

  25. Eigenvector Calculation - Jacobi Method
     [Figure labels: Oblivious Value Extract, Oblivious Column Extract, Rotate Calculate & Reassign, Oblivious Column Assign, Oblivious Row Assign]
     We find the maximum off-diagonal element $A_{k,l}$, then rotate columns k and l. We repeat until A becomes diagonal. The diagonal values are the eigenvalues.
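
For reference, the plain (non-oblivious) Jacobi iteration described above can be sketched in Python as follows. The function name, the tolerance, and the rotation cap are assumptions, and the search for the largest off-diagonal entry uses ordinary data-dependent comparisons, which the oblivious extract and assign primitives named in the figure would replace in the real system.

```python
import math


def jacobi_eigen(A, tol=1e-10, max_rotations=100):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix A."""
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_rotations):
        # Locate the largest off-diagonal element A[k][l].
        k, l, big = 0, 1, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(A[i][j]) > big:
                    k, l, big = i, j, abs(A[i][j])
        if big < tol:
            break
        # Angle that zeroes A[k][l]: tan(2*theta) = 2*A[k][l] / (A[k][k] - A[l][l]).
        theta = 0.5 * math.atan2(2.0 * A[k][l], A[k][k] - A[l][l])
        c, s = math.cos(theta), math.sin(theta)
        for M in (A, V):  # rotate columns k and l of A and of V
            for i in range(n):
                mk, ml = M[i][k], M[i][l]
                M[i][k], M[i][l] = c * mk + s * ml, -s * mk + c * ml
        for j in range(n):  # rotate rows k and l of A
            ak, al = A[k][j], A[l][j]
            A[k][j], A[l][j] = c * ak + s * al, -s * ak + c * al
    return [A[i][i] for i in range(n)], V


values, vectors = jacobi_eigen([[2.0, 1.0], [1.0, 2.0]])
print(values)
```

For the 2x2 example a single rotation by 45 degrees diagonalizes the matrix, giving eigenvalues close to 3 and 1.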

  26. Experimental Evaluations
     We implemented a prototype using Intel SGX SDK 2.6 for Linux
     Setup
     ◮ Processor: Intel Xeon E3-1270
     ◮ Memory: 64GB
     ◮ OS: Ubuntu 18.04
     ◮ SGX SDK: version 2.6 for Linux
