Encrypted Search
Seny Kamara
Brown University
Q: Why is this happening?
Big Data
► Industry and governments want more data
  ► National security
  ► Machine learning
  ► Business analytics
  ► NLP
  ► Location-based services
  ► …
Big Data
► Harder to secure
  ► NSA Bluffdale holds 2EBs! (2K PBs)
  ► Facebook holds 300PBs of photos/videos
  ► Vs. nation states, intelligence agencies, organized crime, insiders, …
► More intrusive & sensitive
  ► Photos, medical records
  ► Location data, email
  ► Browsing history, voicemails
  ► Greater need for security
Big Data
► End-to-end (e2e) encryption!
  ► Reduces attack surface
  ► Secure small key instead of Big Data
► Impossible to work with
  ► Lose search, DBs, IR
  ► Find your photo among 300PBs?
  ► Rank results?
Q: Can we search on encrypted data?
An Interesting Question
► Databases
► Data Structures
► Graph Theory
► Cryptography
► Information Retrieval
► Combinatorial Optimization
► Statistics
A Lucrative Question
► Startups
  ► CipherCloud ($30M+$50M)
  ► Navajo (Salesforce)
  ► SkyHigh, Vaultive, Inpher
  ► Bitglass, Private Machines, …
► Major Corporations
  ► Microsoft, IBM,
  ► Google, Yahoo
  ► Hitachi, Fujitsu
► Funding agencies
  ► IARPA
  ► DARPA
  ► NSF
“There are a lot of advancements in things like encrypted search ... but in general it is a difficult problem” -- Edward Snowden @ SXSW '14
Encrypted Search Solutions
Usage
[Figure: usage diagram with labels DB, EDB, and tk]
Desiderata
[Figure: usage diagram (EDB, tk) annotated with the items below]
► Storage leakage
► Size of EDB
► Search time
► Size of tk
► Query leakage
Many Approaches
► Stream ciphers [SWP01]
► Bucketing [HILM02]
► Structured and searchable encryption (STE/SSE) [SWP01,CGKO06,CK10]
► Oblivious RAM (ORAM) [GO96]
► Functional encryption (e.g., PEKS) [BCOP06]
► Multi-party computation (MPC) [Yao82,GMW87]
► Property-preserving encryption (PPE) [AKSX04,BBO06,BCLO09]
► Fully-homomorphic encryption [G09]
Tradeoffs: Efficiency vs. Security
[Figure: plot with axes Efficiency and Leakage; points labeled STE/SSE-based, PPE-based, skFE-based, pkFE-based, ORAM-based, FHE-based]
Tradeoffs: Functionality vs. Efficiency
[Figure: plot with axes Functionality (SQL, NoSQL) and Efficiency; points labeled ORAM-based, PPE-based, FHE-based, STE/SSE-based, SK-FE-based, PK-FE-based]
Leakage
► Theoretical Cryptography [Goldwasser-Micali82,…]
  ► A great success story
  ► Helps us reason about confidentiality, integrity, …
  ► Focused on leakage-free cryptography
► Real-world systems security relies on tradeoffs
  ► No cryptographic foundations for tradeoffs
  ► Can we leak X but not Y?
  ► How do we model leakage?
Leakage [Curtmola-Garay-K.-Ostrovsky06, Chase-K.10, Islam-Kuzu-Kantarcioglu12, K.15]
[Figure: cycle of Leakage analysis, Proof of security, Leakage cryptanalysis]
► Leakage analysis: what is being leaked?
► Proof: prove that the solution leaks no more
► Cryptanalysis: can we exploit the leakage?
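Leakage analysis can be made concrete by writing down what the server observes. The sketch below is an illustration only, with hypothetical names (`search_pattern`, `leakage_profile`) and a hypothetical transcript format of (token, returned identifiers) pairs. It computes two quantities commonly discussed in this literature: the search pattern (which queries repeat) and the access pattern (which identifiers each query returns).

```python
def search_pattern(tokens: list) -> list:
    """n x n boolean matrix: entry [i][j] is True iff queries i and j
    used the same token, i.e. the server can tell they repeat."""
    return [[ti == tj for tj in tokens] for ti in tokens]


def leakage_profile(edb_size: int, transcript: list) -> dict:
    """Summarize what a server sees.

    edb_size   -- number of entries in the encrypted database
    transcript -- list of (token, result_ids) pairs, in query order
    """
    tokens = [tk for tk, _ in transcript]
    return {
        "storage": edb_size,
        "search_pattern": search_pattern(tokens),
        "access_pattern": [sorted(ids) for _, ids in transcript],
    }


# Three queries; the first and third use the same token.
transcript = [(b"t1", [4, 1]), (b"t2", [2]), (b"t1", [4, 1])]
profile = leakage_profile(3, transcript)
print(profile["search_pattern"][0][2])  # True
print(profile["access_pattern"])        # [[1, 4], [2], [1, 4]]
```

A proof of security would then argue that the server learns nothing beyond such a profile, and cryptanalysis would ask what can be inferred from it.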
Applications
Encrypted Search Engines
► Desktop search
  ► Windows search, Apple Spotlight
► Personal cloud storage
  ► Dropbox, OneDrive, iCloud, …
► Webmail
  ► Gmail, Yahoo! Mail, Outlook.com, …
Encrypted DBs
► Standard DBs
  ► DB encrypted in memory
► Cloud DBs
  ► DB encrypted in cloud
Encrypted NSA Metadata Program [K.14]
[Figure: diagram with callouts 1, 2, 3]
► (1) To & from numbers, time of call, and duration for all US-to-US, US-to-Foreign and Foreign-to-US calls
► (2) NSA DB can only be queried by individual phone number (seed)
► (3) Analyst queries must be approved by a small number of NSA officials
Systems (Provably Secure)
Systems
► CS2 (C++)
  ► Microsoft Research, 2012
  ► Queries: single keyword search
  ► 16MB email collection in 53ms
► BlindSeer (C++) [IARPA]
  ► Columbia & Bell Labs, 2014
  ► Queries: boolean
  ► Synthetic dataset
  ► Search time
    ► For (w1 and w2): 250ms
    ► w1 in 1 doc
    ► w2 in 10K docs
Systems
► IBM-UCI (C++) [IARPA]
  ► IBM Research & UC Irvine, 2013
  ► Queries: conjunctive
  ► 1.3GB email collection
  ► Search time
    ► For (w1 and w2): 5ms
    ► w1 in 15 docs
    ► w2 in 1M docs
► Clusion (Java)
  ► Brown & Colorado St., 2016
  ► Queries: Boolean
  ► 1.3GB email collection
  ► Search time
    ► For (w1 or w2) and (w3 or w4) in 1.5ms
    ► (w1 or w2) in 10 docs
    ► (w3 or w4) in 1M docs
Systems
► GRECS
  ► Microsoft Research, Boston U., Harvard & Ben Gurion, 2015
  ► Queries: (approximate) shortest distance on graphs
  ► 1.6M nodes & 11M edges
  ► Query time: 10ms
Conclusions
► Exciting and active area of research
► Big potential impact in practice
► Lots of new research directions in theory and systems
► Potential for collaboration between many areas of CS
  ► Algorithms and data structures
  ► Databases
  ► Information retrieval
  ► Combinatorial optimization
  ► Statistics