Format-Transforming Encryption (more than meets the DPI) Tom Shrimpton, Florida Institute for Cybersecurity Research, University of Florida
Monday: In-place encryption of CC database. [Diagram: Encrypt, with CC numbers 4417 1234 5678 9112 and 1234 5678 9876 5432.] Today: Circumvention of nation-state internet censorship. [Diagram: “HTTP: … free+speech+democracy …” is encrypted into the ciphertext payload of a TCP/IP packet; deep-packet inspection (DPI) concludes “Looks benign, let it pass”.]
Traditional encryption is ill-suited for these tasks. [Diagram: key and plaintext enter Encrypt, which outputs a ciphertext.] Natively, plaintexts are bit strings (not 16-digit decimal strings). Traditional security goal: make ciphertexts indistinguishable from random bit strings (not well-formatted HTTP messages or CC #s).
Format-Transforming Encryption (inspired by Bellare et al. “Format-Preserving Encryption”). [Diagram: FTE takes a key, a plaintext, a plaintext format (“helper info”), and a ciphertext format (“target”), and outputs a ciphertext in the specified format.] A format is a set. FTE is like traditional encryption, with the extra operational requirement that ciphertexts abide by the ciphertext format.
Flexibility is “baked in” to the syntax. [Diagram: FTE takes a key, a plaintext, a plaintext format (“helper info”), and a ciphertext format (“target”), and outputs a ciphertext.] To change the “look” of ciphertexts, just change the ciphertext format. The system doesn’t (necessarily) need to change.
Let’s consider the censorship-circumvention setting. [Diagram: two FTE endpoints exchange TCP/IP packets carrying FTE ciphertext payloads through the DPI.]
In this setting, we shouldn’t assume anything about plaintext formats… [Diagram: FTE takes a key, a plaintext, the plaintext format {0,1}* (“helper info”), and a ciphertext format (“target”); the ciphertext travels as a TCP/IP payload.]
… so let’s focus on this simpler API. [Diagram: FTE takes a key, a plaintext, and a ciphertext format (“target”), and outputs a ciphertext that travels as a TCP/IP payload.]
Our goal: to cause real DPI systems to reliably misclassify plaintext traffic; for example, HTTP misclassified as FTP. [Diagram: FTE with an “FTP” ciphertext format produces the TCP/IP ciphertext payload; the DPI reports “This is an FTP message.”]
Our goal: to cause real DPI systems to reliably misclassify our (plaintext) traffic as whatever protocol we want (while still having good throughput, low latency…). [Diagram: FTE with an arbitrary ciphertext format produces the TCP/IP ciphertext payload.]
“This is an _____ message.” We wondered: How do real DPI devices determine to what protocol a message belongs?
System      Price
appid       free
l7-filter   free
YAF         free
bro         free
nProbe      ~300 Euros
DPI-X       ~$10K (enterprise-grade DPI, well-known company)
“This is an _____ message.” We wondered: How do real DPI devices determine to what protocol a message belongs?
System      Classification tool                                                          Price
appid       Regular expressions                                                          free
l7-filter   Regular expressions                                                          free
YAF         Regular expressions (sometimes hierarchical)                                 free
bro         Simple regular expression triage, then additional parsing and heuristics     free
nProbe      Parsing and heuristics (many of them “regular”)                              ~300 Euros
DPI-X       ???                                                                          ~$10K
Regular languages/expressions figure heavily in state-of-the-art DPI classification tools.
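To make the idea of regex-based classification concrete, here is a minimal sketch of a first-match classifier. The three signatures are hypothetical and written only for illustration; they are not taken from appid, l7-filter, YAF, or any other tool in the table, whose rule sets are far more detailed.

```python
import re

# Hypothetical signatures (assumption): simplified patterns, not real DPI rules.
SIGNATURES = [
    ("http", re.compile(rb"^(GET|POST|HEAD) [^\r\n]* HTTP/1\.[01]\r\n")),
    ("ssh", re.compile(rb"^SSH-[12]\.[0-9]+-")),
    ("ftp", re.compile(rb"^220[ -][^\r\n]*\r\n")),
]


def classify(payload: bytes) -> str:
    """Return the label of the first signature that matches the payload."""
    for label, pattern in SIGNATURES:
        if pattern.match(payload):
            return label
    return "unknown"


if __name__ == "__main__":
    print(classify(b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"))
    print(classify(b"SSH-2.0-OpenSSH_8.9\r\n"))
```

A classifier of this shape only ever asks whether the payload belongs to some regular language, which is what makes a regex a natural choice of ciphertext format.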
Regular-expression-based FTE. [Diagram: FTE takes a key, a plaintext, and a regex R, and outputs a ciphertext in L(R).] The regex defines the ciphertext format L(R). How should we realize regex-based FTE? We want: cryptographic protection for the plaintext, and ciphertexts in L(R).
Realizing regex-based FTE. [Diagram: key and plaintext enter encryption; given regex R, the output must be a ciphertext in L(R).] How should we realize regex-based FTE? We want: cryptographic protection for the plaintext, and ciphertexts in L(R).
Ranking a Regular Language [Goldberg, Sipser ’85] [Bellare et al. ’09]. Let L(R) be lexicographically ordered: x_0 < x_1 < … < x_i < … < x_{|L(R)|-1}. Then rank(x_i) = i; for example, unrank(2) = x_2. Given a DFA for L(R), there are efficient algorithms rank: L(R) → {0, 1, …, |L(R)|-1} and unrank: {0, 1, …, |L(R)|-1} → L(R) such that rank(unrank(i)) = i and unrank(rank(x_i)) = x_i. With precomputed tables, rank and unrank are O(n).
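The table-based approach can be sketched as follows. This is a minimal illustration, not the FTE library's code, and it assumes a simplification the slide does not state: only strings of one fixed length n are ranked. The table T[k][q] counts the length-k strings that lead from state q to an accepting state; the DFA encoding (nested dicts) and all names are choices made for this example.

```python
def build_table(delta, accepting, n):
    """T[k][q] = number of length-k strings taking state q to an accepting state."""
    table = [{q: int(q in accepting) for q in delta}]
    for _ in range(n):
        prev = table[-1]
        table.append({q: sum(prev[t] for t in delta[q].values()) for q in delta})
    return table


def rank(x, delta, start, accepting, table, alphabet):
    """Index of x among accepted strings of length len(x), in lexicographic order."""
    n, q, r = len(x), start, 0
    for i, ch in enumerate(x):
        for smaller in alphabet:
            if smaller == ch:
                break
            if smaller in delta[q]:
                # Every accepted string that branches off on a smaller symbol precedes x.
                r += table[n - i - 1][delta[q][smaller]]
        q = delta[q][ch]
    if q not in accepting:
        raise ValueError("string is not in the language")
    return r


def unrank(r, n, delta, start, table, alphabet):
    """Inverse of rank: the r-th accepted string of length n."""
    if not 0 <= r < table[n][start]:
        raise ValueError("rank out of range")
    q, out = start, []
    for i in range(n):
        for ch in alphabet:
            if ch not in delta[q]:
                continue
            count = table[n - i - 1][delta[q][ch]]
            if r < count:
                out.append(ch)
                q = delta[q][ch]
                break
            r -= count
    return "".join(out)


# Example DFA for (a|b)*b; the alphabet is listed in sorted order.
ALPHABET = "ab"
DELTA = {0: {"a": 0, "b": 1}, 1: {"a": 0, "b": 1}}
START, ACCEPTING = 0, {1}
TABLE = build_table(DELTA, ACCEPTING, 3)
```

With this DFA the length-3 slice of the language is aab < abb < bab < bbb, so `unrank(2, 3, DELTA, START, TABLE, ALPHABET)` yields `bab`. After the table is built, each call walks the string once, doing a bounded amount of work per symbol for a fixed alphabet.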
Realizing regex-based FTE. [Diagram: key and plaintext enter encryption; the intermediate ciphertext, interpreted as an integer n, is passed to unrank, which works from the DFA for regex R (regex-to-DFA) and outputs the n-th string in the lexicographic ordering of L(R), i.e. a ciphertext in L(R).]
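The encrypt-then-unrank pipeline can be sketched end to end under two loudly stated assumptions. First, the target format is fixed to R = [a-z]{m}, for which unranking is just base-26 conversion, so no DFA is needed. Second, a toy stream cipher built from HMAC-SHA256 stands in for the encryption step; it offers no integrity protection and is a placeholder, not what a real deployment would use. All function names are hypothetical.

```python
import hashlib
import hmac
import os
import string

ALPHABET = string.ascii_lowercase  # assumed target format: R = [a-z]{m}
NONCE_LEN = 16


def keystream(key, nonce, length):
    """Toy keystream: HMAC-SHA256 in counter mode (illustration only)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key, plaintext):
    nonce = os.urandom(NONCE_LEN)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


def toy_decrypt(key, blob):
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    ks = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))


def unrank(i, m):
    """The i-th string of [a-z]{m} in lexicographic order."""
    if not 0 <= i < 26 ** m:
        raise ValueError("integer too large for the target format")
    chars = []
    for _ in range(m):
        i, digit = divmod(i, 26)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def rank(s):
    i = 0
    for ch in s:
        i = i * 26 + ALPHABET.index(ch)
    return i


def fte_encrypt(key, plaintext):
    blob = toy_encrypt(key, plaintext)
    m = 1
    while 26 ** m < 256 ** len(blob):  # smallest m whose format can hold every blob
        m += 1
    return unrank(int.from_bytes(blob, "big"), m)


def fte_decrypt(key, ciphertext):
    m, blob_len = len(ciphertext), 0
    while 256 ** (blob_len + 1) <= 26 ** m:  # recover the blob length from m
        blob_len += 1
    return toy_decrypt(key, rank(ciphertext).to_bytes(blob_len, "big"))
```

For example, `fte_encrypt(key, b"hello")` returns a string of lowercase letters, and `fte_decrypt` inverts it. Swapping in a richer format means replacing the two base-26 helpers with DFA-based rank and unrank; the encryption step is untouched.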
Now all we need are good regular expressions. [Diagram: regex-based FTE takes a key, a plaintext, and a regex R, and outputs a string in L(R).] We considered three options: 1. If the DPI is open source (appid, l7-filter, YAF), try to extract them directly! 2. Build them manually, using RFCs and (when possible) DPI source code. 3. Learn them from traffic that was allowed by the DPI.
Use case: Browsing the web through an FTE tunnel. FTE “wins” if the DPI classifies the stream it sees as the target protocol. [Diagram: an FTE client and an FTE proxy, each configured with R_target, exchange FTE ciphertexts; the proxy connects to the Internet. The R_target values are regular expressions for HTTP, SSH, SMB, … messages.] Using each “target” format, we visited each of the Top 50 websites five times.
Punchline: regex-based FTE can make real DPI say whatever we want it to ~100% of the time. [Diagram: “Help!”; input protocol streams pass between the FTE client and the FTE proxy, each configured with R_target.]
Browser experience through FTE tunnel ≈ browser experience through SSH tunnel. FTE library is open-source, runs on multiple platforms/OS, and is fully integrated with major circumvention efforts. Eric Schmidt gave us a sizable unsolicited research gift.
A field test… Without FTE tunnel, we tried Facebook, YouTube, Tor website, banned search queries… With FTE tunnel, we tried Facebook, YouTube, Tor website, banned search queries… [Diagram: FTE client, FTE proxy, Internet.] Used FTE to download Tor bundle. Tor without FTE: “active blacklisting” attack on proxy. Tor through FTE: no problems. Ran various tests every 5 minutes for one month, no sign of detection in logs. (We shut it down after that.)
What about in-place encryption of CC database? [Diagram: regex-based FTE with a key and a regex for the language of 16-decimal-digit CC #s; CC numbers shown: 1234 5678 9876 5432 and 4417 1234 5678 9112.]
Not quite handled by “simpler” FTE construction. [Diagram: key, encryption, then unrank using the DFA for the CC# regex (regex-to-DFA); CC numbers shown: 4417 1234 5678 9112 and 1234 5678 9876 5432.] 1) Valid 16-digit number in, valid 16-digit number out: |plaintext language| = |ciphertext language|. 2) Conventional encryption takes bit strings as input: encoding of valid 16-digit strings into bitstrings expands the effective plaintext space. 3) Conventional encryption has ciphertext stretch: can have exponential number of AE ciphertexts that cannot be unranked!
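A little arithmetic shows the size of the mismatch in points 2) and 3). The sketch below assumes, purely for illustration, an AE scheme that adds 128 bits of stretch; the slide does not name a scheme or a stretch length.

```python
CC_SPACE = 10 ** 16                  # number of 16-digit decimal strings
BITS = (CC_SPACE - 1).bit_length()   # bits needed to encode any one of them
STRETCH_BITS = 128                   # hypothetical AE ciphertext stretch (assumption)


def unrankable_fraction(extra_bits=0):
    """Fraction of (BITS + extra_bits)-bit integers that are below CC_SPACE,
    i.e. that could be unranked into a valid 16-digit string."""
    return CC_SPACE / 2 ** (BITS + extra_bits)


if __name__ == "__main__":
    print(BITS)                               # 54
    print(unrankable_fraction())              # a bit over one half
    print(unrankable_fraction(STRETCH_BITS))  # vanishingly small
```

Even without stretch, the 54-bit encoding admits nearly twice as many values as there are 16-digit strings; with stretch, almost no ciphertext lands in the range that unrank accepts.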
Recall the full FTE API… [Diagram: FTE takes a key, a plaintext, a plaintext format (“helper info”), and a ciphertext format, and outputs a ciphertext.]
“rank-encrypt-unrank” FTE construction (generalization of Bellare et al. SAC’09). [Diagram: ptxt in L(R1) → rank → encrypt (under key) → unrank → ctxt in L(R2); rank uses the DFA for the ptxt regex R1 and unrank uses the DFA for the ctxt regex R2, each obtained by regex-to-DFA.] Ranking provides optimal compression of L(R).
“rank-encrypt-unrank” FTE construction. [Diagram: ptxt in L(R1) → rank → encrypt (under key) → unrank → ctxt in L(R2); rank uses the DFA for the ptxt regex R1 and unrank uses the DFA for the ctxt regex R2, each obtained by regex-to-DFA.] Great potential… but developers face many hard questions:
-- Can I even use R1 and R2 together? (Requires |L(R1)| ≤ |L(R2)|)
-- Should “encrypt” be deterministic (i.e. a cipher) or can I use traditional encryption?
-- Will both R1 and R2 admit time/space efficient implementations of (un)ranking?
-- …
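The first question, whether |L(R1)| ≤ |L(R2)|, can be checked by counting accepted strings directly on the two DFAs. The sketch below is one way to do that, assuming both languages are restricted to strings of bounded length; the helper names and the two example formats are hypothetical.

```python
def count_strings(delta, start, accepting, max_len):
    """Number of accepted strings of length <= max_len.

    Because the automaton is deterministic, each string follows exactly one
    path, so counting paths never counts a string twice.
    """
    reach = {start: 1}  # state -> number of length-k strings that reach it
    total = 0
    for _ in range(max_len + 1):
        total += sum(c for q, c in reach.items() if q in accepting)
        nxt = {}
        for q, c in reach.items():
            for t in delta.get(q, {}).values():
                nxt[t] = nxt.get(t, 0) + c
        reach = nxt
    return total


def chain_dfa(alphabet, length):
    """DFA for alphabet{length}: states 0..length, start 0, accepting {length}."""
    delta = {q: {ch: q + 1 for ch in alphabet} for q in range(length)}
    return delta, 0, {length}


def compatible(dfa1, dfa2, max_len):
    """True if the plaintext language fits inside the ciphertext language by size."""
    return count_strings(*dfa1, max_len) <= count_strings(*dfa2, max_len)


DIGITS4 = chain_dfa("0123456789", 4)                  # R1 = [0-9]{4}
LETTERS3 = chain_dfa("abcdefghijklmnopqrstuvwxyz", 3)  # R2 = [a-z]{3}
LETTERS2 = chain_dfa("abcdefghijklmnopqrstuvwxyz", 2)  # R2 = [a-z]{2}
```

Here `compatible(DIGITS4, LETTERS3, 4)` holds because 10^4 ≤ 26^3, while `compatible(DIGITS4, LETTERS2, 4)` fails because 26^2 = 676 is too small to hold every 4-digit plaintext.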
The space/memory issue. [Diagram: ptxt → rank → encrypt (under key) → unrank → ctxt, with regex-to-DFA applied to R1 and R2.] Unranking requires space linear in the size of the DFA. For some regular expressions, this works out just fine… [Diagram: regex → NFA → DFA.]
The space/memory issue. [Diagram: ptxt → rank → encrypt (under key) → unrank → ctxt, with regex-to-DFA applied to R1 and R2.] Unranking requires space linear in the size of the DFA. …for others, you can have an exponential space blow-up. [Diagram: regex → NFA → DFA.]
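A textbook instance of the blow-up (not necessarily the regex pictured on the slide) is (a|b)*a(a|b){n-1}, the strings whose n-th symbol from the end is an a. Its NFA has n+1 states, while the subset construction reaches 2^n DFA states. The sketch below builds the NFA and counts the reachable subsets; function names are chosen for this example.

```python
def nfa_nth_from_end(n):
    """NFA for (a|b)*a(a|b){n-1}: states 0..n, start 0, accepting state n."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for q in range(1, n):
        delta[(q, "a")] = {q + 1}
        delta[(q, "b")] = {q + 1}
    return delta


def dfa_state_count(delta, start=0, alphabet="ab"):
    """Number of DFA states produced by the subset construction."""
    start_set = frozenset([start])
    seen = {start_set}
    frontier = [start_set]
    while frontier:
        subset = frontier.pop()
        for ch in alphabet:
            target = frozenset(t for q in subset for t in delta.get((q, ch), ()))
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return len(seen)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n + 1, "NFA states ->", dfa_state_count(nfa_nth_from_end(n)), "DFA states")
```

The DFA has to remember which of the last n symbols were an a, so every one of the 2^n combinations becomes its own state. Since unranking needs space linear in the DFA, a modest-looking regex of this shape can exhaust memory.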
The space/memory issue. [Diagram: ptxt → rank → encrypt (under key) → unrank → ctxt, with regex-to-NFA applied to R1 and R2; regex → NFA → DFA.] Wanted: efficient (un)ranking methods that work directly from the NFA representation. Problem: (un)ranking from NFAs (or directly from a regex) is PSPACE-complete.
Relaxed rank-encrypt-unrank FTE construction. [Diagram: ptxt → relaxed rank → encrypt (under key) → relaxed unrank → ctxt, with regex-to-NFA applied to R1 and R2; regex → NFA → DFA.] Wanted: efficient (un)ranking methods that work directly from the NFA representation. Problem: (un)ranking from NFAs (or directly from a regex) is PSPACE-complete. We side-step this by developing a new “relaxed ranking” algorithm.
Ranking of a language from a DFA. 1-1 correspondence between strings and accepting paths + efficient algorithm for 1-1 mapping between paths and integers. [Diagram: a string x_i in L(R) (original representation: strings) corresponds to a path p (intermediate representation: accepting DFA paths), which corresponds to an integer i in I = {0, 1, 2, …, |I|-1}, where |I|-1 = |L(R)|-1; rank(x_i) = i and unrank(i) = x_i.]
Deterministic “Rank”: rank, encrypt, unrank. [Diagram: rank and unrank both use the DFA for R (regex-to-DFA). A string x_i in L(R) (original representation: strings) maps to a path p (intermediate representation: accepting DFA paths) and then to an integer i in I = {0, 1, 2, …, |I|-1}, where |I|-1 = |L(R)|-1; c = Enc_K(i).] Encrypt and decrypt are done over this set.