Quantitative Information Flow Analysis
CSCI 5271 Guest Lecture
Seonmo (Sean) Kim
5/1/19

Motivation
• INPUT → Program / Function → OUTPUT
• An output carries some data of its input.
• If the input contains some sensitive data, then the output does, too.
• The output should contain only the intended amount of the input.
• An adversary wants to learn the input by observing the output.

Motivation
• Consider two functions (brute-force output counts are sketched at the end of this page):

    int numCheck(int input) {
        if (input == 1234) {
            return 1;
        }
        return 0;
    }

    int numCheck2(int input) {
        if (input % 2 == 0) {
            return input;
        }
        return 1;
    }

• The number of output values? 2 vs. 2^31 + 1.

Motivation
• There are many applications related to QIF analysis: AI, games, financial programs, etc.
• Scalability is a key challenge.

Quantitative Information Flow (QIF)
• Given a (deterministic or probabilistic) program P which takes a high input H and produces a low output L.
• An adversary observes L, and P may leak information from H (secret) to L (public).
• Goal: measure the amount of information leaked about H.
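To make the output-count claim above concrete, here is a minimal brute-force sketch (my addition, not part of the slides). It shrinks the input domain to 16 bits so the enumeration finishes instantly; over the full 32-bit domain the counts would be 2 and 2^31 + 1 as stated.

    #include <stdio.h>
    #include <string.h>

    /* numCheck and numCheck2 from the slide, with "mod" written as C's %. */
    static int numCheck(int input)  { return input == 1234 ? 1 : 0; }
    static int numCheck2(int input) { return input % 2 == 0 ? input : 1; }

    /* Count distinct outputs of f over the 16-bit inputs 0 .. 2^16 - 1. */
    static int countOutputs(int (*f)(int)) {
        static unsigned char seen[1 << 16];
        memset(seen, 0, sizeof seen);
        int distinct = 0;
        for (int x = 0; x < (1 << 16); x++) {
            int y = f(x);            /* every output here fits in 16 bits */
            if (!seen[y]) { seen[y] = 1; distinct++; }
        }
        return distinct;
    }

    int main(void) {
        printf("numCheck : %d distinct outputs\n", countOutputs(numCheck));  /* 2 */
        printf("numCheck2: %d distinct outputs\n", countOutputs(numCheck2)); /* 2^15 + 1 = 32769 */
        return 0;
    }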
Early models of QIF
• Used the Shannon mutual information I(X; Y)
• I(H; L) = H(H) − H(H | L)
• information leaked = initial uncertainty − remaining uncertainty
  • H(H): the adversary's initial uncertainty before observing L
  • H(H | L): the adversary's remaining uncertainty after observing L
  • H(H) − I(H; L) = H(H | L)

Shannon entropy: initial uncertainty
• H(X) = −Σ_{x∈𝒳} Pr[X = x] · log₂ Pr[X = x]
• If H is a uniformly-distributed 32-bit integer and L := H:
  • Pr[H = x] = 1/2^32, so log₂ Pr[H = x] = log₂ 2^(−32) = −32
  • H(H) = −2^32 · (1/2^32) · (−32) = 32

Shannon entropy: information leaked
• I(X; Y) = H(X) − H(X | Y) = H(X) + H(Y) − H(X, Y)
• H(X | Y) = H(X) − I(X; Y)
• If X is determined by Y, then H(X | Y) = 0
• So I(X; Y) = I(Y; X) = H(Y) − H(Y | X) = H(Y), if Y is determined by X
• If H is a 32-bit integer and L := H:
  • I(H; L) = I(L; H) = H(L) − H(L | H) = H(L)
  • I(H; L) = H(L) = H(H) = 32
  • Remaining uncertainty: H(H | L) = 32 − 32 = 0

Shannon entropy
• Exercise: assume that H is a uniformly-distributed 32-bit integer

    Program                  H(H)   I(H; L)   H(H | L)
    L := 0                   32     0         32
    L := H & 0x0000ffff      32     16        16

Alternative measurement
• Consider two programs (H a uniformly-distributed 32-bit integer):
• Program 1: if H mod 8 == 0 then L := H else L := 1
  • An adversary can guess H with probability 1/8
  • Pr[L = 1] = 7/8 and Pr[L = 8n] = 1/2^32 for 0 ≤ n < 2^29
  • I(H; L) = H(L) = (7/8)·log₂(8/7) + 2^29·(1/2^32)·log₂ 2^32 ≈ 0.169 + 4 ≈ 4.17
• Program 2: L := H & 0x0000001f
  • An adversary can guess H with probability 1/2^27
  • I(H; L) = H(L) = 5
• Which one is more secure? Shannon leakage rates Program 2 as worse (5 bits vs. ≈ 4.17), even though Program 1 hands the adversary the whole secret one time in eight.
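A quick numeric check of H(L) for the two programs, computed directly from the closed-form output distributions above (my own sketch, not from the slides; compile with cc entropy.c -lm):

    #include <stdio.h>
    #include <math.h>

    /* H is assumed to be a uniformly-distributed 32-bit integer. */
    int main(void) {
        /* Program 1: if H mod 8 == 0 then L := H else L := 1.
         * Pr[L = 1] = 7/8; the 2^29 multiples of 8 each have Pr = 2^-32. */
        double p1 = 7.0 / 8.0;                        /* Pr[L = 1] */
        double h1 = -p1 * log2(p1)                    /* contribution of L = 1 */
                  + pow(2, 29) * (32.0 / pow(2, 32)); /* each term: -2^-32 * log2(2^-32) */
        printf("Program 1: H(L) = %.3f bits\n", h1);  /* ~ 4.169 */

        /* Program 2: L := H & 0x0000001f.
         * 32 equally likely outputs, each with Pr = 1/32. */
        double h2 = 32 * ((1.0 / 32.0) * log2(32.0));
        printf("Program 2: H(L) = %.3f bits\n", h2);  /* 5.000 */
        return 0;
    }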
Alternative measurement
• Vulnerability: the adversary's chance of guessing the secret in one try
  • V(X) = max_{x∈𝒳} Pr[X = x]
• Min-entropy
  • H∞(X) = −log₂ V(X)
  • H∞(X | Y) = −log₂ V(X | Y)
• information leaked = H∞(H) − H∞(H | L)
• Let |X| be the number of possible values of X. For a deterministic program and uniformly-distributed H (derivation sketched at the end of this page):
  • V(H) = 1/|H|, V(H | L) = |L|/|H|
  • H∞(H) − H∞(H | L) = log₂ |H| − log₂ (|H|/|L|) = log₂ |L|

Alternative measurement
• Consider the two programs again:
• Program 1: if H mod 8 == 0 then L := H else L := 1
  • |L| = 2^(32−3) + 1
  • Information leakage = log₂ |L| ≈ 29
• Program 2: L := H & 0x0000001f
  • |L| = 2^5
  • Information leakage = log₂ |L| = 5
• Min-entropy leakage matches the intuition: Program 1 is the less secure one.

Applications
• Image anonymization and KBattleship (PLDI 2008)
  • Computing a maximum flow of information
• Error reporting system (ASPLOS 2008)
• Heartbleed (VMCAI 2018)
  • Using the model counting technique to measure the leakage

Image Anonymization
[figure: image anonymization examples]

KBattleship
[figure: KBattleship game screenshot]
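The step V(H | L) = |L|/|H| on the min-entropy slide above is stated without proof; here is the standard one-line derivation (my addition), valid for a deterministic program and uniformly-distributed H, where every output l has a nonempty preimage:

    V(H \mid L) = \sum_{l} \Pr[L = l] \cdot \max_{h} \Pr[H = h \mid L = l]
                = \sum_{l} \max_{h} \Pr[H = h \wedge L = l]
                = \sum_{l} \frac{1}{|H|}
                = \frac{|L|}{|H|}

so the leaked amount is

    H_\infty(H) - H_\infty(H \mid L)
                = \log_2 |H| - \log_2 \frac{|H|}{|L|}
                = \log_2 |L|.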
Flowcheck
• Dynamic analysis tool to measure an upper-bound estimate of the amount of information leaked
• Dynamic tainting
• Static control-flow regions
• Example: c = d = a + b

Error Reporting System
• Scenario: an error report sent back to the developer can reveal private data from the user's input

Error Reporting System: Measuring privacy loss
• Symbolic Execution
  • Generates path conditions based on symbolic or concrete inputs
• For each condition (op f(.) g(.)), compute a summary for f and g
• Use a set of rules to compute the bound given the summaries (a sketch of such a rule follows at the end of this page)
• Example
  • (add (bitwise-and x 1) 3)
  • (bitwise-and x 1) -> 0 or 1
  • (add (bitwise-and x 1) 3) -> 3 or 4

Heartbleed
[figure: Heartbleed illustration]
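A minimal sketch of the summary idea from the slide (my own illustration, not the ASPLOS 2008 implementation): track the set of values an expression can take as a small interval, and use its size to bound how much a revealed condition can leak. The Range type and both helpers are hypothetical names.

    #include <stdio.h>

    typedef struct { long lo, hi; } Range;   /* expression summary */

    static Range and_const(Range x, long m) {
        /* (bitwise-and x m): AND cannot set bits absent from m,
         * so the result lies in [0, m] whatever x is. */
        (void)x;
        Range r = { 0, m };
        return r;
    }

    static Range add_const(Range x, long c) {
        Range r = { x.lo + c, x.hi + c };
        return r;
    }

    int main(void) {
        Range x = { 0, 0xffffffffL };        /* unknown 32-bit input */
        Range e = add_const(and_const(x, 1), 3);
        /* (add (bitwise-and x 1) 3) -> values in [3, 4]: 2 possibilities,
         * so revealing this expression leaks at most log2(2) = 1 bit. */
        printf("summary: [%ld, %ld], <= %ld values\n", e.lo, e.hi, e.hi - e.lo + 1);
        return 0;
    }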
Exact Model Counting
• Brute-force counting (whiteboard: a room of seats, each occupied or empty)
  • Go through every seat
  • Simple, but hard to scale
• DPLL-style counting
  • Detect a region that is empty and skip it at once
  • Faster, but still accounts for every seat

Approximate model counting
• Random sampling (whiteboard)
  • Randomly pick a region
  • Count the number and scale up
• Random hashing (AAAI 2006)
  • Everyone flips a coin l times
  • Leave if a tail is ever shown
  • Count the n persons who remain
  • Approximately 2^l · n persons in total (a simulation sketch appears after the Q & A slide)

Q & A
Thank You :)
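A tiny simulation (my own sketch, not from the lecture) of the coin-flipping estimator described above: each "occupied seat" survives l fair coin flips with probability 2^-l, so scaling the survivor count by 2^l gives an unbiased estimate of the true count.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define TRUE_COUNT 100000    /* actual number of occupied seats */
    #define FLIPS      10        /* l */

    int main(void) {
        srand((unsigned)time(NULL));
        long survivors = 0;
        for (long person = 0; person < TRUE_COUNT; person++) {
            int alive = 1;
            for (int f = 0; f < FLIPS && alive; f++)
                if (rand() & 1) alive = 0;   /* tail: leave */
            survivors += alive;
        }
        /* Each person survives with probability 2^-l, so 2^l * survivors
         * estimates the true count (here, roughly 100000). */
        printf("true: %d, estimate: %ld\n", TRUE_COUNT, (1L << FLIPS) * survivors);
        return 0;
    }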