INFORMATION-THEORETIC SECURITY
Lecture 1 - Elements of Information Theory
Matthieu Bloch
December 2, 2019
CONTEXT AND OBJECTIVES
Context
- Security is an increasingly critical requirement
- Crypto works… most of the time
- Lightweight solutions are desirable
- Information-theoretic security may help
IT security vs. cryptography
- IT security makes no assumption on computational power but requires a noisy observation structure
- Cryptography makes no assumption on observation structure but restricts computational power
- These opposite philosophies are not necessarily incompatible
Objectives of the course
- Demystify canonical results in information-theoretic security
- Provide tools to read papers easily, and maybe write papers
- Prove stuff!
INFORMATION-THEORETIC SECURITY
Definition. (Information-theoretic security) Set of signal processing and coding mechanisms that exploit asymmetries in interaction and perception to make an attacker's job harder.
Asymmetries in interaction and perception
- Key distinguishing factor from more traditional approaches
- The asymmetry assumption is both a strength and a weakness
  - Strength: potential gains in "efficiency"
  - Weakness: the model is easily questioned
Connected research areas
- Signal processing, coding theory, communication theory, information theory
- Computer science, control theory, differential privacy, machine learning
- Most of the emphasis in this course will be on information theory and coding theory
AGENDA FOR COURSE
- Lectures 1-2: Elements of information theory
  - Channel reliability and source coding with side information
  - Channel output approximation and randomness extraction
- Lectures 3-4: Information-theoretic secrecy
  - Secure communication over the wiretap channel
  - Secret-key generation from correlated sources
- Lecture 5: Information-theoretic covertness
  - Undetectable communications
- Lecture 6: Information-theoretic authentication
- Lecture 7: Information-theoretic privacy
- Lectures 8-9: Coding for secrecy
  - Polar codes
  - Universal hash functions
- Lecture 10: Uncertain and adversarial models
[Comic on slide: http://wwww.phdcomics.com]
TODAY: IT TOOLS AND TECHNIQUES
Objective: calibrate the notation and concepts that we will build on extensively
- Useful tools
  - Jensen's inequality, distances between distributions, concentration inequalities
- Information-theoretic metrics
  - Entropy and mutual information
- Canonical information-theoretic results
  - Channel coding
  - Source coding with side information
Extensive and detailed lecture notes will be provided
TOOLS: RANDOM VARIABLES
- Random variable $X$, realization $x$, alphabet $\mathcal{X}$, probability mass function $p_X \in \Delta(\mathcal{X})$
- We will deal with continuous random variables separately
  - Often, sums can be replaced by integrals and PMFs by PDFs without too much thinking

Definition. (Markov kernel) $W$ is a Markov kernel from $\mathcal{X}$ to $\mathcal{Y}$ if $W(y|x) \geq 0$ for all $(x,y) \in \mathcal{X} \times \mathcal{Y}$ and $\sum_{y \in \mathcal{Y}} W(y|x) = 1$ for all $x \in \mathcal{X}$.

For any $p \in \Delta(\mathcal{X})$, define
- $W \cdot p \in \Delta(\mathcal{X} \times \mathcal{Y})$ by $(W \cdot p)(x,y) \triangleq W(y|x)\, p(x)$
- $W \circ p \in \Delta(\mathcal{Y})$ by $(W \circ p)(y) \triangleq \sum_{x \in \mathcal{X}} (W \cdot p)(x,y)$
Do not overthink this notation; it is sometimes helpful for manipulating marginals of joint distributions more easily.

Definition. (Markov chain) Let $X, Y, Z$ be real-valued random variables with joint distribution $p_{XYZ}$. Then $X$, $Y$, $Z$ form a Markov chain in that order, denoted $X - Y - Z$, if $X$ and $Z$ are conditionally independent given $Y$, i.e., for all $(x,y,z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ we have $p_{XYZ}(x,y,z) = p_{Z|Y}(z|y)\, p_{Y|X}(y|x)\, p_X(x)$.
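A minimal numerical sketch of the kernel notation (added here, not part of the original slides; it assumes Python with NumPy and uses arbitrary example values): a kernel $W$ is stored as a matrix of conditional PMFs, the joint $W \cdot p$ and the output marginal $W \circ p$ are formed, and both are checked to be valid distributions.

import numpy as np

# Illustrative example (alphabet sizes and values are assumptions)
p = np.array([0.3, 0.7])              # p in Delta(X), |X| = 2
W = np.array([[0.9, 0.1, 0.0],        # W(y|x=0), each row sums to 1
              [0.2, 0.3, 0.5]])       # W(y|x=1)

joint = p[:, None] * W                # (W . p)(x, y) = W(y|x) p(x)
marginal = joint.sum(axis=0)          # (W o p)(y) = sum_x W(y|x) p(x)

assert np.isclose(joint.sum(), 1.0)      # W . p is in Delta(X x Y)
assert np.isclose(marginal.sum(), 1.0)   # W o p is in Delta(Y)
print(marginal)                          # here: [0.41 0.24 0.35]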
TOOLS: JENSEN'S INEQUALITY

Definition. (Convexity and concavity) A function $f : [a,b] \to \mathbb{R}$ is convex if for all $x_1, x_2 \in [a,b]$ and all $\lambda \in [0,1]$, $f(\lambda x_1 + (1-\lambda) x_2) \leq \lambda f(x_1) + (1-\lambda) f(x_2)$. A function $f$ is strictly convex if the inequality above is strict for $x_1 \neq x_2$ and $\lambda \in (0,1)$. A function $f$ is (strictly) concave if $-f$ is (strictly) convex.

Theorem. (Jensen's inequality) Let $X$ be a real-valued random variable defined on some interval $[a,b]$ and with PMF $p_X$. Let $f : [a,b] \to \mathbb{R}$ be a real-valued function that is convex on $[a,b]$. Then
$f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$.
For any strictly convex function, equality holds if and only if $X$ is a constant.
Jensen's inequality also holds more generally for continuous random variables.

Proposition. (Log-sum inequality) Let $\{a_i\}_{i=1}^n \in \mathbb{R}_+^n$ and $\{b_i\}_{i=1}^n \in \mathbb{R}_+^n$. Then
$\sum_{i=1}^n a_i \log \frac{a_i}{b_i} \geq \left( \sum_{i=1}^n a_i \right) \log \frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}$.
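As a quick sanity check (an addition, not from the slides; the PMF, the convex function, and the vectors below are arbitrary assumptions), this sketch verifies Jensen's inequality for $f(x) = x^2$ and the log-sum inequality on small positive vectors.

import numpy as np

x = np.array([1.0, 2.0, 4.0])           # support of X
p = np.array([0.2, 0.5, 0.3])           # PMF of X
f = lambda t: t ** 2                    # a convex function

# Jensen's inequality: f(E[X]) <= E[f(X)]
assert f(np.dot(p, x)) <= np.dot(p, f(x))

# Log-sum inequality on arbitrary positive vectors a, b
a = np.array([0.4, 1.0, 2.0])
b = np.array([0.5, 0.7, 3.0])
lhs = np.sum(a * np.log(a / b))
rhs = a.sum() * np.log(a.sum() / b.sum())
assert lhs >= rhs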
TOOLS: TOTAL VARIATION
Many IT-security metrics express how "close" or "distinct" probability distributions are.

Definition. (Total variation) For $p, q \in \Delta(\mathcal{X})$,
$V(p,q) \triangleq \frac{1}{2} \| p - q \|_1 \triangleq \frac{1}{2} \sum_{x} | p(x) - q(x) |$.
$V(\cdot,\cdot)$ is a legitimate distance on $\Delta(\mathcal{X})$ (symmetry, positivity, triangle inequality).

Proposition. (Alternative expression for total variation) For all $p, q \in \Delta(\mathcal{X})$,
$V(p,q) = \sup_{\mathcal{E} \subseteq \mathcal{X}} \left( \mathbb{P}_p(\mathcal{E}) - \mathbb{P}_q(\mathcal{E}) \right) = \sup_{\mathcal{E} \subseteq \mathcal{X}} \left( \mathbb{P}_q(\mathcal{E}) - \mathbb{P}_p(\mathcal{E}) \right)$.
Consequently, $0 \leq V(p,q) \leq 1$.

Proposition. (Properties of total variation) For $p, q \in \Delta(\mathcal{X})$ and $W$ a kernel from $\mathcal{X}$ to $\mathcal{Y}$, we have
$V(W \cdot p, W \cdot q) = V(p, q)$ and $V(W \circ p, W \circ q) \leq V(p, q)$.
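The following sketch (an addition; the distributions and the kernel are arbitrary assumptions) computes total variation and checks numerically that the joint $W \cdot p$ preserves it while the marginal $W \circ p$ can only shrink it.

import numpy as np

def tv(p, q):
    # V(p, q) = 0.5 * ||p - q||_1
    return 0.5 * np.abs(p - q).sum()

p = np.array([0.5, 0.5])
q = np.array([0.8, 0.2])
W = np.array([[0.7, 0.3],      # W(y|x=0)
              [0.1, 0.9]])     # W(y|x=1)

joint_p, joint_q = p[:, None] * W, q[:, None] * W            # W . p, W . q
marg_p, marg_q = joint_p.sum(axis=0), joint_q.sum(axis=0)    # W o p, W o q

assert np.isclose(tv(joint_p.ravel(), joint_q.ravel()), tv(p, q))  # equality
assert tv(marg_p, marg_q) <= tv(p, q) + 1e-12                      # contraction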
TOOLS: RELATIVE ENTROPY

Definition. (Relative entropy) For $p, q \in \Delta(\mathcal{X})$,
$D(p \| q) \triangleq \sum_{x} p(x) \log \frac{p(x)}{q(x)}$,
with the convention that $D(p \| q) = \infty$ if $p$ is not absolutely continuous with respect to $q$, i.e., $\mathrm{supp}(p)$ is not a subset of $\mathrm{supp}(q)$.
Relative entropy is not symmetric.

Proposition. (Positivity of relative entropy) For any $p, q \in \Delta(\mathcal{X})$, $D(p \| q) \geq 0$ with equality if and only if $p = q$.

Proposition. (Pinsker's inequality) For any $p, q \in \Delta(\mathcal{X})$, $V(p,q) \leq \sqrt{D(p \| q)}$.

Proposition. (Reverse Pinsker's inequality) For any $p, q \in \Delta(\mathcal{X})$ with $\mathrm{supp}(q) = \mathcal{X}$, we have $D(p \| q) \leq V(p,q) \log \frac{1}{q_{\min}}$, where $q_{\min} \triangleq \min_{x} q(x)$.

One can go back and forth between $V(\cdot,\cdot)$ and $D(\cdot \| \cdot)$, but the two metrics are not equivalent.
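A small numerical sketch (an addition; the two distributions are arbitrary assumptions) that computes relative entropy in bits, illustrates its asymmetry, and checks Pinsker's inequality in the form stated above.

import numpy as np

def kl(p, q):
    # D(p || q) in bits, with the convention 0 log 0 = 0
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def tv(p, q):
    return 0.5 * np.abs(p - q).sum()

p = np.array([0.5, 0.4, 0.1])
q = np.array([0.3, 0.3, 0.4])

print(kl(p, q), kl(q, p))             # not equal: D is not symmetric
assert tv(p, q) <= np.sqrt(kl(p, q))  # Pinsker's inequality as stated above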
TOOLS: ENTROPY

Definition. (Entropy and conditional entropy) For two jointly distributed random variables $X, Y$ with $p_{XY} \in \Delta(\mathcal{X} \times \mathcal{Y})$,
$H(X) \triangleq \mathbb{E}_X[ -\log p_X(X) ]$ and $H(X|Y) \triangleq \mathbb{E}_{XY}[ -\log p_{X|Y}(X|Y) ]$.
We also define $H(XY) \triangleq \mathbb{E}_{XY}[ -\log p_{XY}(X,Y) ] = H(X) + H(Y|X)$.

Proposition. (Positivity)
- Let $X \in \mathcal{X}$ be a discrete random variable. Then $H(X) \geq 0$, with equality if and only if $X$ is a constant.
- Let $X, Y$ be correlated discrete random variables with joint distribution $p_{XY}$. Then $H(Y|X) \geq 0$, with equality if and only if $Y$ is a function of $X$.

Proposition. (Chain rule) Let $X, Y$ be two jointly distributed discrete random variables. Then
$H(XY) = H(X) + H(Y|X) = H(Y) + H(X|Y)$.
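As an illustration (added, not from the slides; the joint PMF below is an arbitrary assumption), this sketch computes entropies in bits from a joint distribution and confirms the chain rule numerically.

import numpy as np

def H(p):
    # Shannon entropy in bits of a PMF (convention: 0 log 0 = 0)
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[0.20, 0.30],    # joint PMF p_XY (rows indexed by x)
                 [0.25, 0.25]])
p_x = p_xy.sum(axis=1)

# H(Y|X) computed directly as an average of conditional entropies
H_y_given_x = sum(p_x[i] * H(p_xy[i] / p_x[i]) for i in range(len(p_x)))

# Chain rule H(XY) = H(X) + H(Y|X), and conditional entropy is nonnegative
assert np.isclose(H(p_xy), H(p_x) + H_y_given_x)
assert H_y_given_x >= 0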
TOOLS: ENTROPY

Proposition. (Fano's inequality) Let $X$ be a discrete random variable with alphabet $\mathcal{X}$. Let $\hat{X} \in \mathcal{X}$ be an estimate of $X$, with joint distribution $p_{X\hat{X}}$. We define the probability of estimation error $P_e \triangleq \mathbb{P}[\hat{X} \neq X]$. Then
$H(X|\hat{X}) \leq H_b(P_e) + P_e \log(|\mathcal{X}| - 1)$,
where $H_b(\cdot)$ denotes the binary entropy function.

Lemma. (Csiszár's inequality) For $p, q \in \Delta(\mathcal{X})$,
$|H(p) - H(q)| \leq V(p,q) \log \frac{|\mathcal{X}|}{V(p,q)}$.
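The sketch below (an addition; the joint distribution of $X$ and its estimate is an arbitrary assumption) evaluates both sides of Fano's inequality in bits on a ternary alphabet.

import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint PMF p_{X, Xhat} over a ternary alphabet (rows: X, columns: Xhat)
p_joint = np.array([[0.30, 0.03, 0.02],
                    [0.02, 0.28, 0.05],
                    [0.04, 0.06, 0.20]])

P_e = 1.0 - np.trace(p_joint)          # P[Xhat != X]
p_xhat = p_joint.sum(axis=0)
# H(X | Xhat) = sum_xhat p(xhat) H(p_{X | Xhat = xhat})
H_x_given_xhat = sum(p_xhat[j] * H(p_joint[:, j] / p_xhat[j])
                     for j in range(p_joint.shape[1]))
fano_bound = H([P_e, 1 - P_e]) + P_e * np.log2(p_joint.shape[0] - 1)

assert H_x_given_xhat <= fano_bound + 1e-12
print(H_x_given_xhat, fano_bound)      # here roughly 0.95 <= 0.98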
TOOLS: MUTUAL INFORMATION

Definition. (Mutual information)
$I(X;Y) \triangleq D(p_{XY} \| p_X p_Y) = H(Y) - H(Y|X)$
$I(X;Y|Z) \triangleq \sum_{z} p_Z(z)\, D(p_{XY|Z=z} \| p_{X|Z=z}\, p_{Y|Z=z}) = H(Y|Z) - H(Y|XZ)$

Proposition. (Monotonicity of entropy) Let $X$ and $Y$ be discrete random variables with joint probability distribution $p_{XY}$. Then $H(X|Y) \leq H(X)$, i.e., "conditioning reduces entropy."

Proposition. (Chain rule) $I(X_1 X_2; Y) = I(X_1; Y) + I(X_2; Y | X_1)$
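To close the loop on these definitions, here is a short sketch (an addition; the joint PMF is an arbitrary assumption) that computes $I(X;Y)$ both as $D(p_{XY} \| p_X p_Y)$ and as $H(Y) - H(Y|X)$, and checks that the two expressions agree.

import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl(p, q):
    p, q = np.asarray(p).ravel(), np.asarray(q).ravel()
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p_xy = np.array([[0.10, 0.30],
                 [0.40, 0.20]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

# I(X;Y) as relative entropy between the joint and the product of marginals
I_kl = kl(p_xy, np.outer(p_x, p_y))
# I(X;Y) as H(Y) - H(Y|X), with H(Y|X) = H(XY) - H(X) by the chain rule
I_ent = H(p_y) - (H(p_xy) - H(p_x))

assert np.isclose(I_kl, I_ent)
print(I_kl)                            # here roughly 0.125 bits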