Entropies

Reading for this lecture: Elements of Information Theory (EIT), Chapter 1 & Sections 2.1-2.8.
Last week's homework: correction online.
Updated syllabus on the course website, including all readings.
Coming weeks: Information Theory, Computational Mechanics.
Projects: First reports begin 1 June, 4 per class.

Lecture 12: Natural Computation and Self-Organization, Physics 250 (Spring 2005); Jim Crutchfield
Entropies ...

Sources of Information:
  Apparent randomness:
    Uncontrolled initial conditions
    Actively generated: deterministic chaos
  Hidden structure:
    Ignorance of forces
    Limited capacity to represent structure

Issues:
  What is information?
  How do we measure unpredictability or structure?
  Information = Energy?

History:
  Boltzmann (19th century): Equilibrium in large-scale systems
  Hartley-Shannon-Wiener (early 20th century): Communication & cryptography
  Threads: Coding, Statistics, Dynamics, Learning (late 20th century)
Entropies ...

Information as uncertainty and surprise:
  Observe something unexpected: gain information
  Bateson: "A difference that makes a difference"

How to formalize? Shannon's approach:
  Connection with Boltzmann's entropy
  A measure of surprise
  Self-information of an event: ∝ −log Pr(event)
  Predictable: no surprise: −log 1 = 0
  Completely unpredictable: maximally surprised: −log (1 / Number of Events)
Entropies ...

Khinchin axioms for a measure of information:
  Random variable: X, x ∈ X = {1, 2, ..., k}
  Distribution: Pr(X) = (p_1, ..., p_k)
  Shorthand: X ∼ p(x)

Axioms: H(X) = H(p_1, ..., p_k)
  (1) Maximum at equidistribution: H(p_1, ..., p_k) ≤ H(1/k, ..., 1/k)
  (2) Continuous function of the distribution: H(p_1, ..., p_k) versus p_i
  (3) Expansibility: H(p_1, ..., p_k) = H(p_1, ..., p_k, p_{k+1} = 0)
  (4) Additivity for independent systems: H(A, B) = H(A) + H(B)

Then get the Shannon entropy: H(X) = −Σ_{i=1}^{k} p_i log p_i
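To make the axioms concrete, here is a minimal numerical sketch in Python (the function name `shannon_entropy` and the example distributions are illustrative, not from the lecture) that checks maximum-at-equidistribution, expansibility, and additivity for independent systems:

```python
import math

def shannon_entropy(p, base=2):
    """H(p) = -sum_i p_i log p_i, with zero-probability terms contributing 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

# Axiom (1): any distribution on k events has entropy at most H(1/k, ..., 1/k) = log2 k
p = [0.5, 0.25, 0.125, 0.125]
k = len(p)
assert shannon_entropy(p) <= shannon_entropy([1.0 / k] * k)

# Axiom (3): expansibility -- appending a zero-probability event changes nothing
assert shannon_entropy(p + [0.0]) == shannon_entropy(p)

# Axiom (4): additivity for independent systems, H(A, B) = H(A) + H(B)
a, b = [0.5, 0.5], [0.75, 0.25]
joint = [pa * pb for pa in a for pb in b]   # independence: p(a, b) = p(a) p(b)
assert abs(shannon_entropy(joint) - (shannon_entropy(a) + shannon_entropy(b))) < 1e-9
```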
Entropies ...

Shannon axioms for a measure of information:
  Random variable: X, x ∈ X = {1, 2, ..., k}
  Distribution: Pr(X) = (p_1, ..., p_k)
  Shorthand: X ∼ p(x)

Axioms: H(X) = H(p_1, ..., p_k)
  (1) Maximum surprise: H(1/2, 1/2) = 1
  (2) Continuous function of the distribution: H(p_1, ..., p_k) versus p_i
  (3) Merging (group the first 2 events into one, leaving k − 1 events):
      H(p_1, p_2, p_3, ..., p_k) = H(p_1 + p_2, p_3, ..., p_k) + (p_1 + p_2) H(p_1/(p_1 + p_2), p_2/(p_1 + p_2))

Then get the Shannon entropy: H(X) = −Σ_{i=1}^{k} p_i log p_i
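A quick numerical check of the merging axiom, again a sketch with an arbitrary example distribution (the helper `H` is an illustrative name):

```python
import math

def H(p):
    """Shannon entropy in bits; zero-probability entries contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Merging axiom: group the first two events, then account for splitting them again
p = [0.1, 0.3, 0.4, 0.2]
p12 = p[0] + p[1]
lhs = H(p)
rhs = H([p12] + p[2:]) + p12 * H([p[0] / p12, p[1] / p12])
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)   # identical up to rounding
```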
Entropies ...

Shannon entropy: H(X) = −Σ_{x∈X} p(x) log_2 p(x) = ⟨−log_2 p(X)⟩

Units:
  Log base 2: H(X) in [bits]
  Natural log: H(X) in [nats]

Example: Binary random variable X = {0, 1}, with Pr(1) = p and Pr(0) = 1 − p
  Binary entropy function: H(p) = −p log_2 p − (1 − p) log_2 (1 − p)
  Fair coin (p = 1/2): H(p) = 1 bit
  Completely biased coin (p = 0 or 1): H(p) = 0 bits
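A short sketch of the binary entropy function (the name `binary_entropy` is illustrative):

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))    # fair coin: 1.0 bit
print(binary_entropy(0.0))    # completely biased coin: 0.0 bits
print(binary_entropy(0.75))   # reused in the dining example below: ~0.811 bits
```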
Entropies ...

Example: IID process over four events
  X = {a, b, c, d}, Pr(X) = (1/2, 1/4, 1/8, 1/8)
  Entropy: H(X) = 7/4 bits

Number of questions needed to identify the event?
  x = a? (must always ask at least one question)
  x = b? (necessary only half the time)
  x = c? (only get this far a quarter of the time)
  Average number of questions: 1·1 + 1·(1/2) + 1·(1/4) = 1.75
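The same numbers, checked in a few lines; the question-counting dictionary below is one way to tally the slide's 1 + 1/2 + 1/4 accounting (ask "x = a?", then "x = b?", then "x = c?"):

```python
import math

# IID distribution over X = {a, b, c, d}
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Entropy: H(X) = 7/4 bits
H_X = -sum(px * math.log2(px) for px in p.values())

# Number of yes/no questions needed to pin down each outcome under the strategy above
questions = {"a": 1, "b": 2, "c": 3, "d": 3}
avg_questions = sum(p[x] * questions[x] for x in p)

print(H_X)             # 1.75
print(avg_questions)   # 1.75 = 1*1 + 1*(1/2) + 1*(1/4), matching the entropy
```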
Entropies ...

Interpretations:
  Observer's degree of surprise in the outcome of a random variable
  Uncertainty in a random variable
  Information required to describe a random variable
  A measure of the flatness of a distribution
Entropies ...

Two random variables: (X, Y) ∼ p(x, y)

Joint entropy: average uncertainty in X and Y occurring together
  H(X, Y) = −Σ_{x∈X} Σ_{y∈Y} p(x, y) log_2 p(x, y)

Conditional entropy: average uncertainty in X, knowing Y
  H(X|Y) = −Σ_{x∈X} Σ_{y∈Y} p(x, y) log_2 p(x|y)
  H(X|Y) = H(X, Y) − H(Y)
  Not symmetric: H(X|Y) ≠ H(Y|X)
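A sketch computing conditional entropy both from its definition and from the chain rule, for an assumed 2x2 joint distribution (the numbers are illustrative, not from the lecture):

```python
import math

def H(p):
    """Entropy in bits of an iterable of probabilities; zero entries contribute 0."""
    return -sum(v * math.log2(v) for v in p if v > 0)

# An assumed joint distribution p(x, y) over two binary variables
pxy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(v for (a, _), v in pxy.items() if a == x) for x in (0, 1)}  # marginal p(x)
py = {y: sum(v for (_, b), v in pxy.items() if b == y) for y in (0, 1)}  # marginal p(y)

# H(X|Y) from its definition, -sum_{x,y} p(x,y) log2 p(x|y) ...
H_X_given_Y = -sum(v * math.log2(v / py[y]) for (x, y), v in pxy.items() if v > 0)
# ... agrees with the chain-rule form H(X|Y) = H(X,Y) - H(Y)
assert abs(H_X_given_Y - (H(pxy.values()) - H(py.values()))) < 1e-9

# Conditional entropy is not symmetric in general: H(X|Y) != H(Y|X)
H_Y_given_X = -sum(v * math.log2(v / px[x]) for (x, y), v in pxy.items() if v > 0)
print(H_X_given_Y, H_Y_given_X)
```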
Entropies ...

Example: Dining on campus
  Food served at the cafeteria is a random process.
  Random variables:
    Dinner one night: D ∈ {Pizza, Meat w/Vegetable} = {P, M}
    Lunch the next day: L ∈ {Casserole, Hot Dog} = {C, H}
  After many meals, estimate:
    Pr(P) = 1/2 and Pr(M) = 1/2
    Pr(C) = 3/4 and Pr(H) = 1/4
  Entropies:
    H(D) = 1 bit
    H(L) = H(3/4) ≈ 0.81 bits
Entropies ...

Example: Dining on campus ...
  Also estimate the joint probabilities:
    Pr(P, C) = 1/4 and Pr(P, H) = 1/4
    Pr(M, C) = 1/2 and Pr(M, H) = 0
  Joint entropy: H(D, L) = 1.5 bits
  Dinner and lunch are not independent:
    H(D, L) = 1.5 bits ≠ H(D) + H(L) ≈ 1.81 bits
  Suspect something's correlated: What?
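The joint-entropy calculation, as a sketch:

```python
import math

def H(p):
    """Entropy in bits; zero-probability entries contribute 0."""
    return -sum(v * math.log2(v) for v in p if v > 0)

# Estimated joint distribution over (Dinner, Lunch)
p_joint = {("P", "C"): 0.25, ("P", "H"): 0.25, ("M", "C"): 0.5, ("M", "H"): 0.0}

H_DL = H(p_joint.values())
H_D = H([0.5, 0.5])      # Pr(P), Pr(M)
H_L = H([0.75, 0.25])    # Pr(C), Pr(H)

print(H_DL)         # 1.5 bits
print(H_D + H_L)    # ~1.81 bits > H(D, L), so D and L are not independent
```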
Entropies ...

Example: Dining on campus ...
  Conditional entropy of lunch given dinner:
    Pr(C|P) = Pr(P, C) / Pr(P) = 1/2
    Pr(H|P) = Pr(P, H) / Pr(P) = 1/2
    Pr(C|M) = Pr(M, C) / Pr(M) = 1
    Pr(H|M) = Pr(M, H) / Pr(M) = 0
  Lunch unpredictable if dinner was Pizza: H(L|P) = 1 bit
  Lunch predictable if dinner was Meat w/Veg: H(L|M) = 0 bits
  Average uncertainty about lunch, given dinner:
    H(L|D) = (1/2)·1 + (1/2)·0 = 1/2 bit
Entropies ...

Example: Dining on campus ...
  Other way around? Conditional entropy of dinner given lunch:
    Pr(P|C) = Pr(P, C) / Pr(C) = 1/3
    Pr(M|C) = Pr(M, C) / Pr(C) = 2/3
    Pr(P|H) = Pr(P, H) / Pr(H) = 1
    Pr(M|H) = Pr(M, H) / Pr(H) = 0
  H(D|C) = H(2/3) ≈ 0.92 bits
  H(D|H) = 0 bits
  Average uncertainty about dinner, given lunch:
    H(D|L) = (3/4) H(2/3) ≈ 0.69 bits
  Note: H(D|L) ≠ H(L|D)
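A sketch verifying both conditional entropies of the dining example directly from the joint distribution:

```python
import math

p_joint = {("P", "C"): 0.25, ("P", "H"): 0.25, ("M", "C"): 0.5, ("M", "H"): 0.0}
pD = {"P": 0.5, "M": 0.5}
pL = {"C": 0.75, "H": 0.25}

# H(L|D) = -sum_{d,l} p(d,l) log2 p(l|d), with p(l|d) = p(d,l)/p(d)
H_L_given_D = -sum(v * math.log2(v / pD[d]) for (d, l), v in p_joint.items() if v > 0)
# H(D|L) = -sum_{d,l} p(d,l) log2 p(d|l), with p(d|l) = p(d,l)/p(l)
H_D_given_L = -sum(v * math.log2(v / pL[l]) for (d, l), v in p_joint.items() if v > 0)

print(H_L_given_D)   # 0.5 bits: lunch is a coin flip after Pizza, certain after Meat w/Veg
print(H_D_given_L)   # ~0.689 bits = (3/4) H(2/3); note H(D|L) != H(L|D)
```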
Entropies ...

Relative entropy of two distributions: X ∼ p(x) and Y ∼ q(x)

Relative entropy:
  D(P||Q) = Σ_{x∈X} p(x) log_2 [p(x) / q(x)]
  Conventions: 0 log(0/q) = 0 and p log(p/0) = ∞

Properties:
  (1) D(P||Q) ≥ 0
  (2) D(P||Q) = 0 ⟺ p(x) = q(x)
  (3) D(P||Q) ≠ D(Q||P)

Also called:
  Kullback-Leibler distance
  Information gain: the cost in bits of describing X as if it were Y
  Not a true distance: not symmetric, no triangle inequality
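A minimal sketch of the relative entropy and its asymmetry (the distributions p and q below are arbitrary illustrations; `kl_divergence` is an illustrative name):

```python
import math

def kl_divergence(p, q):
    """D(P||Q) = sum_x p(x) log2(p(x)/q(x)), using 0 log(0/q) = 0 and p log(p/0) = inf."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(kl_divergence(p, q))   # >= 0
print(kl_divergence(q, p))   # a different value: not symmetric, so not a metric
print(kl_divergence(p, p))   # 0, since the two distributions coincide
```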
Entropies ...

Mutual information between two random variables:
  (X, Y) ∼ p(x, y), with X ∼ p(x) and Y ∼ p(y)

Mutual information:
  I(X;Y) = D( p(x, y) || p(x) p(y) )
  I(X;Y) = Σ_{(x,y)∈X×Y} p(x, y) log_2 [p(x, y) / (p(x) p(y))]

Properties:
  (1) I(X;Y) ≥ 0
  (2) I(X;Y) = I(Y;X)
  (3) I(X;Y) = H(X) − H(X|Y)
  (4) I(X;Y) = H(X) + H(Y) − H(X, Y)
  (5) I(X;X) = H(X)

Interpretations:
  Information one variable has about another
  Information shared between two variables
  A measure of dependence between two variables
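A sketch checking the definition against properties (4) and (5), using the same assumed joint distribution as above (illustrative numbers only):

```python
import math

def H(p):
    """Entropy in bits; zero-probability entries contribute 0."""
    return -sum(v * math.log2(v) for v in p if v > 0)

# An assumed joint distribution p(x, y) and its marginals
pxy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
px = {0: 0.5, 1: 0.5}
py = {0: 0.4, 1: 0.6}

# Definition: I(X;Y) = D( p(x,y) || p(x) p(y) )
I = sum(v * math.log2(v / (px[x] * py[y])) for (x, y), v in pxy.items() if v > 0)

# Property (4): I(X;Y) = H(X) + H(Y) - H(X, Y)
assert abs(I - (H(px.values()) + H(py.values()) - H(pxy.values()))) < 1e-9

# Property (5): I(X;X) = H(X) -- pair X with an identical copy of itself
pxx = {(x, x): px[x] for x in px}
I_XX = sum(v * math.log2(v / (px[x] * px[y])) for (x, y), v in pxx.items())
assert abs(I_XX - H(px.values())) < 1e-9
print(I, I_XX)
```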
Entropies ...

Example: Dining on campus ...

Mutual information:
  Reduction in uncertainty about lunch, given dinner:
    I(D;L) = H(L) − H(L|D) = H(3/4) − 1/2 ≈ 0.31 bits
  Reduction in uncertainty about dinner, given lunch:
    I(D;L) = H(D) − H(D|L) = 1 − (3/4) H(2/3) ≈ 0.31 bits
  Shared information between what's served for dinner and lunch.

Further inquiry:
  Hidden variable = leftovers
  The vegetable served with dinner appears in lunch's casserole!
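The three equivalent expressions for I(D;L), evaluated numerically as a sketch:

```python
import math

def H(p):
    return -sum(v * math.log2(v) for v in p if v > 0)

p_joint = {("P", "C"): 0.25, ("P", "H"): 0.25, ("M", "C"): 0.5, ("M", "H"): 0.0}
H_D, H_L, H_DL = H([0.5, 0.5]), H([0.75, 0.25]), H(p_joint.values())

# Three equivalent expressions for the mutual information I(D;L)
print(H_L - (H_DL - H_D))   # H(L) - H(L|D)
print(H_D - (H_DL - H_L))   # H(D) - H(D|L)
print(H_D + H_L - H_DL)     # ~0.31 bits shared between dinner and lunch
```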
Entropies ...

Example: Dining on campus ...

How different are dinner and lunch? Information gain?
  But they don't share an event space: D ∈ {P, M} and L ∈ {C, H}
  It turns out the Pizza was vegetarian, so the events can be made common: V ∈ {Veg, Non}
    Pizza and Casserole: Vegetarian
    Meat w/Veg and Hot Dog: Not
  D(D||L) = Σ_{v∈V} Pr(D = v) log_2 [Pr(D = v) / Pr(L = v)]
  D(D||L) ≈ 0.21 bits
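And the information-gain calculation itself, as a sketch:

```python
import math

# Map both meals onto the shared event space V = {Veg, Non}
pD = {"Veg": 0.5, "Non": 0.5}    # Pizza is vegetarian, Meat w/Veg is not
pL = {"Veg": 0.75, "Non": 0.25}  # Casserole is vegetarian, Hot Dog is not

D_DL = sum(pD[v] * math.log2(pD[v] / pL[v]) for v in pD)
print(D_DL)   # ~0.21 bits: the cost of describing dinner as if it were lunch
```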