Information Theory
Don Fallis
Information in the Wild
Intentional Information Transfer
Data Storage
Measuring Information
Surprise!
Inversely Related to Probability
• The lower the probability of event A, the more information you get by learning A.
• The higher the probability of event A, the less information you get by learning A.
• So, 1/p(A) is a plausible measure of the information you get by learning A.
Measuring Information
• Two equally likely outcomes: S(HEADS) = 1/p(HEADS) = 1/0.5 = 2
• Four equally likely outcomes ('1'–'4'): S('1') = 1/p('1') = 1/0.25 = 4
• Eight equally likely outcomes ('1'–'8'): S('2') = 1/p('2') = 1/0.125 = 8
Measuring Information
• Combining the 2-outcome and the 4-outcome experiments gives 8 equally likely joint outcomes.
• But the surprises measured as 1/p don't add: 2 + 4 ≠ 8
• Taking logs fixes this: log₂(2) + log₂(4) = 1 + 2 = 3 = log₂(8)
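To make the additivity point concrete, here is a minimal Python sketch (not from the slides; variable names are illustrative) comparing the reciprocal measure 1/p with the log measure for two independent events:

```python
import math

p_a, p_b = 1/2, 1/4          # two independent events
p_joint = p_a * p_b          # joint probability = 1/8

# The reciprocal measure is not additive: 1/p_a + 1/p_b = 6, but 1/p_joint = 8.
print(1/p_a + 1/p_b, 1/p_joint)                       # 6.0 8.0

# The log measure is additive: log2(2) + log2(4) = 3 = log2(8).
print(math.log2(1/p_a) + math.log2(1/p_b),
      math.log2(1/p_joint))                           # 3.0 3.0
```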
Binary Search
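This slide presumably ties the log₂ measure to binary search: with N equally likely possibilities, each yes/no question can rule out half of them, so about log₂(N) questions suffice. A minimal sketch of that question-counting idea (not from the slides; the function and variable names are illustrative):

```python
def questions_needed(candidates, target):
    """Count the yes/no 'is it in the upper half?' questions needed
    to pin down `target` among equally likely, sorted `candidates`."""
    lo, hi = 0, len(candidates)
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1
        if target >= candidates[mid]:   # one yes/no question
            lo = mid
        else:
            hi = mid
    return questions

# 8 equally likely outcomes -> 3 questions = log2(8) bits
print(questions_needed(list(range(8)), 5))   # 3
```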
Surprise
• Surprise of a Fair Coin coming up Heads
  S(FC = HEADS) = log₂(1/(1/2)) = log₂(2) = 1 bit
• Surprise of LLR being at the Left shrub at first time step
  S(X₁ = LEFT) = log₂(1/(1/3)) = log₂(3) = 1.58 bits
• Surprise of a Fire Alarm going off
  S(FA = ALARM) = log₂(1/(1/100)) = log₂(100) = 6.644 bits
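A minimal Python sketch of the surprise (self-information) calculation used above (the probabilities are the slide's; the function name is illustrative):

```python
import math

def surprise(p):
    """Surprise (self-information) of an event with probability p, in bits."""
    return math.log2(1 / p)

print(surprise(1/2))     # fair coin lands HEADS: 1.0 bit
print(surprise(1/3))     # robot is at the Left location: ~1.585 bits
print(surprise(1/100))   # fire alarm goes off: ~6.644 bits
```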
Bits versus Binary Digits
Entropy
• Entropy is Average Surprise
• Note that this is another example of expected value.
• Entropy of a Fair Coin
  H(FC) = 1/2*log₂(2) + 1/2*log₂(2)
  H(FC) = 1/2*1 + 1/2*1 = 1
• Entropy of Robot Location at first time step
  H(X₁) = 1/3*log₂(3) + 1/3*log₂(3) + 1/3*log₂(3)
  H(X₁) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
• Entropy of a Fire Alarm
  H(FA) = 0.01*log₂(100) + 0.99*log₂(1/0.99)
  H(FA) = 0.01*6.644 + 0.99*0.014 = 0.081
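A minimal Python sketch of entropy as average surprise, reproducing the three values above (the distributions are the slide's; names are illustrative):

```python
import math

def entropy(dist):
    """Entropy in bits: the probability-weighted average of log2(1/p),
    with zero-probability outcomes contributing nothing."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

print(entropy([1/2, 1/2]))        # fair coin: 1.0
print(entropy([1/3, 1/3, 1/3]))   # robot location: ~1.585
print(entropy([0.01, 0.99]))      # fire alarm: ~0.081
```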
Uniform Maximizes Entropy
Amount of Information Transmitted
Noise
Information Channel
Binary Symmetric Channel
Probabilistic Graphical Model
• Factors p(s) and p(r | s), for the graph S → R
• Prior p(s):
        S₀: q        S₁: 1−q
• Channel p(r | s):
                 R₀       R₁
        S₀      1−p        p
        S₁       p        1−p
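A minimal simulation sketch of the binary symmetric channel defined by these tables (Python; q is the prior probability of sending S₀ and p is the flip probability, as on the slide; the specific values 0.1 and 0.5 and the function name are illustrative assumptions):

```python
import random

def send_through_bsc(p=0.1, q=0.5):
    """Draw a sender bit from the prior (q for S0) and flip it with
    probability p, as in the channel table above."""
    s = 0 if random.random() < q else 1
    r = 1 - s if random.random() < p else s
    return s, r

# Empirically, the fraction of corrupted transmissions approaches p.
trials = [send_through_bsc() for _ in range(100_000)]
error_rate = sum(s != r for s, r in trials) / len(trials)
print(error_rate)   # close to 0.1
```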
Mutual Information
Worst-Case Scenario (Independent)
Best-Case Scenario (Perfectly Correlated)
• H(X) = H(Y) = MI(X & Y)
Everything In Between
• MI(X & Y) = H(X) + H(Y) − H(X & Y)
Measuring Mutual Information
• Mutual Information is Expected Reduction in Uncertainty
• Note that this is another example of expected value.
• Suppose that you see a Yellow flash …
  • Your credences shift from (1/3, 1/3, 1/3) to (1/2, 1/2, 0)
  • The entropy of your credences shifts from 1.58 to 1
  • So, there is a reduction in entropy of 0.58
• Suppose that you see a White flash …
  • Your credences shift from (1/3, 1/3, 1/3) to (0, 0, 1)
  • The entropy of your credences shifts from 1.58 to 0
  • So, there is a reduction in entropy of 1.58
• Take a Weighted Average …
  • The probability of a Yellow flash is 2/3
  • The probability of a White flash is 1/3
  • So, the expected reduction in entropy is 2/3*0.58 + 1/3*1.58 = 0.92
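A minimal Python sketch of this expected-reduction-in-entropy calculation for the firefly example (the prior, posteriors, and flash probabilities are the slide's; names are illustrative):

```python
import math

def entropy(dist):
    """Entropy in bits, skipping zero-probability outcomes."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

prior = [1/3, 1/3, 1/3]                 # credences over GOOD, BAD, UGLY
posteriors = {"YELLOW": [1/2, 1/2, 0],  # credences after a Yellow flash
              "WHITE":  [0, 0, 1]}      # credences after a White flash
flash_probs = {"YELLOW": 2/3, "WHITE": 1/3}

expected_reduction = sum(
    flash_probs[e] * (entropy(prior) - entropy(posteriors[e]))
    for e in flash_probs)
print(expected_reduction)   # ~0.92 bits
```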
Firefly Entropy
• H(H) = 1/3*log₂(3) + 1/3*log₂(3) + 1/3*log₂(3)
  H(H) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
• H(E) = 2/3*log₂(1.5) + 1/3*log₂(3)
  H(E) = 2/3*0.58 + 1/3*1.58 = 0.92
• p(h & e), p(h), and p(e):

    H↓  E→      YELLOW    WHITE    total H
    GOOD          1/3       0        1/3
    BAD           1/3       0        1/3
    UGLY           0       1/3       1/3
    total E       2/3      1/3
More Firefly Entropy
• H(H&E) = 1/3*log₂(3) + 0*log₂(1/0) + 1/3*log₂(3) + 0*log₂(1/0) + 0*log₂(1/0) + 1/3*log₂(3)
  H(H&E) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
  (by convention, each zero-probability term 0*log₂(1/0) contributes 0)
• p(h & e), p(h), and p(e): same joint/marginal table as on the previous slide.
Firefly Mutual Information
• MI(H&E) = H(H) + H(E) − H(H&E)
• MI(H&E) = 1.58 + 0.92 − 1.58 = 0.92
• p(h & e), p(h), and p(e): same joint/marginal table as above.
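The same 0.92-bit figure can be recovered directly from the joint table using MI(H&E) = H(H) + H(E) − H(H&E); a minimal sketch (Python; the table entries are the slide's, names are illustrative):

```python
import math

def entropy(dist):
    """Entropy in bits, skipping zero-probability entries."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

# Rows: GOOD, BAD, UGLY. Columns: YELLOW, WHITE.
joint = [[1/3, 0],
         [1/3, 0],
         [0, 1/3]]

p_h = [sum(row) for row in joint]                       # marginal p(h)
p_e = [sum(col) for col in zip(*joint)]                 # marginal p(e)
h_joint = entropy([p for row in joint for p in row])    # H(H & E)

mi = entropy(p_h) + entropy(p_e) - h_joint
print(mi)   # ~0.92 bits, matching 1.58 + 0.92 - 1.58
```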
Robot Localization #1
• H(X₁) = 1/3*log₂(3) + 1/3*log₂(3) + 1/3*log₂(3)
  H(X₁) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
• H(X₂) = 1/12*log₂(12) + 1/3*log₂(3) + 7/12*log₂(1.71)
  H(X₂) = 1/12*3.58 + 1/3*1.58 + 7/12*0.78 = 1.28
• p(x₁ & x₂), p(x₁), and p(x₂):

    X₁↓  X₂→    left    middle    right    total X₁
    left         1/12     1/4       0         1/3
    middle        0       1/12     1/4        1/3
    right         0        0       1/3        1/3
    total X₂     1/12     1/3      7/12
Robot Localization #1
• H(X₁&X₂) = 1/12*log₂(12) + 1/4*log₂(4) + 1/12*log₂(12) + 1/4*log₂(4) + 1/3*log₂(3)
  H(X₁&X₂) = 1/12*3.58 + 1/4*2 + 1/12*3.58 + 1/4*2 + 1/3*1.58 = 2.13
• p(x₁ & x₂), p(x₁), and p(x₂): same joint/marginal table as on the previous slide.
Robot Localization #1
• MI(X₁&X₂) = H(X₁) + H(X₂) − H(X₁&X₂)
• MI(X₁&X₂) = 1.58 + 1.28 − 2.13 = 0.74
• p(x₁ & x₂), p(x₁), and p(x₂): same joint/marginal table as above.
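A minimal Python sketch that recovers MI(X₁&X₂) ≈ 0.74 bits directly from the joint table above (the table entries are the slide's; the function and variable names are illustrative):

```python
import math

def entropy(dist):
    """Entropy in bits, skipping zero-probability entries."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

def mutual_information(joint):
    """MI = H(rows) + H(columns) - H(joint), all read off the joint table."""
    p_rows = [sum(row) for row in joint]
    p_cols = [sum(col) for col in zip(*joint)]
    p_flat = [p for row in joint for p in row]
    return entropy(p_rows) + entropy(p_cols) - entropy(p_flat)

# Rows: X1 = left, middle, right. Columns: X2 = left, middle, right.
joint_x1_x2 = [[1/12, 1/4, 0],
               [0, 1/12, 1/4],
               [0, 0, 1/3]]
print(mutual_information(joint_x1_x2))   # ~0.74 bits
```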
Robot Localization #1
• H(X₁) = 1/3*log₂(3) + 1/3*log₂(3) + 1/3*log₂(3)
  H(X₁) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
• H(O₁) = 2/3*log₂(1.5) + 1/3*log₂(3)
  H(O₁) = 2/3*0.58 + 1/3*1.58 = 0.92
• p(x₁ & o₁), p(x₁), and p(o₁):

    X₁↓  O₁→    hot     cold    total X₁
    left         1/3      0        1/3
    middle        0      1/3       1/3
    right        1/3      0        1/3
    total O₁     2/3     1/3
Robot Localization #1
• H(X₁&O₁) = 1/3*log₂(3) + 0*log₂(1/0) + 0*log₂(1/0) + 1/3*log₂(3) + 1/3*log₂(3) + 0*log₂(1/0)
  H(X₁&O₁) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
  (as before, the zero-probability terms contribute 0)
• p(x₁ & o₁), p(x₁), and p(o₁): same joint/marginal table as on the previous slide.
Robot Localization #1
• MI(X₁&O₁) = H(X₁) + H(O₁) − H(X₁&O₁)
• MI(X₁&O₁) = 1.58 + 0.92 − 1.58 = 0.92
• p(x₁ & o₁), p(x₁), and p(o₁): same joint/marginal table as above.
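The same joint-table calculation, applied to the X₁/O₁ table, gives the 0.92-bit result; a compact self-contained sketch (entries are the slide's; names are illustrative):

```python
import math

# Rows: X1 = left, middle, right. Columns: O1 = hot, cold.
joint = [[1/3, 0], [0, 1/3], [1/3, 0]]

def H(dist):
    """Entropy in bits, skipping zero entries."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

p_x1 = [sum(row) for row in joint]              # (1/3, 1/3, 1/3)
p_o1 = [sum(col) for col in zip(*joint)]        # (2/3, 1/3)
print(H(p_x1) + H(p_o1) - H([p for row in joint for p in row]))   # ~0.92 bits
```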