An Introduction to Information Theory
Carlton Downey
November 12, 2013
INTRODUCTION
◮ Today’s recitation will be an introduction to Information Theory
◮ Information theory studies the quantification of information:
  ◮ Compression
  ◮ Transmission
  ◮ Error Correction
  ◮ Gambling
◮ Founded by Claude Shannon in 1948 with his classic paper “A Mathematical Theory of Communication”
◮ It is an area of mathematics which I think is particularly elegant
OUTLINE
Motivation
Information
Entropy
  Marginal Entropy
  Joint Entropy
  Conditional Entropy
  Mutual Information
Compressing Information
  Prefix Codes
  KL Divergence
MOTIVATION: CASINO
◮ You’re at a casino
◮ You can bet on coins, dice, or roulette
  ◮ Coins = 2 possible outcomes. Pays 2:1
  ◮ Dice = 6 possible outcomes. Pays 6:1
  ◮ Roulette = 36 possible outcomes. Pays 36:1
◮ Suppose you can predict the outcome of a single coin toss/dice roll/roulette spin.
◮ Which would you choose?
◮ Roulette. But why? These are all fair games
◮ Answer: Roulette provides us with the most information
MOTIVATION: COIN TOSS
◮ Consider two coins:
  ◮ Fair coin C_F with P(H) = 0.5, P(T) = 0.5
  ◮ Bent coin C_B with P(H) = 0.99, P(T) = 0.01
◮ Suppose we flip both coins, and they both land heads
◮ Intuitively we are more “surprised” or “informed” by the first outcome.
◮ We know C_B is almost certain to land heads, so the knowledge that it lands heads provides us with very little information.
MOTIVATION: COMPRESSION
◮ Suppose we observe a sequence of events:
  ◮ Coin tosses
  ◮ Words in a language
  ◮ Notes in a song
  ◮ etc.
◮ We want to record the sequence of events in the smallest possible space.
◮ In other words we want the shortest representation which preserves all information.
◮ Another way to think about this: How much information does the sequence of events actually contain?
MOTIVATION: COMPRESSION
To be concrete, consider the problem of recording coin tosses in unary.

Sequence: T, T, T, T, H

Approach 1:
  H: 0   T: 00
Encoding: 00, 00, 00, 00, 0
We used 9 characters.
MOTIVATION: COMPRESSION
The same sequence: T, T, T, T, H

Approach 2:
  H: 00   T: 0
Encoding: 0, 0, 0, 0, 00
We used 6 characters.
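A minimal Python sketch of this character counting, assuming the two code tables above; the function and variable names are mine, not from the slides:

```python
# Sketch (not from the slides): count the characters each unary code uses
# for the toss sequence T, T, T, T, H.

def encoded_length(tosses, code):
    """Total number of code characters needed to record the sequence."""
    return sum(len(code[t]) for t in tosses)

tosses = ["T", "T", "T", "T", "H"]

approach_1 = {"H": "0", "T": "00"}   # short codeword for the rare outcome
approach_2 = {"H": "00", "T": "0"}   # short codeword for the common outcome

print(encoded_length(tosses, approach_1))   # 9 characters
print(encoded_length(tosses, approach_2))   # 6 characters
```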
MOTIVATION: COMPRESSION
◮ Frequently occurring events should have short encodings
◮ We see this in English with words such as “a”, “the”, “and”, etc.
◮ We want to maximise the information-per-character
  ◮ Seeing common events provides little information
  ◮ Seeing uncommon events provides a lot of information
INFORMATION
◮ Let X be a random variable with distribution p(X).
◮ We want to quantify the information provided by each possible outcome.
◮ Specifically we want a function which maps the probability of an event p(x) to the information I(x).
◮ Our metric I(x) should have the following properties:
  ◮ I(x_i) ≥ 0 ∀ i
  ◮ I(x_1) > I(x_2) if p(x_1) < p(x_2)
  ◮ I(x_1, x_2) = I(x_1) + I(x_2) for independent events x_1, x_2
INFORMATION
I(x) = f(p(x))
◮ We want f() such that I(x_1, x_2) = I(x_1) + I(x_2)
◮ For independent events we know p(x_1, x_2) = p(x_1) p(x_2)
◮ The only function with this property is log(): log(ab) = log(a) + log(b)
◮ Hence we define:
  I(x) = log(1/p(x))
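A small Python sketch of this definition, using log base 2 so that information is measured in bits; the assertions just spot-check the three required properties (names are illustrative):

```python
import math

def information(p):
    """Information, in bits, of an event with probability p: I(x) = log2(1/p(x))."""
    return math.log2(1.0 / p)

# Spot-check the three required properties:
assert information(0.5) >= 0                              # non-negative
assert information(0.25) > information(0.75)              # rarer events are more informative
assert math.isclose(information(0.25 * 0.75),             # additive for independent events
                    information(0.25) + information(0.75))
```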
INFORMATION: COIN
Fair Coin:
  x     h    t
  p(x)  0.5  0.5

I(h) = log(1/0.5) = log(2) = 1
I(t) = log(1/0.5) = log(2) = 1
INFORMATION: COIN
Bent Coin:
  x     h     t
  p(x)  0.25  0.75

I(h) = log(1/0.25) = log(4) = 2
I(t) = log(1/0.75) = log(1.33) = 0.42
INFORMATION: COIN
Really Bent Coin:
  x     h     t
  p(x)  0.01  0.99

I(h) = log(1/0.01) = log(100) = 6.64
I(t) = log(1/0.99) = log(1.01) = 0.01
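A quick check of these worked examples using the information function sketched earlier (values rounded to two decimal places):

```python
import math

def information(p):
    return math.log2(1.0 / p)

for name, p_heads in [("fair", 0.5), ("bent", 0.25), ("really bent", 0.01)]:
    p_tails = 1.0 - p_heads
    print(f"{name:12s} I(h) = {information(p_heads):.2f}   I(t) = {information(p_tails):.2f}")
# fair         I(h) = 1.00   I(t) = 1.00
# bent         I(h) = 2.00   I(t) = 0.42
# really bent  I(h) = 6.64   I(t) = 0.01
```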
INFORMATION: TWO EVENTS
Question: How much information do we get from observing two independent events?

I(x_1, x_2) = log(1/p(x_1, x_2))
            = log(1/(p(x_1) p(x_2)))
            = log((1/p(x_1)) (1/p(x_2)))
            = log(1/p(x_1)) + log(1/p(x_2))
            = I(x_1) + I(x_2)

Answer: Information sums!
INFORMATION IS ADDITIVE
◮ I(k fair coin tosses) = log(1/(1/2)^k) = k bits
◮ So:
  ◮ Random word from a 100,000 word vocabulary: I(word) = log(100,000) = 16.61 bits
  ◮ A 1000 word document from the same source: I(document) = 16,610 bits
  ◮ A 480 × 640 pixel, 16-greyscale video picture: I(picture) = 307,200 × log(16) = 1,228,800 bits
  ◮ A picture is worth (a lot more than) 1000 words!
◮ In reality this is a gross overestimate
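A quick Python sketch reproducing these figures (base-2 logs throughout; the 480 × 640 picture size is inferred from the 307,200 pixel count, so treat it as an assumption):

```python
import math

def uniform_information(n_outcomes):
    """Bits of information from one outcome drawn uniformly from n possibilities."""
    return math.log2(n_outcomes)

print(uniform_information(2 ** 10))            # 10 fair coin tosses: 10.0 bits
print(uniform_information(100_000))            # one word from a 100,000-word vocabulary: ~16.61 bits
print(1000 * uniform_information(100_000))     # a 1000-word document: ~16,610 bits
print(480 * 640 * uniform_information(16))     # 307,200 pixels, 16 grey levels: 1,228,800 bits
```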
INFORMATION: TWO COINS
Bent Coin:
  x     h     t
  p(x)  0.25  0.75
  I(x)  2     0.42

I(hh) = I(h) + I(h) = 4
I(ht) = I(h) + I(t) = 2.42
I(th) = I(t) + I(h) = 2.42
I(tt) = I(t) + I(t) = 0.84
INFORMATION: TWO COINS
Bent Coin, Tossed Twice:
  x     hh      ht      th      tt
  p(x)  0.0625  0.1875  0.1875  0.5625

I(hh) = log(1/0.0625) = log(16)   = 4
I(ht) = log(1/0.1875) = log(5.33) = 2.42
I(th) = log(1/0.1875) = log(5.33) = 2.42
I(tt) = log(1/0.5625) = log(1.78) = 0.84
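A short sketch verifying that, for two independent flips of the bent coin, the information of the pair equals the sum of the individual informations:

```python
import math

def information(p):
    return math.log2(1.0 / p)

p = {"h": 0.25, "t": 0.75}   # the bent coin

for first in "ht":
    for second in "ht":
        joint = p[first] * p[second]                              # independence
        direct = information(joint)                               # I(x1, x2) from the joint probability
        summed = information(p[first]) + information(p[second])   # I(x1) + I(x2)
        print(first + second, round(direct, 2), round(summed, 2))   # the two columns agree
```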
ENTROPY
◮ Suppose we have a sequence of observations of a random variable X.
◮ A natural question to ask is: what is the average amount of information per observation?
◮ This quantity is called the entropy and is denoted H(X).

H(X) = E[I(X)] = E[log(1/p(X))]
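A minimal Python sketch of this definition, treating entropy as the expected information of a draw from p (log base 2, so the answer is in bits):

```python
import math

def entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

print(entropy({"h": 0.5, "t": 0.5}))   # 1.0 bit for a fair coin
```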
ENTROPY
◮ Information is associated with an event: heads, tails, etc.
◮ Entropy is associated with a distribution over events: p(x).
ENTROPY: COIN
Fair Coin:
  x     h    t
  p(x)  0.5  0.5
  I(x)  1    1

H(X) = E[I(X)]
     = Σ_i p(x_i) I(x_i)
     = p(h)I(h) + p(t)I(t)
     = 0.5 × 1 + 0.5 × 1
     = 1
ENTROPY: COIN
Bent Coin:
  x     h     t
  p(x)  0.25  0.75
  I(x)  2     0.42

H(X) = E[I(X)]
     = Σ_i p(x_i) I(x_i)
     = p(h)I(h) + p(t)I(t)
     = 0.25 × 2 + 0.75 × 0.42
     ≈ 0.81
ENTROPY: COIN
Very Bent Coin:
  x     h     t
  p(x)  0.01  0.99
  I(x)  6.64  0.01

H(X) = E[I(X)]
     = Σ_i p(x_i) I(x_i)
     = p(h)I(h) + p(t)I(t)
     = 0.01 × 6.64 + 0.99 × 0.01
     ≈ 0.08
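Using the entropy helper sketched earlier, the three coins work out as follows (values rounded to two decimals):

```python
import math

def entropy(dist):
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

coins = {
    "fair":      {"h": 0.50, "t": 0.50},
    "bent":      {"h": 0.25, "t": 0.75},
    "very bent": {"h": 0.01, "t": 0.99},
}
for name, dist in coins.items():
    print(f"{name:10s} H(X) = {entropy(dist):.2f}")
# fair       H(X) = 1.00
# bent       H(X) = 0.81
# very bent  H(X) = 0.08
```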
ENTROPY: ALL COINS
[Plot: entropy H(p) of a coin with P(heads) = p, for p ranging from 0 to 1]

H(p) = p log(1/p) + (1 − p) log(1/(1 − p))
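A small sketch of the binary entropy function shown on this slide; the curve is symmetric about p = 0.5, where it peaks at 1 bit:

```python
import math

def binary_entropy(p):
    """H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1.0 / p) + (1.0 - p) * math.log2(1.0 / (1.0 - p))

for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p = {p:.2f}   H(p) = {binary_entropy(p):.3f}")
```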
ALTERNATIVE EXPLANATIONS OF ENTROPY

H(S) = Σ_i p_i log(1/p_i)

◮ Average amount of information provided per event
◮ Average amount of surprise when observing an event
◮ Uncertainty an observer has before seeing the event
◮ Average number of bits needed to communicate each event