KL divergence or relative entropy
  1. KL divergence or relative entropy

     Two pmfs p(x) and q(x):

         D(p \| q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}    (5)

     with the conventions 0 log(0/q) = 0 and p log(p/0) = ∞.

         D(p \| q) = E_p \left[ \log \frac{p(X)}{q(X)} \right]    (6)

         I(X; Y) = D(p(x, y) \| p(x) p(y))    (7)
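     As a concrete illustration of equation (5), here is a minimal Python sketch (not from the slides; the function name and the dict representation of the pmfs are my own) that computes D(p ‖ q) in bits:

         import math

         def kl_divergence(p, q):
             """D(p || q) in bits, for pmfs given as dicts mapping outcome -> probability.

             Uses the conventions of eq. (5): 0 * log(0/q) = 0 and p * log(p/0) = infinity.
             """
             total = 0.0
             for x, px in p.items():
                 if px == 0:
                     continue                     # 0 log (0/q) contributes 0
                 qx = q.get(x, 0.0)
                 if qx == 0:
                     return float("inf")          # p log (p/0) = infinity
                 total += px * math.log2(px / qx)
             return total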

  2. • Measure of how different two probability distributions are
     • The average number of extra bits wasted by encoding events from a distribution p with a code based on the not-quite-right distribution q
     • D(p ‖ q) ≥ 0; D(p ‖ q) = 0 iff p = q
     • Not a metric: not symmetric, and doesn't satisfy the triangle inequality

  3. [Slide on D(p ‖ q) vs D(q ‖ p)]
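     A small numerical check of this asymmetry, reusing the kl_divergence sketch above (the two distributions are made up for illustration):

         # KL divergence is not symmetric: D(p || q) != D(q || p) in general.
         p = {"a": 0.5, "b": 0.5}
         q = {"a": 0.9, "b": 0.1}
         print(kl_divergence(p, q))   # ~0.74 bits
         print(kl_divergence(q, p))   # ~0.53 bits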

  4. Cross entropy

     • Entropy = uncertainty
     • Lower entropy = determining efficient codes = knowing the structure of the language = good measure of model quality
     • Entropy = measure of surprise
     • How surprised we are when w follows h is the pointwise entropy:

           H(w | h) = -\log_2 p(w | h)

       p(w | h) = 1? p(w | h) = 0?
     • Total surprise:

           H_{total} = -\sum_{j=1}^{n} \log_2 m(w_j | w_1, w_2, \ldots, w_{j-1})
                     = -\log_2 m(w_1, w_2, \ldots, w_n)
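     The total-surprise formula above is just the chain rule applied to the model's probability of the whole sequence. A minimal sketch, assuming the model m is given as a function returning the conditional probability m(w | history); the names total_surprise and unigram are illustrative, not from the slides:

         import math

         def total_surprise(words, m):
             """-sum_j log2 m(w_j | w_1..w_{j-1}) = -log2 m(w_1..w_n), in bits."""
             h = 0.0
             for j, w in enumerate(words):
                 h -= math.log2(m(w, words[:j]))   # requires m(w | history) > 0
             return h

         # Example with a hypothetical unigram model that ignores the history:
         unigram = {"the": 0.5, "cat": 0.3, "sat": 0.2}
         m = lambda w, history: unigram[w]
         print(total_surprise(["the", "cat", "sat"], m))   # = -log2(0.5*0.3*0.2) ≈ 5.06 bits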

  5. Formalizing through cross-entropy

     • Our model of language is q(x). How good a model is it?
     • Idea: use D(p ‖ q), where p is the correct model.
     • Problem: we don't know p.
     • But we know roughly what it is like from a corpus.
     • Cross entropy:

           H(X, q) = H(X) + D(p \| q)    (8)
                   = -\sum_{x} p(x) \log q(x)
                   = E_p \left[ \log \frac{1}{q(x)} \right]    (9)
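     For the idealized case where p is fully known, equations (8)-(9) can be checked directly; entropy and cross_entropy below are my own helper names, and the code reuses kl_divergence from above:

         import math

         def entropy(p):
             """H(X) = -sum_x p(x) log2 p(x)."""
             return -sum(px * math.log2(px) for px in p.values() if px > 0)

         def cross_entropy(p, q):
             """H(X, q) = -sum_x p(x) log2 q(x)."""
             return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

         # Sanity check of eq. (8): H(X, q) = H(X) + D(p || q)
         p = {"a": 0.5, "b": 0.5}
         q = {"a": 0.9, "b": 0.1}
         print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))   # both ≈ 1.74 bits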

  6. • Cross entropy of a language L = (X_i) ∼ p(x) according to a model m:

           H(L, m) = -\lim_{n \to \infty} \frac{1}{n} \sum_{x_{1n}} p(x_{1n}) \log m(x_{1n})

     • If the language is 'nice':

           H(L, m) = -\lim_{n \to \infty} \frac{1}{n} \log m(x_{1n})    (10)

       I.e., it's just our average surprise for large n:

           H(L, m) \approx -\frac{1}{n} \log m(x_{1n})    (11)

     • Since H(L) is fixed even if unknown, minimizing cross-entropy is equivalent to minimizing D(p ‖ m)
     • Provided: independent test data; assume L = (X_i) is stationary [doesn't change over time] and ergodic [doesn't get stuck]
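     Equation (11) is what gets computed in practice: the average per-word surprise of the model on held-out text. A minimal sketch reusing total_surprise from above (the function name is my own):

         def cross_entropy_rate(words, m):
             """H(L, m) ≈ -(1/n) log2 m(w_1..w_n): average surprise per word, in bits (eq. 11)."""
             return total_surprise(words, m) / len(words)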

  7. Entropy of English text (27-letter alphabet)

     Model                   Cross entropy (bits)
     zeroth order            4.76 (= log 27)
     first order             4.03
     second order            2.8
     Shannon's experiment    1.3 (1.34)

  8. Perplexity

         perplexity(x_{1n}, m) = 2^{H(x_{1n}, m)} = m(x_{1n})^{-\frac{1}{n}}
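     Perplexity is just two to the power of the per-word cross-entropy, or equivalently the inverse probability of the text normalized by its length. A one-line sketch using cross_entropy_rate from above:

         def perplexity(words, m):
             """perplexity(x_1n, m) = 2^H(x_1n, m) = m(x_1n)^(-1/n)."""
             return 2 ** cross_entropy_rate(words, m)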
