A Quantitative Measure of Relevance Based on Kelly Gambling Theory


  1. A Quantitative Measure of Relevance Based on Kelly Gambling Theory Mathias Winther Madsen Institute for Logic, Language, and Computation University of Amsterdam

  2. PLAN ● Why? ● How? ● Examples

  3. Why?

  4. Why?

  5. How?

  6. Why not use Shannon information?  H(X) = E[ log 1/Pr(X = x) ]  (Claude Shannon, 1916–2001)

  7. Why not use Shannon information?  Information Content = Prior Uncertainty − Posterior Uncertainty  (cf. Klir 2008; Shannon 1948)

  8. Why not use Shannon information?  What is the value of X?  Pr(X = 1) = 0.15, Pr(X = 2) = 0.19, Pr(X = 3) = 0.23, Pr(X = 4) = 0.21, Pr(X = 5) = 0.22.  H(X) = E[ log 1/Pr(X = x) ] = 2.31

  9. Why not use Shannon information?  Pinning down X with yes/no questions (Is X = 2?  Is X = 3?  Is X in {4, 5}?  Is X = 5?) under the same distribution.  Expected number of questions: 2.34
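A minimal Python sketch (not part of the slides) reproduces both numbers: the entropy of the distribution on slide 8 is about 2.31 bits, and an optimal (Huffman-style) yes/no question scheme as on slide 9 needs about 2.34 questions on average, slightly more than the entropy.

    import heapq
    from math import log2

    # Distribution from slides 8-9.
    p = [0.15, 0.19, 0.23, 0.21, 0.22]

    # Shannon entropy H(X) = E[log 1/Pr(X = x)]  ->  about 2.31 bits.
    H = -sum(q * log2(q) for q in p)

    # Expected number of yes/no questions for an optimal (Huffman-style)
    # question tree: every merge adds one question for all outcomes below it.
    heap = list(p)
    heapq.heapify(heap)
    expected_questions = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        expected_questions += a + b
        heapq.heappush(heap, a + b)

    print(f"H(X) = {H:.2f} bits, expected questions = {expected_questions:.2f}")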

  10. What color are my socks?  H(p) = −∑ p log p = 6.53 bits of entropy.

  11. How?

  12. Why not use value-of-information?  Value-of-Information = Posterior Expectation − Prior Expectation

  13. Why not use value-of-information? Rules: ● Your capital can be distributed freely ● Bets on the actual outcome are returned twofold ● Bets on all other outcomes are lost

  14. Why not use value-of-information?  [Plot: expected payoff as a function of the betting strategy; the optimal strategy is degenerate gambling, everything on Heads or everything on Tails.]

  15. Why not use value-of-information?  [Plots: capital over rounds of betting; probability distribution of the rate of return R.]

  16. Why not use value-of-information?  Rate of return: R_i = (capital at time i + 1) / (capital at time i).  Long-run behavior: the product R_1 · R_2 · R_3 ⋯ R_n and its expectation E[R_1 · R_2 · R_3 ⋯ R_n].  [Plot: probability distribution of the rate of return R.]

  17. Why not use value-of-information?  Rate of return: R_i = (capital at time i + 1) / (capital at time i).  Long-run behavior: under the strategy that maximizes the expectation E[R_1 · R_2 ⋯ R_n] (degenerate gambling), the product R_1 · R_2 ⋯ R_n itself converges to 0 in probability as n → ∞.  [Plot: probability distribution of the rate of return R.]
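The convergence claim can be checked with a short simulation (an illustrative sketch of mine, assuming a fair coin, which the slides do not state explicitly). Under the rules of slide 13, betting everything on Heads in every round keeps the expected capital at 1, yet the capital itself goes to 0 in probability: the chance of surviving n rounds is 2^(-n).

    import random

    def degenerate_run(n_rounds):
        """Capital after betting everything on Heads every round (fair coin assumed)."""
        capital = 1.0
        for _ in range(n_rounds):
            capital *= 2.0 if random.random() < 0.5 else 0.0
        return capital

    random.seed(0)
    n, trials = 20, 100_000
    runs = [degenerate_run(n) for _ in range(trials)]

    # The true expectation is 1 (E[R_i] = 1 each round), but the few surviving
    # runs carry all the mass: the sample mean is typically near 0 and almost
    # every run ends in ruin.
    print("sample mean capital:", sum(runs) / trials)
    print("fraction ruined    :", sum(r == 0.0 for r in runs) / trials)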

  18. Optimal reinvestment  Daniel Bernoulli (1700–1782), John Larry Kelly, Jr. (1923–1965)

  19. Optimal reinvestment  Doubling rate: W_i = log[ (capital at time i + 1) / (capital at time i) ]  (so R = 2^W)

  20. Optimal reinvestment  Doubling rate: W_i = log[ (capital at time i + 1) / (capital at time i) ]  (so R = 2^W).  Long-run behavior: R_1 · R_2 · R_3 ⋯ R_n = 2^(W_1 + W_2 + W_3 + ⋯ + W_n) ≈ 2^(n E[W]) for n → ∞ (by the law of large numbers).

  21. Optimal reinvestment  The logarithmic expectation E[W] = ∑ p log(b·o) is maximized by proportional gambling (b* = p); the arithmetic expectation E[R] = ∑ p·b·o is maximized by degenerate gambling (here b is the fraction of capital bet on each outcome and o the odds).
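A quick numeric check (illustrative numbers of my own, not from the talk): for two outcomes with probabilities p = (0.6, 0.4) and 2-for-1 odds on each, a grid search shows E[W] peaking at the proportional bet b* = p, while E[R] keeps increasing toward the degenerate bet.

    from math import log2

    p = (0.6, 0.4)          # assumed outcome probabilities
    o = (2.0, 2.0)          # 2-for-1 odds on each outcome

    def doubling_rate(b):
        """E[W] = sum_x p(x) log2(b(x) o(x))."""
        return sum(px * log2(bx * ox) for px, bx, ox in zip(p, b, o))

    def expected_return(b):
        """E[R] = sum_x p(x) b(x) o(x)."""
        return sum(px * bx * ox for px, bx, ox in zip(p, b, o))

    # Sweep the fraction placed on outcome 1; the rest goes on outcome 2.
    grid = [(f, 1 - f) for f in (0.5, 0.6, 0.7, 0.8, 0.9, 0.99)]
    print("E[W] largest at b =", max(grid, key=doubling_rate))    # (0.6, 0.4): proportional
    print("E[R] largest at b =", max(grid, key=expected_return))  # (0.99, 0.01): toward degenerate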

  22. Measuring relevant information  Amount of relevant information = posterior expected doubling rate − prior expected doubling rate

  23. Measuring relevant information  Definition (Relevant Information): For an agent with utility function u, the amount of relevant information contained in the message Y = y is  K(y) = max_s ∑_x Pr(x | y) log u(s, x) − max_s ∑_x Pr(x) log u(s, x)  (the posterior optimal doubling rate minus the prior optimal doubling rate).

  24. Measuring relevant information  K(y) = max_s ∑_x Pr(x | y) log u(s, x) − max_s ∑_x Pr(x) log u(s, x)  ● Expected relevant information is non-negative. ● Relevant information equals the maximal fraction of future gains you can pay for a piece of information without loss. ● When u has the form u(s, x) = v(x) s(x) for some non-negative function v, relevant information equals Shannon information.
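The definition can be evaluated by brute force over a finite set of strategies. The sketch below is mine, not from the talk, using a hypothetical even-money 2-for-1 bet where the strategy s is the fraction of capital placed on outcome 1; the names relevant_information and optimal_doubling_rate are illustrative.

    from math import log2

    def optimal_doubling_rate(dist, strategies, u):
        """max over s of sum_x Pr(x) log2 u(s, x): the two terms in K(y)."""
        return max(sum(px * log2(u(s, x)) for x, px in dist.items() if px > 0)
                   for s in strategies)

    def relevant_information(prior, posterior, strategies, u):
        """K(y) = posterior optimal doubling rate - prior optimal doubling rate."""
        return (optimal_doubling_rate(posterior, strategies, u)
                - optimal_doubling_rate(prior, strategies, u))

    # Hypothetical payout: betting a fraction s on outcome 1 at 2-for-1 odds.
    u = lambda s, x: (1 - s + 2 * s) if x == 1 else (1 - s)
    strategies = [i / 100 for i in range(100)]     # grid of betting fractions

    prior = {1: 0.5, 2: 0.5}        # before the message, the optimal bet is s = 0
    posterior = {1: 0.75, 2: 0.25}  # the message y makes outcome 1 more likely
    print(round(relevant_information(prior, posterior, strategies, u), 2))  # about 0.19 bits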

  25. Example: Code-breaking

  26. Example: Code-breaking  ? ? ? ?  Entropy: H = 4.  Accumulated information: I(X; Y) = 0

  27. Example: Code-breaking  1 ? ? ?  (1 bit!)  Entropy: H = 3.  Accumulated information: I(X; Y) = 1

  28. Example: Code-breaking  1 0 ? ?  (1 bit!)  Entropy: H = 2.  Accumulated information: I(X; Y) = 2

  29. Example: Code-breaking  1 0 1 ?  (1 bit!)  Entropy: H = 1.  Accumulated information: I(X; Y) = 3

  30. Example: Code-breaking  1 0 1 1  (1 bit!)  Entropy: H = 0.  Accumulated information: I(X; Y) = 4

  31. Example: Code-breaking  1 0 1 1  (1 bit each)  Entropy: H = 0.  Accumulated information: I(X; Y) = 4

  32. Example: Code-breaking  Rules: ● You can invest a fraction f of your capital in the guessing game ● If you guess the correct code, you get your investment back 16-fold: u = 1 − f + 16f ● Otherwise, you lose it: u = 1 − f.  W(f) = (15/16) log(1 − f) + (1/16) log(1 − f + 16f)

  33. Example: Code-breaking  ? ? ? ?  Optimal strategy: f* = 0.  Optimal doubling rate: W(f*) = 0.00.  W(f) = (15/16) log(1 − f) + (1/16) log(1 − f + 16f)

  34. Example: Code-breaking  1 ? ? ?  (0.04 bits)  Optimal strategy: f* = 1/15.  Optimal doubling rate: W(f*) = 0.04.  W(f) = (7/8) log(1 − f) + (1/8) log(1 − f + 16f)

  35. Example: Code-breaking  1 0 ? ?  (0.22 bits)  Optimal strategy: f* = 3/15.  Optimal doubling rate: W(f*) = 0.26.  W(f) = (3/4) log(1 − f) + (1/4) log(1 − f + 16f)

  36. Example: Code-breaking  1 0 1 ?  (0.79 bits)  Optimal strategy: f* = 7/15.  Optimal doubling rate: W(f*) = 1.05.  W(f) = (1/2) log(1 − f) + (1/2) log(1 − f + 16f)

  37. Example: Code-breaking  1 0 1 1  (2.95 bits)  Optimal strategy: f* = 1.  Optimal doubling rate: W(f*) = 4.00.  W(f) = (0/1) log(1 − f) + (1/1) log(1 − f + 16f)

  38. Example: Code-breaking  ? ? ? ?
      Raw information (drop in entropy):                 1.00   1.00   1.00   1.00
      Relevant information (increase in doubling rate):  0.04   0.22   0.79   2.95
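The per-digit numbers on slides 33-38 follow from the slide's W(f): with k digits still unknown, the guess succeeds with probability 2^(-k) at 16-for-1 odds, and the Kelly-optimal fraction is f* = (16p − 1)/15, floored at 0. A short sketch of mine reproduces the table:

    from math import log2

    ODDS = 16   # a correct guess returns the investment 16-fold

    def kelly_fraction(p):
        """Kelly-optimal fraction f* = (p * ODDS - 1) / (ODDS - 1), floored at 0."""
        return max(0.0, (p * ODDS - 1) / (ODDS - 1))

    def doubling_rate(f, p):
        """W(f) = (1 - p) log2(1 - f) + p log2(1 - f + ODDS * f)."""
        lose = (1 - p) * log2(1 - f) if p < 1 else 0.0   # avoid log2(0) at f = 1
        return lose + p * log2(1 - f + ODDS * f)

    prev = 0.0
    for known in range(5):                    # 0, 1, 2, 3, 4 digits revealed
        p = 2.0 ** -(4 - known)               # probability of guessing the code
        f = kelly_fraction(p)
        W = doubling_rate(f, p)
        print(f"{known} digits known: f* = {f:.3f}, W(f*) = {W:.2f}, "
              f"gain = {W - prev:.2f} bits")
        prev = W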

  39. Example: Randomization

  40. Example: Randomization  Target distribution: 1/3, 1/3, 1/3; achievable with two coin flips: 1/2, 1/4, 1/4.  (flip() here is a fair coin; the constants are filled in so the slide's fragment runs.)

    import random

    ROCK, PAPER, SCISSORS = "rock", "paper", "scissors"

    def flip():
        return random.random() < 0.5   # fair coin

    def choose():
        if flip():
            if flip():
                return ROCK            # probability 1/4
            else:
                return PAPER           # probability 1/4
        else:
            return SCISSORS            # probability 1/2

  41. Example: Randomization  Rules: ● You (player 1) and the adversary (player 2) both bet $1 ● You move first ● The winner takes the whole pool.  W(p) = log min{ p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

  42. Example: Randomization  Best accessible strategy: p* = (1, 0, 0).  Doubling rate: W(p*) = −∞.  W(p) = log min{ p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

  43. Example: Randomization  Best accessible strategy: p* = (1/2, 1/2, 0).  Doubling rate: W(p*) = −1.00.  W(p) = log min{ p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

  44. Example: Randomization  Best accessible strategy: p* = (2/4, 1/4, 1/4).  Doubling rate: W(p*) = −0.42.  W(p) = log min{ p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

  45. Example: Randomization  Best accessible strategy: p* = (3/8, 3/8, 2/8).  Doubling rate: W(p*) = −0.19.  W(p) = log min{ p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

  46. Example: Randomization  Best accessible strategy: p* = (6/16, 5/16, 5/16).  Doubling rate: W(p*) = −0.09.  W(p) = log min{ p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

  47. Example: Randomization
      Coin flips   Distribution          Doubling rate   Gain from the next flip
      0            (1, 0, 0)             −∞              ∞
      1            (1/2, 1/2, 0)         −1.00           0.58
      2            (1/2, 1/4, 1/4)       −0.42           0.23
      3            (3/8, 3/8, 2/8)       −0.19           0.10
      4            (6/16, 5/16, 5/16)    −0.09           ...
      ...          ...                   ...             ...
      ∞            (1/3, 1/3, 1/3)       0.00
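These doubling rates follow directly from the formula on slide 41. A small sketch of mine reproduces the table (the dictionary of dyadic approximations is taken from slides 42-46):

    from math import log2

    def doubling_rate(p):
        """W(p) = log2 min{p1 + 2 p2, p2 + 2 p3, p3 + 2 p1}: the worst-case
        return factor against an adversary who best-responds to your mixture."""
        p1, p2, p3 = p
        worst = min(p1 + 2 * p2, p2 + 2 * p3, p3 + 2 * p1)
        return log2(worst) if worst > 0 else float("-inf")

    approximations = {        # best distribution reachable with n coin flips
        0: (1, 0, 0),
        1: (1/2, 1/2, 0),
        2: (1/2, 1/4, 1/4),
        3: (3/8, 3/8, 2/8),
        4: (6/16, 5/16, 5/16),
    }

    prev = None
    for flips, p in approximations.items():
        W = doubling_rate(p)
        gain = "" if prev is None else f", gain from this flip = {W - prev:.2f}"
        print(f"{flips} flips: W = {W:.2f}{gain}")
        prev = W
    print(f"limit (1/3, 1/3, 1/3): W = {doubling_rate((1/3, 1/3, 1/3)):.2f}")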

  48. January: Project course in information theory (now with MORE SHANNON!)
      Day 1: Uncertainty and Inference. Probability theory: random variables, generative Bayesian models, stochastic processes. Uncertainty and information: uncertainty as cost, the Hartley measure, Shannon information content and entropy, Huffman coding.
      Day 2: Counting Typical Sequences. The law of large numbers. Typical sequences and the source coding theorem. Stochastic processes and entropy rates; the source coding theorem for stochastic processes. Examples.
      Day 3: Guessing and Gambling. Evidence, likelihood ratios, competitive prediction. Kullback-Leibler divergence; examples of diverging stochastic models. Expressivity and the bias/variance tradeoffs. Doubling rates and proportional betting: card color prediction. Semantics and expressivity.
      Day 4: Asking Questions and Engineering Answers. Questions and answers (or experiments and observations). Mutual information: coin weighing. The maximum entropy principle. The channel coding theorem.
      Day 5: Informative Descriptions and Residual Randomness. The practical problem of source coding. Kraft's inequality and prefix codes; arithmetic coding. Kolmogorov complexity; tests of randomness. Asymptotic equivalence of complexity and entropy.
