Seeing the unseen: from coin flips to statistical inverse problems
Alberto J. Coca, StatsLab, University of Cambridge
Topics Taster, University of Cambridge Open Day, 5th July 2018

Outline
1 Introduction: Mathematical Statistics; Examples
2 Seeing the unseen: Coin flips; Statistical inverse problems


Example II: medical imaging

We need non-invasive ways to explore/diagnose patients: e.g., ultrasound, CT scan, MRI, etc. Ultrasound (very simplified!): send many sound pulses from a probe that travel into your body; they hit boundaries between tissues and get reflected; and an image is created using the times the echoes take to return to the probe. A given tissue produces a specific echo. The machine "inverted" this indirect (and incomplete!) information and dealt with (random) instrumental errors: a statistical inverse problem. Does it do better as the instrumental errors decrease?

Outline

1 Introduction: Mathematical Statistics; Examples
2 Seeing the unseen: Coin flips; Statistical inverse problems

Coin flips: experiment I

Let us simplify things and flip some coins: with your phones, go to https://albertococacabrero.wordpress.com/openday/ or google "Alberto J Coca Cambridge" and add openday/ to the end of my URL.

Coin flips: experiment I, MLE

We do not know the probability p ∈ [0, 1] of this coin landing Heads (typically p = 1/2, i.e. a fair coin). How do we guess p from our data? The "frequentist" guess is p̂ = #Heads / #flips. Is this a sensible guess? Yes: each coin flip has options and probabilities given by

Options:  Heads   Tails
Probabs.: p       1 − p

If our data is Heads, Tails, Heads, it has probability or likelihood L = p(1 − p)p = p²(1 − p); and, more generally, if we flip n coins and obtain m heads, the likelihood of our data is L(p, m, n) = p^m (1 − p)^(n − m). Homework: p̂ = m/n maximises q ↦ L(q, m, n) over q ∈ [0, 1]! Indeed, p̂ is called the Maximum Likelihood Estimator (MLE).
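
A quick numerical check of the homework claim (a Python sketch using numpy; the counts m = 7 Heads in n = 10 flips are purely illustrative): the likelihood q ↦ L(q, m, n) peaks at q = m/n.

    import numpy as np

    def likelihood(q, m, n):
        # L(q, m, n) = q^m (1 - q)^(n - m)
        return q**m * (1 - q)**(n - m)

    m, n = 7, 10                       # illustrative: 7 Heads in 10 flips
    q_grid = np.linspace(0, 1, 10001)  # candidate values of p
    q_best = q_grid[np.argmax(likelihood(q_grid, m, n))]
    print(q_best, m / n)               # both approximately 0.7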

Coin flips: experiment I, MLE (continued)

The MLE p̂ = m/n enjoys mathematically desirable properties: e.g., the Law of Large Numbers (LLN) holds, that is, p̂ → p as n → ∞. How fast does Error = |p̂ − p| → 0? Mathematical results guarantee it cannot be faster than 1/√n. Note that if Error ≈ 1/n^a = n^(−a) for some a > 0, then log Error ≈ −a log n. Hence, to find the value of a, plot x = log n vs. y = log Error ≈ −ax and compute the slope. The plot suggests a = 1/2, i.e. Error ≈ 1/√n! Thus, the MLE is optimal in convergence rates! (It is optimal in other senses too, in view of, e.g., the Central Limit Theorem, but there is no time to explain this.) Other estimators are optimal too:
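
A sketch of the log-log experiment just described (Python/numpy; the true value p = 0.3, the range of n and the number of repetitions are arbitrary choices for illustration): simulate flips, record the average error of p̂, and fit the slope of log Error against log n.

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.3                                    # "true" probability of Heads (illustrative)
    ns = np.logspace(2, 6, 20, dtype=int)      # n from 100 to 1,000,000
    errors = []
    for n in ns:
        # average |p_hat - p| over a few repetitions to stabilise the plot
        errs = [abs((rng.random(n) < p).mean() - p) for _ in range(20)]
        errors.append(np.mean(errs))
    slope = np.polyfit(np.log(ns), np.log(errors), 1)[0]
    print(slope)                               # typically close to -0.5, i.e. Error ~ 1/sqrt(n)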

Coin flips: experiment I, Bayes

"Bayesian" method: before conducting the experiment, guess probabilities for the unknown p, i.e., Prob(p = q | no data) = G(q), with G, e.g.,

Green: I am (obviously!) respectable and the coin is probably fair;
Red: I am (absolutely) not respectable and the coin is unfair;
Blue: I would rather not guess, to avoid confrontations...

The initial guess G(q) evolves through Bayes' rule as we get the data, giving

B(q) = L(q, m, n) G(q) / ∫₀¹ L(r, m, n) G(r) dr,   q ∈ [0, 1].

The normalisation ensures ∫₀¹ B(q) dq = ∫₀¹ Prob(p = q | data) dq = 1.
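
A sketch of the Bayes update on a grid (Python/numpy; the particular prior below is only a stand-in for the green "respectable" guess, and m = 7, n = 10 are illustrative): multiply G by the likelihood and renormalise to obtain B.

    import numpy as np

    def posterior(G_vals, q_grid, m, n):
        # B(q) = L(q, m, n) G(q) / integral of L(r, m, n) G(r) dr
        L_vals = q_grid**m * (1 - q_grid)**(n - m)
        unnorm = L_vals * G_vals
        return unnorm / np.trapz(unnorm, q_grid)

    q = np.linspace(0, 1, 1001)
    G_green = q**10 * (1 - q)**10          # a prior concentrated near 1/2 (stand-in for the green curve)
    G_green /= np.trapz(G_green, q)
    B = posterior(G_green, q, m=7, n=10)
    print(q[np.argmax(B)])                 # posterior mode, pulled between 1/2 and the MLE 0.7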

Coin flips: experiment I, Bayes (continued)

How does the data-update of G given by B look? B gives much more information than p̂! And it does so without having to maximise the likelihood! Mathematical properties? It is optimal!

Bernstein–von Mises Theorem (BvM). If G(p) > 0, then B(q) ≈ (p̂ + √(p̂(1 − p̂)) / √n · N(0, 1))(q) as n → ∞. (Here N(0, 1) is the "Bell curve".)

Indeed, the Bayesian method is extensively used in practice. However, it is much harder to analyse mathematically: the no-free-lunch principle!
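
A rough numerical illustration of the BvM approximation (Python with numpy/scipy; the flat prior G ≡ 1 and the values p = 0.4, n = 2000 are illustrative choices with G(p) > 0): for large n the posterior density is close to the normal density centred at p̂ with standard deviation √(p̂(1 − p̂)/n).

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    p, n = 0.4, 2000
    m = int(rng.binomial(n, p))                      # number of Heads in n flips
    q = np.linspace(0.001, 0.999, 2000)
    logL = m * np.log(q) + (n - m) * np.log(1 - q)   # log-likelihood (flat prior G = 1)
    B = np.exp(logL - logL.max())
    B /= np.trapz(B, q)                              # posterior density on the grid
    p_hat = m / n
    bvm = norm.pdf(q, loc=p_hat, scale=np.sqrt(p_hat * (1 - p_hat) / n))
    print(np.max(np.abs(B - bvm)))                   # small compared with the peak height of either curve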

Coin flips: experiment II, MLE

Let H = Heads = 0 and T = Tails = 1. Now we do not see the coin flips directly but "their sum" every 2 tosses. E.g., if the coin flips are H, H, T, H, H, T, T, T, the data is 0, 1, 1, 2 (grouping in pairs: 0+0=0, 1+0=1, 0+1=1, 1+1=2). How do we guess the probability p of the underlying coin landing Heads now? I.e., how do we see the unseen? Each sum of a pair of coin flips has options and probabilities

Options:  0    1           2
Probabs.: p²   2p(1 − p)   (1 − p)²

Easy guesses are given by "inverting table entries": e.g., p̂ = √(#0s / #data). This cannot be optimal as it does not use all the information in the data. If n₀ = #0s, n₁ = #1s and n₂ = #2s, the likelihood of the data is L(p, n₀, n₁, n₂) = p^(2n₀) (2p(1 − p))^(n₁) (1 − p)^(2n₂). The MLE is p̂ = ½ (1 + (n₀ − n₂)/n), where n = n₀ + n₁ + n₂. The MLE "inverts the table entries" optimally, enjoying the same properties as before (and more): as n → ∞, p̂ → p with Error = |p̂ − p| ≈ 1/√n.
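
A sketch of experiment II (Python/numpy; p = 0.6 and the sample size are illustrative): simulate hidden pairs of flips, observe only their sums, and compare the "table-inverting" guess with the MLE.

    import numpy as np

    rng = np.random.default_rng(2)
    p, n_pairs = 0.6, 100000
    flips = (rng.random((n_pairs, 2)) >= p).astype(int)  # Heads = 0 with probability p, Tails = 1
    sums = flips.sum(axis=1)                              # the observed data: 0, 1 or 2 per pair
    n0, n2, n = np.sum(sums == 0), np.sum(sums == 2), n_pairs
    easy_guess = np.sqrt(n0 / n)                          # inverts Prob(0) = p^2 only
    mle = 0.5 * (1 + (n0 - n2) / n)                       # uses all three counts
    print(easy_guess, mle, p)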

Coin flips: experiment II, Bayes

Now we appreciate further the superiority of the Bayesian method: with the same initial guess G(q) as before, we again take

B(q) = L(q, n₀, n₁, n₂) G(q) / ∫₀¹ L(r, n₀, n₁, n₂) G(r) dr,   q ∈ [0, 1].

(B in action with p = 2/3 ≈ 67%.) Again, B gives much more information than p̂! And B is much easier than "inverting table entries" (maximising the likelihood)! In fact, we input the table entries and B "inverts" them automatically and optimally: a BvM Theorem holds!
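
The same grid-based Bayes update works here; a sketch (Python/numpy, with a flat prior and illustrative counts roughly matching p = 2/3, the "B in action" value):

    import numpy as np

    def posterior_exp2(q, n0, n1, n2, G_vals):
        # L(q, n0, n1, n2) = q^(2 n0) (2 q (1 - q))^n1 (1 - q)^(2 n2), then Bayes normalisation
        logL = 2 * n0 * np.log(q) + n1 * np.log(2 * q * (1 - q)) + 2 * n2 * np.log(1 - q)
        unnorm = np.exp(logL - logL.max()) * G_vals
        return unnorm / np.trapz(unnorm, q)

    q = np.linspace(0.001, 0.999, 2000)
    B = posterior_exp2(q, n0=45, n1=44, n2=11, G_vals=np.ones_like(q))  # illustrative counts
    print(q[np.argmax(B)])   # posterior mode close to 2/3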

Coin flips: experiment III, MLE

We do not see the coin flips directly, but flip a second coin with Heads-probability qH ∈ [0, 1]: if it lands H (resp. T), we observe the "sum" of 2 (resp. 4) tosses of the first coin. E.g., if the second coin's flips are 0, 1, 0 (not observed!) and the first coin's flips are H, H, T, H, H, T, T, T, the data is 0, 2, 2 (0+0=0, 1+0+0+1=2, 1+1=2). If qH is known, how do we guess p now? I.e., how do we see the unseen? Each (random) sum of coin flips has options and probabilities given by

Options:  0: p₀ = qH p² + (1 − qH) p⁴
          1: p₁ = qH · 2p(1 − p) + (1 − qH) · 4p³(1 − p)
          ...
          4: p₄ = (1 − qH)(1 − p)⁴

A guess is obtained by inverting p₄, i.e. p̂ = 1 − (1/(1 − qH) · #4s/#data)^(1/4). Not optimal! If nⱼ = #js, the likelihood of the data is L(p, n₀, ..., n₄) = p₀(p)^(n₀) · p₁(p)^(n₁) · ... · p₄(p)^(n₄). A unique maximiser of p ↦ L(p, n₀, ..., n₄) exists (the MLE) but not in closed form: "inverting the table entries" explicitly is too hard! The implicit MLE still enjoys the same desirable properties.
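
Since this MLE has no closed form, it must be found numerically. A sketch (Python with numpy/scipy; qH = 0.5 and the counts n₀, ..., n₄ are illustrative) maximising the log-likelihood over a grid of candidate values of p:

    import numpy as np
    from scipy.stats import binom

    def mixture_probs(p, qH):
        # p_j(p): probability of observing sum j, mixing the 2-toss and 4-toss branches
        # (each Tails contributes 1 to the sum and occurs with probability 1 - p)
        ks = np.arange(5)
        return qH * binom.pmf(ks, 2, 1 - p) + (1 - qH) * binom.pmf(ks, 4, 1 - p)

    def log_likelihood(p, counts, qH):
        # log of L(p, n0, ..., n4) = prod_j p_j(p)^(n_j)
        return np.sum(counts * np.log(mixture_probs(p, qH)))

    qH = 0.5
    counts = np.array([30, 35, 22, 9, 4])            # illustrative n0, ..., n4
    grid = np.linspace(0.001, 0.999, 999)
    mle = grid[np.argmax([log_likelihood(p, counts, qH) for p in grid])]
    print(mle)                                        # the implicit MLE, found numerically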

Coin flips: experiment III, Bayes

The superiority of the Bayesian method in this experiment is clear:
