
Safe Probability



  1. Peter Grünwald, November 2015. Safe Probability – Workshop Teddy Seidenfeld

     Safe Probability
     Peter Grünwald
     Centrum Wiskunde & Informatica – Amsterdam
     Mathematisch Instituut – Universiteit Leiden

     Prelude: Kelly Gambling
     • Suppose we observe a sequence X_1, X_2, … of 0s and 1s.
     • At each point in time i, we can buy a ticket T_{i,1} that pays off $2 iff X_i = 1, and a ticket T_{i,0} that pays off $2 iff X_i = 0. Both tickets cost $1.
     • Crucially: we are allowed to divide our capital any way we like and re-invest our capital at each point in time.
       – E.g. by putting 50% of your capital at time i on T_{i,1} and 50% on T_{i,0}, you make sure that your capital remains the same.
     • A gambling strategy in this game is a function [formula missing] and thus defines a probability distribution on {0,1}^∞ via setting [formula missing]
     • If we follow such a strategy and start with $1, our capital after n rounds will be [formula missing]

     How to design a gambling strategy?
     • A gambling strategy in this game is formally equivalent to a probability distribution P on infinite sequences. Which strategy should we adopt?
     • Strict Subjective Bayesian: think very long about the situation, come up with a subjective distribution P*, and then play the distribution P maximizing expected gain (we may have P ≠ P*).
     • Imprecise Probabilist: come up with a set of distributions, and then play the distribution P optimal relative to that set, with optimality defined relative to some additional criterion (which one?).

     How to design a gambling strategy?
     • Strict Subjective Bayesian: determine subjective P*, and then play optimal P (we may have P ≠ P*).
     • Imprecise: determine a set of distributions and play an "optimal" P.
     • Information Theorist: pick any gambling strategy which you think might gain you a lot. E.g. if you think the frequency might converge to p ≠ 0.5, you might play the Laplace rule of succession...
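The correspondence between strategies and distributions in the Kelly game can be sketched in a few lines. The slide formulas are not preserved in this transcript, so the sketch below rests on an assumption: a strategy is read as a predictive probability for the next outcome, and betting that fraction of current capital on the matching ticket multiplies capital by twice the probability assigned to what actually happened. Under that reading, capital after n rounds is 2^n times the probability the strategy gives to the observed sequence. The function names are illustrative, and the Laplace rule is taken in its usual form (ones + 1) / (rounds so far + 2).

```python
from fractions import Fraction


def laplace_prob_one(history):
    """Laplace rule of succession: P(next = 1 | history) = (ones + 1) / (n + 2)."""
    return Fraction(sum(history) + 1, len(history) + 2)


def even_split(history):
    """Put half of the capital on each ticket, whatever happened before."""
    return Fraction(1, 2)


def play(sequence, prob_one):
    """Capital after betting along `sequence`, starting from $1.

    Each round, a fraction prob_one(history) of current capital buys tickets
    on outcome 1 and the rest buys tickets on outcome 0. A ticket costs $1
    and the winning ticket pays $2.
    """
    capital = Fraction(1)
    history = []
    for x in sequence:
        p1 = prob_one(history)
        fraction_on_outcome = p1 if x == 1 else 1 - p1
        capital = capital * fraction_on_outcome * 2
        history.append(x)
    return capital


def sequence_probability(sequence, prob_one):
    """Probability the strategy, read as a distribution, assigns to `sequence`."""
    prob = Fraction(1)
    history = []
    for x in sequence:
        p1 = prob_one(history)
        prob *= p1 if x == 1 else 1 - p1
        history.append(x)
    return prob


seq = [1, 1, 0, 1, 1, 1, 0, 1]  # made-up data: six 1s, two 0s
capital_even = play(seq, even_split)
capital_laplace = play(seq, laplace_prob_one)
```

On this short made-up sequence the even split leaves capital at exactly $1, while the Laplace strategy ends slightly ahead at 64/63; the gain grows exponentially only on longer sequences whose frequency stays away from 0.5.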

  2. How to design a gambling strategy?
     • Strict Subjective Bayesian: determine subjective P*, and then play optimal P (we may have P ≠ P*).
     • Imprecise: determine a set of distributions and play an "optimal" P.
     • Information Theorist: pick any gambling strategy which you think might gain you a lot. E.g. if you think the frequency might converge to p ≠ 0.5, you might play the Laplace rule of succession...
     • ...if your hypothesis about the frequency is correct, you gain an exponential amount of money, even if at the same time you think the data are not Bernoulli (or not even stationary).

     Starting Point
     • Adopting a Bayesian predictive distribution like the Laplace Rule of Succession if you think data are not Bernoulli is o.k. (and I think, rational!) for some prediction tasks...
       – Sequential gambling, data compression
     • ...but not for others:
       – 0/1-loss prediction (no fractional bets!) when you are only asked to predict X_i in the situation that X_{i−1} = 1
     • I want to design a theory which can cope with such 'partially usable' distributions.

     A Middle Ground between strict Bayes and imprecise probability
     • The set of distributions has a unique representative P, as in 'objective Bayes', fiducial inference, Maximum Entropy, data compression...
     • One absolutely crucial difference: we restrict use of P to a subset of all possible prediction tasks: we know in advance that P should not be taken too seriously.
     • Provides a unifying and demystifying view.

     Menu
     1. The Setting
     2. Definition 1, Example 1: Dilation
     3. Definition 2, Example 1 cont.
     4. Definition 3-4, Example 2: Calibration
     5. Example 3: Fiducial Distributions
     6. Dessert: Monty Hall Problem, Decision Safety

     The Setting
     • Let there be a set of distributions on a space Ω, representing the Decision-Maker (DM)'s uncertainty about a domain.
     • DM has to make predictions/assertions about some U (or a function thereof), upon observing V. Both U and V are RVs (random variables) on Ω.
     • She does so using a pragmatic distribution P(U | V), defined as a conditional distribution of U given V, i.e. a function mapping each value of V to a distribution on the values of U.
     • Whenever the range of U is finite, we think of each such distribution as a column vector.

     The Setting
     • A Bayesian would have a singleton set {P*} and could then set P(U | V) to the conditional distribution of U given V under P*.
     • Note that P* is a distribution on Ω, inducing a joint distribution of (U, V), which in turn induces a conditional distribution of U given V, while P(U | V) is directly defined as a conditional (hence P in the picture is to be taken with a grain of salt).
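As a rough illustration of the setting, the sketch below treats the Bayesian special case for binary U and V: a single joint distribution (standing in for P*) induces a conditional of U given V, and that conditional, stored as one column vector per value of V, serves as the pragmatic distribution. The joint probabilities and all names are made up for the example.

```python
from fractions import Fraction

# Hypothetical joint distribution P*(U = u, V = v) on {0,1} x {0,1}.
# Keys are (u, v); the numbers are invented for illustration only.
joint = {
    (0, 0): Fraction(3, 10),
    (1, 0): Fraction(1, 10),
    (0, 1): Fraction(2, 10),
    (1, 1): Fraction(4, 10),
}


def condition(joint, v):
    """Conditional distribution of U given V = v as a vector [P(U=0|v), P(U=1|v)]."""
    p_v = sum(p for (_, w), p in joint.items() if w == v)
    return [joint[(u, v)] / p_v for u in (0, 1)]


def bayes_pragmatic(joint):
    """Pragmatic distribution: a mapping from each value v to a distribution on U."""
    return {v: condition(joint, v) for v in (0, 1)}


pragmatic = bayes_pragmatic(joint)
```

In the non-Bayesian case discussed in the slides, the mapping from v to a vector is chosen directly and need not come from conditioning any single joint distribution.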

  3. The Setting
     • A Bayesian would have a singleton set {P*} and could then set P(U | V) to the conditional distribution of U given V under P*.
     • We have to do something else – sometimes equivalent to conditioning on a special element of the set, sometimes really different...
     • 𝑷 is really a probability update rule!!

     First Definition: Weak Safety
     • We say that P(U | V) is safe for U | ⟨V⟩ if, for all distributions in the set: [formula missing]
     • i.e. we can expect our expectation of U to be 'correct' (in a relative sense).
     • We will usually want somewhat stronger versions of 'safety'.

     First Example: Dilation
     • Given: the marginal probability of U. U may depend on V, but we have no idea how.
     • Task: predict U given V.
     • Suppose we observe V = 0. Now the conditional probability could be anything... [formula missing]
     • Similarly if we observe V = 1: [formula missing]

     Dilation (Seidenfeld & Wasserman, '93)
     • Before observing V we had a precise probability; after, we only know that it is in a large superset.
     • "Extra information ⇒ less knowledge, no matter what you observe!"
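The dilation effect can be checked numerically. The sketch below assumes binary U and V, a known marginal P(U = 1) = 1/2 (an invented value), and a coarse grid of joint distributions that all share this marginal. For each joint it computes the conditional probability of U = 1 given the observed value of V and reports the range over the grid. The names are illustrative only.

```python
from fractions import Fraction

MARGINAL_U1 = Fraction(1, 2)  # assumed known marginal P(U = 1)


def joints_with_marginal(steps=5):
    """Grid of joint distributions on {0,1}^2 that all have P(U = 1) = MARGINAL_U1.

    Keys are (u, v). The free parameters are a = P(U=1, V=1) and b = P(U=0, V=1).
    """
    for i in range(steps + 1):
        a = MARGINAL_U1 * i / steps
        for k in range(steps + 1):
            b = (1 - MARGINAL_U1) * k / steps
            yield {
                (1, 1): a,
                (1, 0): MARGINAL_U1 - a,
                (0, 1): b,
                (0, 0): 1 - MARGINAL_U1 - b,
            }


def conditional_u1(joint, v):
    """P(U = 1 | V = v), or None when V = v has probability zero."""
    p_v = joint[(0, v)] + joint[(1, v)]
    if p_v == 0:
        return None
    return joint[(1, v)] / p_v


def conditional_range(v):
    """Smallest and largest P(U = 1 | V = v) over the grid of joints."""
    values = [conditional_u1(j, v) for j in joints_with_marginal()]
    values = [c for c in values if c is not None]
    return min(values), max(values)
```

Here `conditional_range(0)` and `conditional_range(1)` both span the whole interval from 0 to 1, even though every joint in the grid agrees that P(U = 1) = 1/2 before V is observed.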

  4. First Example of 'Safety'
     • REALITY: U may be dependent on V.
     • PRAGMATICS: we nevertheless decide to predict U with a distribution that assumes U and V are independent.
     • Our predictions will be just as accurate as we would expect them to be if our pragmatic distribution 𝑷 were 'correct'...
     • ...as long as we use P only for certain, not all, prediction tasks.

     Ignoring instead of Dilating
     • Pointwise conditioning gives dilation.
     • Instead we may decide to ignore V, i.e. act as if U and V are independent, and predict with the pragmatic distribution [formula missing]
     • Proposition: P(U | V) is safe for U | ⟨V⟩, i.e. [formula missing]

     Definition 2, Preparation
     • We write [notation missing] if there exists a function φ such that φ(X) ≡ Y ("X determines Y").
     • P(U | V) can be used to predict not just U, but also any U′ determined by (U, V).

     Definition 2
     • Recall: P(U | V) is safe for 𝑼′ | ⟨V⟩ if [condition missing] and, for all distributions in the set: [formula missing]
     • We say that P(U | V) is safe for 𝑼 | ⟨V⟩ if, for all U′ with [condition missing] and all distributions in the set: [formula missing]

     Example 1(b) – dilation again
     • Task: predict U given V.
     • Again we decide to ignore V and set, e.g., for all values of V: [formula missing]
     • Then P is safe for U | ⟨V⟩ but not for 𝑼 | ⟨V⟩.
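The proposition about ignoring V can be illustrated under an explicit assumption, since the slide's formula is not preserved here: weak safety is read as "for every candidate distribution, the true expectation of U equals the expectation, over V, of the pragmatic expectation of U given V". With binary U, the expectation of U is just the probability of U = 1. The candidate set, the marginal value and the function names below are all made up for the sketch.

```python
from fractions import Fraction

MARGINAL_U1 = Fraction(1, 2)  # assumed known marginal P(U = 1)


def joints_with_marginal(steps=5):
    """Grid of joint distributions on {0,1}^2, keyed by (u, v), sharing P(U = 1)."""
    for i in range(steps + 1):
        a = MARGINAL_U1 * i / steps  # a = P(U=1, V=1)
        for k in range(steps + 1):
            b = (1 - MARGINAL_U1) * k / steps  # b = P(U=0, V=1)
            yield {
                (1, 1): a,
                (1, 0): MARGINAL_U1 - a,
                (0, 1): b,
                (0, 0): 1 - MARGINAL_U1 - b,
            }


def ignore_v(v):
    """Pragmatic P(U = 1 | V = v) that ignores v and returns the marginal."""
    return MARGINAL_U1


def overreact(v):
    """A contrasting pragmatic rule that shifts its prediction when v = 1."""
    return Fraction(9, 10) if v == 1 else Fraction(1, 2)


def true_expectation(joint):
    """E[U] under the joint; U is binary, so this is P(U = 1)."""
    return joint[(1, 0)] + joint[(1, 1)]


def expected_pragmatic_expectation(joint, pragmatic_u1):
    """Average of the pragmatic expectation of U given V, with V drawn from the joint."""
    total = Fraction(0)
    for v in (0, 1):
        p_v = joint[(0, v)] + joint[(1, v)]
        total += p_v * pragmatic_u1(v)
    return total


def is_weakly_safe(pragmatic_u1):
    """Check the assumed weak-safety condition on every joint in the grid."""
    return all(
        true_expectation(j) == expected_pragmatic_expectation(j, pragmatic_u1)
        for j in joints_with_marginal()
    )
```

Under this reading, `is_weakly_safe(ignore_v)` holds on the whole grid although U and V are strongly dependent in many of the joints, whereas `is_weakly_safe(overreact)` fails as soon as V = 1 has positive probability.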

  5. Example 1(c)
     • Task: predict U given V.
     • Again we decide to ignore V and set, e.g., for all values of V: [formula missing]
     • Then, again, P is safe for U | ⟨V⟩ but not for 𝑼 | ⟨V⟩.

     Example 1(c): use the marginal
     • Task: predict U given V.
     • Again we decide to ignore V and set, e.g., for all values of V: [formula missing]
     • Then P is safe for U | ⟨V⟩ and also for 𝑼 | ⟨V⟩.

     Definition 3, Preparation
     • Recall: P(U | V) is safe for U′ | ⟨V⟩ if [condition missing] and, for all distributions in the set: [formula missing]
     • Leave out the '[condition missing]' part from now on, for brevity.

     Definition 3
     • Recall: P(U | V) is safe for U′ | ⟨V⟩ if, for all distributions in the set: [formula missing]
     • We say that P(U | V) is safe for ⟨U′⟩ | 𝑽 if, for all distributions in the set: [formula missing]
     • Our expectation of U′ is (relatively) correct.

     Definition 3, 3b
     • Recall: P(U | V) is safe for U′ | ⟨V⟩ if, for all distributions in the set: [formula missing]
     • We say that P(U | V) is safe for ⟨U′⟩ | 𝑽 if, for all distributions in the set: [formula missing]
     • We say that P(U | V) is safe for 𝑼′ | 𝑽 if, for all distributions in the set: [formula missing]
     • i.e. P* is unique and P is almost surely 'correct'.
