Safe Probability

Peter Grünwald, November 2015
Centrum Wiskunde & Informatica – Amsterdam
Mathematisch Instituut – Universiteit Leiden
Safe Probability – Workshop Teddy Seidenfeld

Prelude: Kelly Gambling
• Suppose we observe a sequence \(X_1, X_2, \ldots\) of 0s and 1s
• At each point in time \(i\), we can buy a ticket \(T_{i,1}\) that pays off $2 iff \(X_i = 1\), and a ticket \(T_{i,0}\) that pays off $2 iff \(X_i = 0\). Both tickets cost $1
• Crucially: we are allowed to divide our capital any way we like and re-invest our capital at each point in time
  – e.g. by putting 50% of your capital at time \(i\) on \(T_{i,1}\) and 50% on \(T_{i,0}\), you make sure that your capital remains the same
• A gambling strategy in this game is a function mapping each past sequence \(x_1, \ldots, x_{i-1}\) to the fraction of capital put on \(T_{i,1}\), and thus defines a probability distribution \(\tilde P\) on \(\{0,1\}^\infty\) via setting \(\tilde P(X_i = 1 \mid x_1, \ldots, x_{i-1})\) equal to that fraction
• If we follow such a strategy and start with $1, our capital after \(n\) rounds will be \(2^n \, \tilde P(x_1, \ldots, x_n)\)

How to design a gambling strategy?
• A gambling strategy in this game is formally equivalent to a probability distribution \(\tilde P\) on infinite sequences. Which strategy should we adopt?
• Strict Subjective Bayesian: think very long about the situation, come up with a subjective distribution \(P^*\), and then play the distribution \(\tilde P\) maximizing expected gain (we may have \(\tilde P \neq P^*\))
• Imprecise Probabilist: come up with a set of distributions \(\mathcal{P}\), and then play the distribution \(\tilde P\) optimal relative to \(\mathcal{P}\), with optimality defined relative to some additional criterion (which one?)
• Information Theorist: pick any gambling strategy which you think might gain you a lot. E.g. if you think the frequency of 1s might converge to \(p \neq 0.5\), you might play the Laplace rule of succession...
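The Kelly capital identity can be checked numerically. With $2 tickets and full re-investment, the capital after \(n\) rounds equals \(2^n\) times the probability that the strategy's distribution assigns to the observed bits. Below is a minimal sketch that assumes the Laplace rule of succession as the strategy. The function names and the 10-bit data sequence are hypothetical and serve only as illustration. Exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction

def laplace_prob_one(history):
    """Laplace rule of succession: P(X_i = 1 | x_1..x_{i-1}) = (#ones + 1) / (i - 1 + 2)."""
    return Fraction(sum(history) + 1, len(history) + 2)

def kelly_capital(xs, prob_one=laplace_prob_one):
    """Capital after len(xs) rounds when starting with $1.

    Each round the whole capital is re-invested: a fraction q = prob_one(history)
    buys tickets T_{i,1} (pay $2 iff x_i = 1) and the rest buys tickets T_{i,0}
    (pay $2 iff x_i = 0); every ticket costs $1.
    """
    capital = Fraction(1)
    history = []
    for x in xs:
        q = prob_one(history)
        fraction_on_outcome = q if x == 1 else 1 - q
        capital *= 2 * fraction_on_outcome  # only the winning tickets pay out, $2 each
        history.append(x)
    return capital

def sequence_prob(xs, prob_one=laplace_prob_one):
    """Probability that the distribution defined by the strategy assigns to xs."""
    p = Fraction(1)
    history = []
    for x in xs:
        q = prob_one(history)
        p *= q if x == 1 else 1 - q
        history.append(x)
    return p

if __name__ == "__main__":
    xs = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]                  # hypothetical data: 8 ones, 2 zeros
    print(kelly_capital(xs))                              # 1024/495 (about 2.07)
    print(kelly_capital(xs, lambda h: Fraction(1, 2)))    # 1: the 50/50 strategy keeps $1
```

The 50/50 strategy leaves the capital at $1, as the slide's remark says. The Laplace strategy, run on a sequence that is mostly 1s, more than doubles it after 10 rounds, and the gain compounds as \(n\) grows.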

How to design a gambling strategy? (cont.)
• Information Theorist (e.g. playing the Laplace rule of succession): if your hypothesis about the frequency is correct, you gain an exponential amount of money, even if at the same time you think the data are not Bernoulli (or not even stationary)

Starting Point
• Adopting a Bayesian predictive distribution like the Laplace rule of succession even if you think the data are not Bernoulli is o.k. (and, I think, rational!) for some prediction tasks...
  – sequential gambling, data compression
• ...but not for others:
  – 0/1-loss prediction (no fractional bets!) when you are only asked to predict \(X_i\) in the situation that \(X_{i-1} = 1\)
• I want to design a theory which can cope with such 'partially usable' distributions

A Middle Ground between strict Bayes and imprecise probability
• The set of distributions \(\mathcal{P}\) has a unique representative \(\tilde P\), as in 'objective Bayes', fiducial inference, Maximum Entropy, data compression...
• One absolutely crucial difference: we restrict the use of \(\tilde P\) to a subset of all possible prediction tasks; we know in advance that \(\tilde P\) should not be taken too seriously
• Provides a unifying and demystifying view

Menu
1. The Setting
2. Definition 1, Example 1: Dilation
3. Definition 2, Example 1 cont.
4. Definition 3-4, Example 2: Calibration
5. Example 3: Fiducial Distributions
6. Dessert: Monty Hall Problem, Decision Safety

The Setting
• Let \(\mathcal{P}\) be a set of distributions on a space \(\Omega\), representing the Decision-Maker (DM)'s uncertainty about a domain
• DM has to make predictions/assertions about some \(U\) (or a function thereof), upon observing \(V\). Both \(U\) and \(V\) are RVs (random variables) on \(\Omega\), taking values in \(\mathcal{U}\) and \(\mathcal{V}\), resp.
• She does so using a pragmatic distribution \(\tilde P(U \mid V)\), defined as a conditional distribution of \(U\) given \(V\), i.e. a function mapping each \(v \in \mathcal{V}\) to a distribution on \(\mathcal{U}\)
• Whenever \(\mathcal{U}\) is finite, we think of \(\tilde P(U \mid V = v)\) as a column vector
• A Bayesian would have a singleton \(\mathcal{P} = \{P^*\}\) and could then set \(\tilde P(U \mid V) := P^*(U \mid V)\)
• Note that \(P^*\) is a distribution on \(\Omega\), inducing a joint distribution of \((U, V)\), which in turn induces \(P^*(U \mid V)\), while \(\tilde P(U \mid V)\) is directly defined as a conditional (hence \(\tilde P\) in the picture is to be taken with a grain of salt)
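In the Bayesian special case, the pragmatic distribution is obtained by ordinary conditioning of the single distribution \(P^*\). A minimal sketch for finite \(\mathcal{U}\) and \(\mathcal{V}\) follows. It represents \(\tilde P(U \mid V)\) as a map from each \(v\) to a pmf on \(\mathcal{U}\). The dictionary representation, the function names and the example joint distribution are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def marginal_of_v(joint):
    """P*(V = v) from a joint pmf given as {(u, v): probability}."""
    p_v = defaultdict(Fraction)  # Fraction() == 0
    for (_, v), p in joint.items():
        p_v[v] += p
    return dict(p_v)

def bayes_pragmatic(joint):
    """P~(U | V) := P*(U | V): maps each v with P*(V = v) > 0 to a pmf {u: P*(U = u | V = v)}."""
    p_v = marginal_of_v(joint)
    conditional = {}
    for (u, v), p in joint.items():
        if p_v[v] > 0:  # conditioning is undefined on a null event; skip it
            conditional.setdefault(v, {})[u] = p / p_v[v]
    return conditional

if __name__ == "__main__":
    # Hypothetical singleton {P*} on U, V in {0, 1}.
    p_star = {(0, 0): Fraction(3, 8), (1, 0): Fraction(1, 8),
              (0, 1): Fraction(1, 8), (1, 1): Fraction(3, 8)}
    print(bayes_pragmatic(p_star))
```

For a non-singleton \(\mathcal{P}\) there is no single joint distribution to condition. That gap is what the pragmatic distribution has to fill.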

The Setting (cont.)
• Since \(\mathcal{P}\) is not a singleton, we have to do something else: sometimes this is equivalent to conditioning on a special element of \(\mathcal{P}\), sometimes it is really different... \(\tilde P\) is really a probability update rule!!

First Definition: Weak Safety
We say that \(\tilde P(U \mid V)\) is safe for \(U \mid \langle V \rangle\) if for all \(P \in \mathcal{P}\):
\[ \mathbf{E}_{P}[U] = \mathbf{E}_{P}\big[\mathbf{E}_{\tilde P}[U \mid V]\big] \]
• i.e. we can expect our expectation of \(U\) to be 'correct' (in a relative sense)
• we will usually want somewhat stronger versions of 'safety'

First Example: Dilation (Seidenfeld & Wasserman, '93)
• Given: the marginal probability of \(U\). \(U\) may depend on \(V\), but we have no idea how
• Task: predict \(U\) given \(V\)
• Suppose we observe \(V = 0\). Now the conditional probability \(P(U \mid V = 0)\) could be anything... Similarly if we observe \(V = 1\)
• Before observing \(V\) we had a precise probability \(P(U)\); after observing it, we only know that \(P(U \mid V = v)\) lies in a large superset of \(\{P(U)\}\)
• "extra information → less knowledge, no matter what you observe!"
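Dilation and weak safety can both be seen on a small finite family. This sketch assumes binary \(U\) and \(V\) with both marginals equal to 1/2, and it approximates the 'no idea how' dependence by a grid of joint distributions. These choices are for illustration only, and the names `dilation_family`, `cond_prob_u1` and `weakly_safe` are hypothetical. Conditioning each member on \(V = v\) spreads \(P(U = 1 \mid V = v)\) over the whole interval \([0, 1]\). The pragmatic choice that ignores \(V\) nevertheless passes the check of \(\mathbf{E}_P[U] = \mathbf{E}_P[\mathbf{E}_{\tilde P}[U \mid V]]\).

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def dilation_family(steps=8):
    """Hypothetical finite stand-in for the set of distributions in the dilation example.

    Joint pmfs on (U, V) in {0,1}^2 with P(U = 1) = P(V = 1) = 1/2 (an assumption
    made for illustration); only a = P(U = 1, V = 1) varies, over a grid in [0, 1/2].
    """
    fam = []
    for k in range(steps + 1):
        a = HALF * k / steps
        fam.append({(1, 1): a, (1, 0): HALF - a, (0, 1): HALF - a, (0, 0): a})
    return fam

def prob_v(P, v):
    """P(V = v) for a joint pmf {(u, v): p}."""
    return sum(p for (_, w), p in P.items() if w == v)

def cond_prob_u1(P, v):
    """P(U = 1 | V = v), i.e. ordinary conditioning of one member of the set."""
    return P[(1, v)] / prob_v(P, v)

def weakly_safe(fam, prag_u1):
    """Weak safety (Definition 1) for binary U: E_P[U] == E_P[E_{P~}[U | V]] for all P.

    prag_u1(v) is P~(U = 1 | V = v), which equals E_{P~}[U | V = v] for U in {0, 1}.
    """
    for P in fam:
        e_u = P[(1, 0)] + P[(1, 1)]
        e_prag = sum(prob_v(P, v) * prag_u1(v) for v in (0, 1))
        if e_u != e_prag:
            return False
    return True

if __name__ == "__main__":
    fam = dilation_family()
    for v in (0, 1):
        values = [cond_prob_u1(P, v) for P in fam]
        print(f"P(U=1 | V={v}) ranges over [{min(values)}, {max(values)}]")
    print(weakly_safe(fam, lambda v: HALF))            # ignore V, use the marginal 1/2
    print(weakly_safe(fam, lambda v: Fraction(3, 4)))  # wrong mean: not safe
```

Weak safety is indeed weak. The choice `lambda v: Fraction(v)`, which predicts \(U = V\) with certainty, also passes here, because \(P(V = 1) = 1/2\) in every member of this hypothetical family.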

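First Example of 'Safety'
• REALITY: \(U\) may be dependent on \(V\)
• PRAGMATICS: we nevertheless decide to predict \(U\) with a distribution that assumes \(U\) and \(V\) are independent
• Our predictions will be just as accurate as we would expect them to be if our pragmatic distribution \(\tilde P\) were 'correct'...
• ...as long as we only use \(\tilde P\) for certain, not all, prediction tasks...

Ignoring instead of Dilating
• Pointwise conditioning gives dilation
• Instead we may decide to ignore \(V\), i.e. act as if \(U\) and \(V\) are independent, and predict with the pragmatic distribution \(\tilde P(U \mid V = v) := P(U)\) for all \(v\), where \(P(U)\) is the given marginal
• Proposition: \(\tilde P(U \mid V)\) is safe for \(U \mid \langle V \rangle\)
• i.e. for all \(P \in \mathcal{P}\): \(\mathbf{E}_P\big[\mathbf{E}_{\tilde P}[U \mid V]\big] = \mathbf{E}_{P(U)}[U] = \mathbf{E}_P[U]\)

Definition 2, Preparation
Recall: \(\tilde P(U \mid V)\) is safe for \(U' \mid \langle V \rangle\) if \((U, V) \rightsquigarrow U'\) and for all \(P \in \mathcal{P}\): \(\mathbf{E}_P[U'] = \mathbf{E}_P\big[\mathbf{E}_{\tilde P}[U' \mid V]\big]\)
• We write \(X \rightsquigarrow Y\) if there exists a function \(f\) such that \(f(X) \equiv Y\) ("\(X\) determines \(Y\)")
• \(\tilde P(U \mid V)\) can be used to predict not just \(U\), but also any \(U'\) determined by \((U, V)\), i.e. with \((U, V) \rightsquigarrow U'\)

Definition 2
We say that \(\tilde P(U \mid V)\) is safe for \(\mathbf{U} \mid \langle V \rangle\) if for all \(U'\) with \(U \rightsquigarrow U'\) and all \(P \in \mathcal{P}\): \(\mathbf{E}_P[U'] = \mathbf{E}_P\big[\mathbf{E}_{\tilde P}[U' \mid V]\big]\)

Example 1(b): dilation again
• Task: predict \(U\) given \(V\)
• Again we decide to ignore \(V\) and set \(\tilde P(U \mid V = v)\) to one fixed distribution for all \(v\)
• Then \(\tilde P\) is safe for \(U \mid \langle V \rangle\) but not for \(\mathbf{U} \mid \langle V \rangle\)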
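Definition 2 strengthens weak safety by requiring it for every \(U'\) determined by \(U\). The small numeric sketch below rests on assumptions of its own: \(U \in \{0,1,2\}\), \(V \in \{0,1\}\), a hypothetical known marginal of \(U\), and a finite grid of dependence structures standing in for \(\mathcal{P}\). It also reads Definition 2 as quantifying over all \(U' = f(U)\). The sketch contrasts a pragmatic distribution that only gets the mean of \(U\) right with one that uses the full marginal. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

U_VALUES = (0, 1, 2)
V_VALUES = (0, 1)
# Hypothetical known marginal of U (mean 1); how U depends on V is unknown.
P_U = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def family():
    """Finite stand-in for the set: all joints with U-marginal P_U and P(V=1 | U=u) in {0, 1/2, 1}."""
    grid = (Fraction(0), Fraction(1, 2), Fraction(1))
    fam = []
    for c in product(grid, repeat=len(U_VALUES)):
        fam.append({(u, v): P_U[u] * (c[u] if v == 1 else 1 - c[u])
                    for u in U_VALUES for v in V_VALUES})
    return fam

def safe_for_function(fam, prag, f):
    """Definition 1 applied to U' = f(U): E_P[f(U)] == E_P[E_{P~}[f(U) | V]] for every P."""
    for P in fam:
        lhs = sum(p * f(u) for (u, _), p in P.items())
        p_v = {v: sum(P[(u, v)] for u in U_VALUES) for v in V_VALUES}
        rhs = sum(p_v[v] * sum(prag[v][u] * f(u) for u in U_VALUES) for v in V_VALUES)
        if lhs != rhs:
            return False
    return True

def safe_for_all_functions(fam, prag):
    """Definition 2 for finite U: both sides are linear in f, so the indicators 1{U = u} suffice."""
    return all(safe_for_function(fam, prag, lambda x, u=u: int(x == u)) for u in U_VALUES)

# Two pragmatic distributions that both ignore V:
MEAN_ONLY = {v: {0: Fraction(1, 2), 1: Fraction(0), 2: Fraction(1, 2)} for v in V_VALUES}
MARGINAL = {v: dict(P_U) for v in V_VALUES}

if __name__ == "__main__":
    fam = family()
    print(safe_for_function(fam, MEAN_ONLY, lambda x: x), safe_for_all_functions(fam, MEAN_ONLY))
    print(safe_for_function(fam, MARGINAL, lambda x: x), safe_for_all_functions(fam, MARGINAL))
```

`MEAN_ONLY` passes the expectation check for \(U\) itself. It fails for the indicator of \(U = 1\), since it puts no mass there while every member of the family gives that event probability 1/2. `MARGINAL` passes both checks.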

Example 1(c): use the marginal
• Task: predict \(U\) given \(V\)
• With some other choices that ignore \(V\), \(\tilde P\) is, again, safe for \(U \mid \langle V \rangle\) but not for \(\mathbf{U} \mid \langle V \rangle\)
• Now we again ignore \(V\), but set \(\tilde P(U \mid V = v) := P(U)\), the given marginal, for all \(v\)
• Then \(\tilde P\) is safe for \(U \mid \langle V \rangle\) and also for \(\mathbf{U} \mid \langle V \rangle\)

Definition 3, Preparation
Recall: \(\tilde P(U \mid V)\) is safe for \(U' \mid \langle V \rangle\) if \((U, V) \rightsquigarrow U'\) and for all \(P \in \mathcal{P}\): \(\mathbf{E}_P[U'] = \mathbf{E}_P\big[\mathbf{E}_{\tilde P}[U' \mid V]\big]\)
• Leave out the '\((U, V) \rightsquigarrow U'\)' part from now on, for brevity

Definition 3
We say that \(\tilde P(U \mid V)\) is safe for \(\langle U' \rangle \mid \mathbf{V}\) if for all \(P \in \mathcal{P}\): \(\mathbf{E}_P[U' \mid V] = \mathbf{E}_{\tilde P}[U' \mid V]\), \(P\)-almost surely
• Our expectation of \(U'\) is (relatively) correct

Definition 3b
We say that \(\tilde P(U \mid V)\) is safe for \(\mathbf{U'} \mid \mathbf{V}\) if for all \(P \in \mathcal{P}\): \(P(U' \mid V) = \tilde P(U' \mid V)\), \(P\)-almost surely
• i.e. \(P^*\) is unique and \(\tilde P\) is almost surely 'correct'
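Definition 3 conditions on the observed value of \(V\) instead of averaging over it. Below is a minimal sketch of that check for binary \(U\) and \(V\), run on two hypothetical families. In the first, a dilation-style family, the dependence between \(U\) and \(V\) is unknown. In the second, \(U\) is independent of \(V\) in every member. Values of \(v\) with \(P(V = v) = 0\) are skipped, which is what 'P-almost surely' amounts to here. The families and function names are illustrative assumptions.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def cond_expectation(P, f, v):
    """E_P[f(U) | V = v] for a joint pmf {(u, v): p}; None when P(V = v) = 0."""
    p_v = sum(p for (_, w), p in P.items() if w == v)
    if p_v == 0:
        return None
    return sum(p * f(u) for (u, w), p in P.items() if w == v) / p_v

def safe_given_v(fam, prag, f, v_values=(0, 1)):
    """Definition 3 for U' = f(U): E_P[U' | V] == E_{P~}[U' | V], P-almost surely, for all P."""
    for P in fam:
        for v in v_values:
            ce = cond_expectation(P, f, v)
            if ce is None:  # V = v has P-probability zero, so nothing to check there
                continue
            if ce != sum(q * f(u) for u, q in prag[v].items()):
                return False
    return True

# Hypothetical families on (U, V) in {0,1}^2, both with P(U = 1) = 1/2.
DILATION = [{(1, 1): a, (1, 0): HALF - a, (0, 1): HALF - a, (0, 0): a}
            for a in (Fraction(k, 8) for k in range(5))]
INDEPENDENT = [{(u, v): HALF * (s if v == 1 else 1 - s) for u in (0, 1) for v in (0, 1)}
               for s in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1))]

MARGINAL = {v: {0: HALF, 1: HALF} for v in (0, 1)}  # ignore V, use P(U)

if __name__ == "__main__":
    print(safe_given_v(DILATION, MARGINAL, lambda u: u))     # dependence unknown: not safe
    print(safe_given_v(INDEPENDENT, MARGINAL, lambda u: u))  # U independent of V in every P: safe
```

For binary \(U\), \(\mathbf{E}[U \mid V = v] = P(U = 1 \mid V = v)\). Checking with `f` the identity is therefore also a check of the distribution-level version (Definition 3b) for \(U' = U\). On the dilation family, the marginal choice is weakly safe but fails this conditional check.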
