Exchangeability and predictive inference
Using a special type of symmetry
Gert de Cooman
SYSTeMS Research Group, Ghent University
gert.decooman@UGent.be
http://users.UGent.be/~gdcooma
Second SIPTA School on Imprecise Probabilities
25 July 2006
Today’s main topic
Predictive inference: the general problem
A very important problem in statistics, and in science in general:
Consider a system. Make a number of observations of the system. Use these observations to make inferences or predictions about the next observations.
= PREDICTIVE INFERENCE
Predictive inference: formalising the problem
We envisage making N observations X_1, ..., X_N.
Since, before making the k-th observation, we don’t necessarily know the value of X_k, we call X_k a random variable.
We assume that all random variables X_k assume values in the same finite set X.
After making n observations X_1 = x_1, ..., X_n = x_n, make inferences/predictions for the remaining n′ = N − n random variables X_{n+1}, ..., X_N.
Predictive inference: the fundamental reductive assumption
We assume that the mechanism that produces the observations X_k is essentially stationary or time-invariant.
More precisely: we assume that the order in which the observations are made is of no relevance (to the predictions).
This is a special assumption of symmetry underlying the observations, called exchangeability [de Finetti, 1937].
This assumption reduces the very difficult general problem to a solvable special case, and is quite useful in practice.
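To make the exchangeability assumption concrete, here is a minimal sketch in Python (the joint pmf and all names are ours, purely illustrative): it checks that a joint model assigns the same probability to every reordering of an observation sequence.

```python
from itertools import permutations, product

# Hypothetical joint pmf over sequences of coin tosses: a 50/50 mixture
# of a heads-biased and a tails-biased coin. Mixtures of iid models like
# this are the canonical examples of exchangeable models.
def joint(seq, biases=(0.9, 0.1)):
    mix = 0.0
    for p in biases:
        prob = 1.0
        for x in seq:
            prob *= p if x == "h" else 1.0 - p
        mix += 0.5 * prob
    return mix

def is_exchangeable(pmf, outcomes, n):
    """True if pmf(x_1, ..., x_n) is unchanged by every permutation."""
    for seq in product(outcomes, repeat=n):
        base = pmf(seq)
        if any(abs(pmf(perm) - base) > 1e-12 for perm in permutations(seq)):
            return False
    return True

print(is_exchangeable(joint, ("h", "t"), 3))  # True: order is irrelevant
```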
Predictive inference, an example: smoking and lung cancer
Consider a general population of people, both smokers and non-smokers. We make observations by selecting people at random, determining whether they are smokers or not, and whether they have lung cancer, or develop it during the year after the selection.
The possible values for the observations are
X = {S-L, S-NL, NS-L, NS-NL}
where
S-L means ‘smoker who has or develops lung cancer’,
S-NL means ‘smoker who does not have or develop lung cancer’,
NS-L means ‘non-smoker who has or develops lung cancer’,
NS-NL means ‘non-smoker who does not have or develop lung cancer’.
We are interested in the probability that a smoker selected at random from the population will have or develop lung cancer. We assume that the order in which the people are selected at random from the population is of no importance.
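As a toy illustration of the quantity of interest (the counts below are invented, and the helper is ours): under exchangeability only the counts of the four observation types carry evidence, not the order in which people were selected.

```python
from collections import Counter

# Invented observations in X = {S-L, S-NL, NS-L, NS-NL}, illustration only.
observations = ["S-NL"] * 60 + ["S-L"] * 15 + ["NS-NL"] * 120 + ["NS-L"] * 5

counts = Counter(observations)
smokers = counts["S-L"] + counts["S-NL"]

# Relative frequency of lung cancer among the observed smokers; since the
# selection order is irrelevant, these counts summarise all the evidence.
print(counts["S-L"] / smokers)  # 0.2
```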
Modelling symmetry
An example: tossing a coin
I am going to toss a coin in the next room. How do you model your information (beliefs) about the outcome?
Situation A: You have seen and examined the coin, and you believe it is symmetrical (not biased).
Situation B: You have no information about the coin; it may be heavily loaded, it may even have two heads or two tails.
In Situation A, there is information that the phenomenon described is invariant under permutation of heads and tails: evidence of symmetry.
In Situation B, your information (none) is invariant under permutation of heads and tails: symmetry of evidence.
Modelling the available information
We want a model for the available information or evidence: a belief model.
◮ In Situation A, the belief model should reflect that there is evidence of symmetry.
◮ In Situation B, the evidence is invariant under permutations of heads and tails, so the belief model should be invariant as well.
Since the available information is different in the two situations, the corresponding belief models should be different too!
Belief models should be able to capture the difference between ‘symmetry of evidence’ and ‘evidence of symmetry’. This is not the case for Bayesian probability models.
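To see why a single probability number cannot keep the two situations apart, here is a minimal sketch (Python; the interval representation anticipates the lower and upper previsions introduced later): Situation A is modelled by the precise value 1/2, Situation B by the vacuous interval [0, 1], and both models are invariant under swapping heads and tails.

```python
# Each belief model assigns a lower and an upper probability to "heads".
# Situation A (evidence of symmetry): the precise Bayesian model 1/2.
situation_a = {"lower": 0.5, "upper": 0.5}

# Situation B (symmetry of evidence): the vacuous model. It is invariant
# under swapping heads and tails, yet it commits to nothing at all.
situation_b = {"lower": 0.0, "upper": 1.0}

# A precise Bayesian model is forced to say 1/2 in both situations,
# collapsing the distinction; the imprecise models keep it visible.
for name, m in [("A", situation_a), ("B", situation_b)]:
    print(f"Situation {name}: P(heads) in [{m['lower']}, {m['upper']}]")
```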
What are we going to do?
Present a more general class of belief models, of which the Bayesian belief models constitute a special subclass.
Explain how to model aspects of symmetry for such general belief models, and in particular:
◮ symmetry of evidence,
◮ evidence of symmetry.
Argue that both aspects are different in general, but coincide for Bayesian belief models.
Being able to deal with natural symmetries is often quite useful in applications, and is of fundamental theoretical importance.
More general belief models: accepting gambles
Consider a random variable X that may assume values x in a set X: the actual value of X is unknown to you. How do you model the available information about X?
Your beliefs about X lead you to certain behaviour: accepting or rejecting gambles on the value of X.
Definition: A gamble on X is a bounded map f: X → R. The set of all gambles on X is denoted by L(X).
A gamble f associates with any possible value x of X a corresponding reward f(x), which may be negative. If you accept a gamble f, this means that after the actual value x of X is determined, you will get the reward f(x).
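On a finite X a gamble is just a table of rewards indexed by the possible outcomes; the following minimal sketch (Python, names ours) makes that representation explicit.

```python
# A gamble on a finite X represented as a dict from outcomes to rewards.
X = ("h", "t")
f = {"h": -1.0, "t": 2.0}  # the gamble used in the coin example below

def reward(gamble, x):
    """The reward you get if you accepted `gamble` and X turns out to be x."""
    return gamble[x]

print(reward(f, "t"))  # 2.0: you gain 2 if the coin lands tails
print(reward(f, "h"))  # -1.0: you lose 1 if it lands heads
```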
Accepting gambles, an example: coin tossing
X is the outcome of my tossing a coin: X = {h, t}.
If you accept the gamble f with f(h) = −1 and f(t) = 2, this means that you are willing to engage in the following transaction:
◮ we determine the outcome of the toss;
◮ you win 2 if the outcome is t and you lose 1 if it is h.
If you think the coin is fair, you will accept f. You will not accept g with g(h) = −10 and g(t) = 1 unless you are quite sure that the outcome will be t. You will always accept h with h(h) = h(t) = 5.
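One way to make the ‘fair coin’ intuition explicit (a sketch; accepting exactly the gambles with positive expectation corresponds to the precise Bayesian models discussed later): compute each gamble’s expected reward under the fair pmf. The third gamble is renamed in code to avoid clashing with the outcome h.

```python
def expectation(gamble, pmf):
    """Expected reward of a gamble under a pmf on X."""
    return sum(pmf[x] * gamble[x] for x in pmf)

fair = {"h": 0.5, "t": 0.5}
f = {"h": -1.0, "t": 2.0}
g = {"h": -10.0, "t": 1.0}
h_gamble = {"h": 5.0, "t": 5.0}  # the sure gain called h on the slide

for name, gmb in [("f", f), ("g", g), ("h", h_gamble)]:
    e = expectation(gmb, fair)
    print(f"E[{name}] = {e:+.1f} -> {'accept' if e > 0 else 'reject'}")
```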
More general belief models: sets of desirable gambles
We collect all the gambles you accept in a set D of desirable gambles. Such a set should satisfy the following rationality requirements:
Definition: A set of desirable gambles D is called coherent if
D1. if sup f < 0 then f ∉ D [avoiding sure loss]
D2. if f ≥ 0 then f ∈ D [accepting sure gains]
D3. if f ∈ D and λ ≥ 0 then λf ∈ D [scale invariance]
D4. if f ∈ D and g ∈ D then f + g ∈ D [combination]
So D should be a convex cone that includes the first orthant and contains no uniformly negative gambles.
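When D is the coherent cone generated by finitely many accepted gambles (a representation we assume here purely for illustration), membership can be tested with a small linear program; a sketch using Python and scipy:

```python
import numpy as np
from scipy.optimize import linprog

def in_cone(f, generators):
    """Is f = sum_i lambda_i g_i + h for some lambda_i >= 0 and h >= 0?
    Equivalently (D2-D4): sum_i lambda_i g_i(x) <= f(x) at every x."""
    G = np.asarray(generators, dtype=float)  # one generator per row
    f = np.asarray(f, dtype=float)
    res = linprog(c=np.zeros(len(G)), A_ub=G.T, b_ub=f,
                  bounds=[(0, None)] * len(G), method="highs")
    return res.status == 0  # feasible iff f lies in the cone

gens = [[-1.0, 2.0]]  # the accepted coin gamble f, as (f(h), f(t))
print(in_cone([-2.0, 4.0], gens))   # True: positive scaling, D3
print(in_cone([0.0, 1.0], gens))    # True: a sure gain, D2
print(in_cone([-1.0, -1.0], gens))  # False: uniformly negative, D1
```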
More general belief models, an example: coin tossing
[Figure, built up over three slides: a set of desirable gambles for the coin toss, drawn as a convex cone in the plane of gambles with axes h and t.]
Equivalent other belief models: lower and upper previsions
The lower prevision P̲(f) of a gamble f is defined as
P̲(f) = sup{μ : f − μ ∈ D},
the supremum acceptable price for buying f.
The upper prevision P̄(f) of a gamble f is defined as
P̄(f) = inf{μ : μ − f ∈ D},
the infimum acceptable price for selling f.
Observe that P̄(f) = −P̲(−f) [conjugacy].
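For a cone D generated by finitely many accepted gambles (our assumed representation again), the supremum buying price is computable by linear programming; the sketch below also derives the upper prevision from the lower one via conjugacy.

```python
import numpy as np
from scipy.optimize import linprog

def lower_prevision(f, generators):
    """sup{ mu : f - mu in D }, with D generated by `generators` plus
    all nonnegative gambles. Maximize mu subject to
    mu + sum_i lambda_i g_i(x) <= f(x) for every x, lambda >= 0."""
    G = np.asarray(generators, dtype=float)
    f = np.asarray(f, dtype=float)
    c = np.concatenate(([-1.0], np.zeros(len(G))))  # minimize -mu
    A = np.hstack([np.ones((len(f), 1)), G.T])
    res = linprog(c, A_ub=A, b_ub=f,
                  bounds=[(None, None)] + [(0, None)] * len(G),
                  method="highs")
    return -res.fun

def upper_prevision(f, generators):
    # Conjugacy: the infimum selling price equals -lower_prevision(-f).
    return -lower_prevision(-np.asarray(f, dtype=float), generators)

gens = [[-1.0, 2.0]]  # you have accepted the coin gamble f
print(lower_prevision([1.0, 0.0], gens))  # lower probability of heads: 0.0
print(upper_prevision([1.0, 0.0], gens))  # upper probability of heads: ~0.667
```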
Coherence for lower and upper previsions
Theorem: A lower prevision P̲: L(X) → R is coherent if and only if
LP1. P̲(f) ≥ inf f [avoiding sure loss]
LP2. P̲(λf) = λP̲(f) if λ ≥ 0 [non-negative homogeneity]
LP3. P̲(f + g) ≥ P̲(f) + P̲(g) [super-additivity]
For a subset A of X with indicator I_A, P̲(A) := P̲(I_A) is called the lower probability of A.
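As a hedged numerical illustration (the two pmfs are invented), a lower envelope of finitely many precise previsions satisfies LP1-LP3; the sketch spot-checks all three conditions on random gambles.

```python
import numpy as np

rng = np.random.default_rng(0)
pmfs = np.array([[0.2, 0.8], [0.7, 0.3]])  # two candidate pmfs on {h, t}

def lower(f):
    """Lower envelope: the minimum expectation over the candidate pmfs."""
    return (pmfs @ f).min()

for _ in range(1000):
    f, g = rng.normal(size=2), rng.normal(size=2)
    lam = rng.uniform(0.0, 5.0)
    assert lower(f) >= f.min() - 1e-12                  # LP1
    assert abs(lower(lam * f) - lam * lower(f)) < 1e-9  # LP2
    assert lower(f + g) >= lower(f) + lower(g) - 1e-12  # LP3
print("LP1-LP3 hold on all sampled gambles")
```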
A special case: Bayesian belief models
When P̲ and P̄ coincide everywhere, we have a precise prevision P = P̲ = P̄.
P is then a linear functional that is monotone and normalised:
P1. P(λf + μg) = λP(f) + μP(g) for all real λ and μ [linearity]
P2. if f ≥ 0 then P(f) ≥ 0 [monotonicity]
P3. P(1) = 1 [normalisation]
For a subset A of X, P(A) := P(I_A) is the probability of A. P restricted to events is a finitely additive probability measure, and P(f) is the associated expectation of f: P(f) = ∫ f dP.
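In the finite case the precise prevision is the familiar expectation with respect to a pmf; a minimal sketch (names ours):

```python
import numpy as np

p = np.array([0.5, 0.5])  # a precise pmf on X = {h, t}

def prevision(f):
    """P(f) = sum over x of p(x) f(x): the expectation of the gamble f."""
    return float(p @ np.asarray(f, dtype=float))

print(prevision([-1.0, 2.0]))  # E[f] = 0.5 for the coin gamble f
print(prevision([1.0, 0.0]))   # probability of heads via the indicator I_A
print(prevision([1.0, 1.0]))   # P(1) = 1: normalisation, P3
```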
Equivalent other belief models: three mathematically equivalent models
From D: P̲(f) = max{s : f − s ∈ D} and M = {P : (∀f ∈ D) P(f) ≥ 0}.
From P̲: D = {f : P̲(f) ≥ 0} and M = {P : (∀f) P(f) ≥ P̲(f)}.
From M: D = {f : (∀P ∈ M) P(f) ≥ 0} and P̲(f) = min{P(f) : P ∈ M}.
Table: Bijective relationships between the equivalent models: coherent sets of desirable gambles D, coherent lower previsions P̲ on L(X), and closed convex sets of previsions M.
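A minimal numerical sketch of two of these conversions (Python; the names and the example set M are ours): starting from an M given by its extreme points, obtain the lower prevision as a minimum over M and test membership of the induced D.

```python
import numpy as np

# A closed convex set M on X = {h, t}, given by its two extreme points.
M = np.array([[0.2, 0.8], [0.7, 0.3]])

def lower_prev(f):
    """From M to the lower prevision: min{ P(f) : P in M }."""
    return (M @ np.asarray(f, dtype=float)).min()

def in_D(f):
    """From M to D: f is desirable iff P(f) >= 0 for every P in M,
    i.e. iff its lower prevision is nonnegative."""
    return lower_prev(f) >= 0.0

print(lower_prev([1.0, 0.0]))  # lower probability of heads: 0.2
print(in_D([0.0, 1.0]))        # True: every P in M gives nonnegative expectation
print(in_D([-1.0, 2.0]))       # False: P = (0.7, 0.3) gives expectation -0.1
```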