Lecture 3. Sufficiency
3.1 Sufficient statistics

The concept of sufficiency addresses the question: "Is there a statistic $T(X)$ that in some sense contains all the information about $\theta$ that is in the sample?"

Example 3.1. Let $X_1, \ldots, X_n$ be iid Bernoulli($\theta$), so that $P(X_i = 1) = 1 - P(X_i = 0) = \theta$ for some $0 < \theta < 1$. Then
$$f_X(x \mid \theta) = \prod_{i=1}^n \theta^{x_i} (1-\theta)^{1-x_i} = \theta^{\sum x_i} (1-\theta)^{n - \sum x_i}.$$
This depends on the data only through $T(x) = \sum x_i$, the total number of ones. Note that $T(X) \sim \mathrm{Bin}(n, \theta)$. If $T(x) = t$, then
$$f_{X \mid T=t}(x \mid T = t) = \frac{P_\theta(X = x, T = t)}{P_\theta(T = t)} = \frac{\theta^{\sum x_i} (1-\theta)^{n - \sum x_i}}{\binom{n}{t} \theta^t (1-\theta)^{n-t}} = \binom{n}{t}^{-1},$$
i.e. the conditional distribution of $X$ given $T = t$ does not depend on $\theta$. Thus if we know $T$, then additional knowledge of $x$ (knowing the exact sequence of 0's and 1's) does not give extra information about $\theta$. $\square$
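This can also be seen numerically: simulate Bernoulli samples under two different values of $\theta$, keep only those with a fixed total $t$, and check that every 0/1 sequence with $t$ ones occurs with roughly the same frequency $1/\binom{n}{t}$ in both cases. This is a minimal illustrative sketch (not part of the lecture); the values of $n$, $t$ and $\theta$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 5, 2  # sequence length, and the value of T we condition on

def conditional_freqs(theta, reps=200_000):
    """Empirical distribution of the 0/1 sequence X given T(X) = t."""
    x = rng.binomial(1, theta, size=(reps, n))
    kept = x[x.sum(axis=1) == t]           # condition on T = t
    seqs, counts = np.unique(kept, axis=0, return_counts=True)
    return counts / kept.shape[0]

for theta in (0.3, 0.7):
    freqs = conditional_freqs(theta)
    # All ten frequencies are close to 1/C(5,2) = 0.1, whichever theta we used.
    print(theta, np.round(freqs, 3))
```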
Definition 3.1. A statistic $T$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T$ does not depend on $\theta$.

Note that $T$ and/or $\theta$ may be vectors. In practice, the following theorem is used to find sufficient statistics.
Theorem 3.2 (The Factorisation criterion). $T$ is sufficient for $\theta$ iff $f_X(x \mid \theta) = g(T(x), \theta)\, h(x)$ for suitable functions $g$ and $h$.

Proof (discrete case only). Suppose $f_X(x \mid \theta) = g(T(x), \theta)\, h(x)$. If $T(x) = t$, then
$$f_{X \mid T=t}(x \mid T = t) = \frac{P_\theta(X = x, T(X) = t)}{P_\theta(T = t)} = \frac{g(T(x), \theta)\, h(x)}{\sum_{\{x' : T(x') = t\}} g(t, \theta)\, h(x')} = \frac{g(t, \theta)\, h(x)}{g(t, \theta) \sum_{\{x' : T(x') = t\}} h(x')} = \frac{h(x)}{\sum_{\{x' : T(x') = t\}} h(x')},$$
which does not depend on $\theta$, so $T$ is sufficient.

Now suppose that $T$ is sufficient, so that the conditional distribution of $X \mid T = t$ does not depend on $\theta$. Then
$$P_\theta(X = x) = P_\theta(X = x, T(X) = T(x)) = P_\theta(X = x \mid T = T(x))\, P_\theta(T = T(x)).$$
The first factor does not depend on $\theta$ by assumption; call it $h(x)$. Let the second factor be $g(t, \theta)$, and so we have the required factorisation. $\square$
Example 3.1 (continued). For Bernoulli trials, $f_X(x \mid \theta) = \theta^{\sum x_i} (1-\theta)^{n - \sum x_i}$. Take $g(t, \theta) = \theta^t (1-\theta)^{n-t}$ and $h(x) = 1$ to see that $T(X) = \sum X_i$ is sufficient for $\theta$. $\square$

Example 3.2. Let $X_1, \ldots, X_n$ be iid $U[0, \theta]$. Write $1_A(x)$ for the indicator function: $1_A(x) = 1$ if $x \in A$, and $= 0$ otherwise. We have
$$f_X(x \mid \theta) = \prod_{i=1}^n \frac{1}{\theta}\, 1_{[0, \theta]}(x_i) = \frac{1}{\theta^n}\, 1_{\{\max_i x_i \le \theta\}}(\max_i x_i)\, 1_{\{0 \le \min_i x_i\}}(\min_i x_i).$$
Then $T(X) = \max_i X_i$ is sufficient for $\theta$. $\square$
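An informal numerical check (not in the notes): if $T = \max_i X_i$ is sufficient, the "shape" of the sample that remains once $T$ is known should not depend on $\theta$. For $U[0, \theta]$ that shape can be summarised by the ratios $X_i / \max_j X_j$, whose distribution is the same for every $\theta$ by scale invariance. A sketch under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 50_000

def leftover_ratios(theta):
    """Ratios X_i / max_j X_j: what the data look like once T = max is known."""
    x = rng.uniform(0, theta, size=(reps, n))
    r = x / x.max(axis=1, keepdims=True)
    return np.sort(r, axis=1)[:, :-1].ravel()  # drop the maximum itself (ratio 1)

for theta in (1.0, 5.0):
    q = np.quantile(leftover_ratios(theta), [0.25, 0.5, 0.75])
    print(theta, np.round(q, 3))  # quantiles agree across the two thetas
```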
3.2 Minimal sufficient statistics

Sufficient statistics are not unique. If $T$ is sufficient for $\theta$, then so is any (1–1) function of $T$. $X$ itself is always sufficient for $\theta$: take $T(X) = X$, $g(t, \theta) = f_X(t \mid \theta)$ and $h(x) = 1$. But this is not much use.

The sample space $\mathcal{X}^n$ is partitioned by $T$ into sets $\{x \in \mathcal{X}^n : T(x) = t\}$. If $T$ is sufficient, then this data reduction does not lose any information about $\theta$. We seek a sufficient statistic that achieves the maximum possible reduction.

Definition 3.3. A sufficient statistic $T(X)$ is minimal sufficient if it is a function of every other sufficient statistic: i.e. if $T'(X)$ is also sufficient, then $T'(X) = T'(Y) \Rightarrow T(X) = T(Y)$; i.e. the partition for $T$ is coarser than that for $T'$.
Minimal sufficient statistics can be found using the following theorem.

Theorem 3.4. Suppose $T = T(X)$ is a statistic such that $f_X(x; \theta) / f_X(y; \theta)$ is constant as a function of $\theta$ if and only if $T(x) = T(y)$. Then $T$ is minimal sufficient for $\theta$.

Sketch of proof (non-examinable). First, we use the Factorisation criterion to show sufficiency. Define an equivalence relation $\sim$ on $\mathcal{X}^n$ by setting $x \sim y$ when $T(x) = T(y)$. (Check that this is indeed an equivalence relation.) Let $U = \{T(x) : x \in \mathcal{X}^n\}$, and for each $u \in U$ choose a representative $x_u$ from the equivalence class $\{x : T(x) = u\}$.

Let $x \in \mathcal{X}^n$ and suppose that $T(x) = t$. Then $x$ is in the equivalence class $\{x' : T(x') = t\}$, which has representative $x_t$; this representative may also be written $x_{T(x)}$. We have $x \sim x_t$, so $T(x) = T(x_t)$, i.e. $T(x) = T(x_{T(x)})$. Hence, by hypothesis, the ratio $f_X(x; \theta) / f_X(x_{T(x)}; \theta)$ does not depend on $\theta$; call it $h(x)$. Let $g(t, \theta) = f_X(x_t; \theta)$. Then
$$f_X(x; \theta) = f_X(x_{T(x)}; \theta)\, \frac{f_X(x; \theta)}{f_X(x_{T(x)}; \theta)} = g(T(x), \theta)\, h(x),$$
and so $T = T(X)$ is sufficient for $\theta$ by the Factorisation criterion.
Next we show that $T(X)$ is a function of every other sufficient statistic. Suppose that $S(X)$ is also sufficient for $\theta$, so that, by the Factorisation criterion, there exist functions $g_S$ and $h_S$ (so named to distinguish them from the $g$ and $h$ above) such that
$$f_X(x; \theta) = g_S(S(x), \theta)\, h_S(x).$$
Suppose that $S(x) = S(y)$. Then
$$\frac{f_X(x; \theta)}{f_X(y; \theta)} = \frac{g_S(S(x), \theta)\, h_S(x)}{g_S(S(y), \theta)\, h_S(y)} = \frac{h_S(x)}{h_S(y)},$$
because $S(x) = S(y)$. So the ratio $f_X(x; \theta) / f_X(y; \theta)$ does not depend on $\theta$, which implies that $T(x) = T(y)$ by hypothesis. We have shown that $S(x) = S(y)$ implies $T(x) = T(y)$, i.e. $T$ is a function of $S$. Hence $T$ is minimal sufficient. $\square$
Example 3.3. Suppose $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$. Then
$$\frac{f_X(x \mid \mu, \sigma^2)}{f_X(y \mid \mu, \sigma^2)} = \frac{(2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_i (x_i - \mu)^2\right\}}{(2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_i (y_i - \mu)^2\right\}} = \exp\left\{-\frac{1}{2\sigma^2}\left(\sum_i x_i^2 - \sum_i y_i^2\right) + \frac{\mu}{\sigma^2}\left(\sum_i x_i - \sum_i y_i\right)\right\}.$$
This is constant as a function of $(\mu, \sigma^2)$ iff $\sum_i x_i^2 = \sum_i y_i^2$ and $\sum_i x_i = \sum_i y_i$. So $T(X) = \left(\sum_i X_i^2, \sum_i X_i\right)$ is minimal sufficient for $(\mu, \sigma^2)$. $\square$

1–1 functions of minimal sufficient statistics are also minimal sufficient. So $T'(X) = (\bar{X}, \sum_i (X_i - \bar{X})^2)$ is also minimal sufficient for $(\mu, \sigma^2)$, where $\bar{X} = \sum_i X_i / n$. We write $S_{XX}$ for $\sum_i (X_i - \bar{X})^2$.
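That $T'$ is a 1–1 function of $T$ comes down to a one-line identity, written out here for completeness (it is implicit in the notes):
$$\bar{X} = \frac{1}{n} \sum_i X_i, \quad S_{XX} = \sum_i X_i^2 - n\bar{X}^2, \qquad \text{and conversely} \qquad \sum_i X_i = n\bar{X}, \quad \sum_i X_i^2 = S_{XX} + n\bar{X}^2,$$
so each of $T$ and $T'$ determines the other.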
Notes
(i) Example 3.3 has a vector $T$ sufficient for a vector $\theta$. The dimensions do not have to be the same: e.g. for $N(\mu, \mu^2)$, $T(X) = \left(\sum_i X_i^2, \sum_i X_i\right)$ is minimal sufficient for $\mu$ [check: a sketch follows below].
(ii) If the range of $X$ depends on $\theta$, then "$f_X(x; \theta) / f_X(y; \theta)$ is constant in $\theta$" means "$f_X(x; \theta) = c(x, y)\, f_X(y; \theta)$".
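A sketch of the check in note (i) (this working is not in the notes): for $X_1, \ldots, X_n$ iid $N(\mu, \mu^2)$,
$$\frac{f_X(x; \mu)}{f_X(y; \mu)} = \exp\left\{-\frac{1}{2\mu^2}\left(\sum_i x_i^2 - \sum_i y_i^2\right) + \frac{1}{\mu}\left(\sum_i x_i - \sum_i y_i\right)\right\},$$
which is constant in $\mu$ iff $\sum_i x_i^2 = \sum_i y_i^2$ and $\sum_i x_i = \sum_i y_i$, since $1/\mu^2$ and $1/\mu$ are linearly independent functions of $\mu$. By Theorem 3.4, the two-dimensional $T(X) = \left(\sum_i X_i^2, \sum_i X_i\right)$ is minimal sufficient for the scalar $\mu$.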
3.3 The Rao–Blackwell theorem

The Rao–Blackwell theorem gives a way to improve estimators in the mse sense.

Theorem 3.5 (The Rao–Blackwell theorem). Let $T$ be a sufficient statistic for $\theta$ and let $\tilde{\theta}$ be an estimator for $\theta$ with $E(\tilde{\theta}^2) < \infty$ for all $\theta$. Let $\hat{\theta} = E[\tilde{\theta} \mid T]$. Then for all $\theta$,
$$E\left[(\hat{\theta} - \theta)^2\right] \le E\left[(\tilde{\theta} - \theta)^2\right].$$
The inequality is strict unless $\tilde{\theta}$ is a function of $T$.

Proof. By the conditional expectation formula we have $E\hat{\theta} = E[E(\tilde{\theta} \mid T)] = E\tilde{\theta}$, so $\hat{\theta}$ and $\tilde{\theta}$ have the same bias. By the conditional variance formula,
$$\mathrm{var}(\tilde{\theta}) = E[\mathrm{var}(\tilde{\theta} \mid T)] + \mathrm{var}[E(\tilde{\theta} \mid T)] = E[\mathrm{var}(\tilde{\theta} \mid T)] + \mathrm{var}(\hat{\theta}).$$
Hence $\mathrm{var}(\tilde{\theta}) \ge \mathrm{var}(\hat{\theta})$, and so $\mathrm{mse}(\tilde{\theta}) \ge \mathrm{mse}(\hat{\theta})$, with equality only if $\mathrm{var}(\tilde{\theta} \mid T) = 0$. $\square$
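A standard illustration of the theorem in action (this example is not from the lecture): to estimate $\theta = e^{-\lambda} = P(X_1 = 0)$ from $X_1, \ldots, X_n$ iid Poisson($\lambda$), start with the crude unbiased estimator $\tilde{\theta} = 1\{X_1 = 0\}$ and condition on the sufficient statistic $T = \sum X_i$. Since $X_1 \mid T = t \sim \mathrm{Bin}(t, 1/n)$, the Rao–Blackwellised estimator is $\hat{\theta} = E[\tilde{\theta} \mid T] = (1 - 1/n)^T$. A simulation sketch confirming the drop in mse:

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, reps = 10, 1.5, 100_000
theta = np.exp(-lam)                    # target: P(X_1 = 0)

x = rng.poisson(lam, size=(reps, n))
crude = (x[:, 0] == 0).astype(float)    # theta_tilde = 1{X_1 = 0}: unbiased but noisy
T = x.sum(axis=1)
rb = (1 - 1 / n) ** T                   # theta_hat = E[theta_tilde | T]

for name, est in [("crude", crude), ("Rao-Blackwellised", rb)]:
    print(name,
          "bias:", round(est.mean() - theta, 4),          # both approximately unbiased
          "mse:", round(np.mean((est - theta) ** 2), 5))  # rb has much smaller mse
```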
Notes
(i) Since $T$ is sufficient for $\theta$, the conditional distribution of $X$ given $T = t$ does not depend on $\theta$. Hence $\hat{\theta} = E[\tilde{\theta}(X) \mid T]$ does not depend on $\theta$, and so is a bona fide estimator.
(ii) The theorem says that given any estimator, we can find one that is a function of a sufficient statistic and is at least as good in terms of mean squared error of estimation.
(iii) If $\tilde{\theta}$ is unbiased, then so is $\hat{\theta}$.
(iv) If $\tilde{\theta}$ is already a function of $T$, then $\hat{\theta} = \tilde{\theta}$.