Empirical Process Theory for Statistics
Jon A. Wellner, University of Washington, Seattle
Talk to be given at School of Statistics and Management Science, Shanghai University of Finance and Economics, Shanghai, China
23 June 2015
Talk, Shanghai; School of Statistics and Management Science
• Lecture Outline:
  ⊲ 1. Introduction, history, selected examples.
  ⊲ 2. Some basic inequalities and Glivenko-Cantelli theorems.
  ⊲ 3. Using the Glivenko-Cantelli theorems: first applications.
  ⊲ 4. Donsker theorems and some inequalities.
  ⊲ 5. Peeling methods and rates of convergence.
  ⊲ 6. Some useful preservation theorems.
Talk, Shanghai; 23 June 2015 1.1
Based on courses given at Torgnon, Cortona, and Delft (2003-2005).
Notes available at: http://www.stat.washington.edu/jaw/RESEARCH/TALKS/talks.html
Part I: Introduction, history, selected examples
• 1. Classical empirical processes
• 2. Modern empirical processes
• 3. Some examples
1. Classical empirical processes.
Suppose that:
• $X_1, \ldots, X_n$ are i.i.d. with d.f. $F$ on $\mathbb{R}$.
• $\mathbb{F}_n(x) = n^{-1} \sum_{i=1}^n 1_{[X_i \le x]}$, the empirical distribution function.
• $\{ \mathbb{Z}_n(x) \equiv \sqrt{n}\,(\mathbb{F}_n(x) - F(x)) : x \in \mathbb{R} \}$, the empirical process.

Two classical theorems:

Theorem 1. (Glivenko-Cantelli, 1933).
$$\| \mathbb{F}_n - F \|_\infty \equiv \sup_{x \in \mathbb{R}} | \mathbb{F}_n(x) - F(x) | \to_{a.s.} 0 .$$

Theorem 2. (Donsker, 1952).
$$\mathbb{Z}_n \Rightarrow \mathbb{Z} \equiv \mathbb{U}(F) \quad \text{in } D(\mathbb{R}, \| \cdot \|_\infty),$$
where $\mathbb{U}$ is a standard Brownian bridge process on $[0,1]$; i.e. $\mathbb{U}$ is a zero-mean Gaussian process with covariance
$$E(\mathbb{U}(s)\mathbb{U}(t)) = s \wedge t - st, \qquad s, t \in [0,1].$$
This means that we have
$$E g(\mathbb{Z}_n) \to E g(\mathbb{Z})$$
for any bounded, continuous function $g : D(\mathbb{R}, \| \cdot \|_\infty) \to \mathbb{R}$, and
$$g(\mathbb{Z}_n) \to_d g(\mathbb{Z})$$
for any continuous function $g : D(\mathbb{R}, \| \cdot \|_\infty) \to \mathbb{R}$ (ignoring measurability issues).
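A small numerical illustration of Theorems 1 and 2 may help here. The sketch below assumes $F$ is the Uniform(0,1) d.f. (an assumption made only for this example, so that $F(x) = x$); the function name `sup_distance_uniform` and the seed are hypothetical. It computes $\| \mathbb{F}_n - F \|_\infty$ exactly from the order statistics and prints it next to $\sqrt{n}\,\| \mathbb{F}_n - F \|_\infty = \| \mathbb{Z}_n \|_\infty$.

```python
import math
import random


def sup_distance_uniform(sample):
    """Return sup_x |F_n(x) - F(x)| when F is the Uniform(0,1) d.f."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps from i/n to (i+1)/n at the (i+1)-th order statistic, so the
    # supremum is attained at, or just to the left of, an order statistic.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)  # fixed seed, for reproducibility only
for n in (100, 10_000):
    sample = [rng.random() for _ in range(n)]
    d_n = sup_distance_uniform(sample)
    # d_n should shrink with n (Glivenko-Cantelli), while sqrt(n) * d_n,
    # the sup norm of the empirical process, should stay of order one (Donsker).
    print(n, d_n, math.sqrt(n) * d_n)
```

The first printed column of distances should decrease as $n$ grows, while the rescaled column should remain of comparable size across the two sample sizes.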
2. General empirical processes (indexed by functions)
Suppose that:
• $X_1, \ldots, X_n$ are i.i.d. with probability measure $P$ on $(\mathcal{X}, \mathcal{A})$.
• $\mathbb{P}_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$, the empirical measure; here
$$\delta_x(A) = 1_A(x) = \begin{cases} 1, & x \in A, \\ 0, & x \in A^c, \end{cases} \qquad \text{for } A \in \mathcal{A}.$$
Hence we have
$$\mathbb{P}_n(A) = n^{-1} \sum_{i=1}^n 1_A(X_i), \qquad \text{and} \qquad \mathbb{P}_n(f) = n^{-1} \sum_{i=1}^n f(X_i).$$
• $\{ \mathbb{G}_n(f) \equiv \sqrt{n}\,(\mathbb{P}_n(f) - P(f)) : f \in \mathcal{F} \subset L_2(P) \}$, the empirical process indexed by $\mathcal{F}$.
Note that the classical case corresponds to:
• $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}, \mathcal{B})$.
• $\mathcal{F} = \{ 1_{(-\infty, t]}(\cdot) : t \in \mathbb{R} \}$.
Then
$$\mathbb{P}_n(1_{(-\infty,t]}) = n^{-1} \sum_{i=1}^n 1_{(-\infty,t]}(X_i) = \mathbb{F}_n(t),$$
$$P(1_{(-\infty,t]}) = F(t),$$
$$\mathbb{G}_n(1_{(-\infty,t]}) = \sqrt{n}\,(\mathbb{P}_n - P)(1_{(-\infty,t]}) = \sqrt{n}\,(\mathbb{F}_n(t) - F(t)),$$
$$\mathbb{G}(1_{(-\infty,t]}) = \mathbb{U}(F(t)).$$
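The correspondence between the function-indexed and the classical notation can be checked directly. The following is a minimal sketch; the helper names `empirical_measure` and `indicator_half_line`, the normal sample, and the value $t = 0.3$ are all assumptions made for illustration.

```python
import random


def empirical_measure(sample):
    """Return the map f -> P_n(f) = n^{-1} * sum_i f(X_i)."""
    n = len(sample)
    return lambda f: sum(f(x) for x in sample) / n


def indicator_half_line(t):
    """Return the function x -> 1_{(-inf, t]}(x)."""
    return lambda x: 1.0 if x <= t else 0.0


rng = random.Random(1)  # fixed seed, for reproducibility only
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
P_n = empirical_measure(sample)

t = 0.3
F_n_t = sum(1 for x in sample if x <= t) / len(sample)  # classical F_n(t)
print(P_n(indicator_half_line(t)), F_n_t)  # the two values agree
print(P_n(lambda x: abs(x - t)))           # P_n applied to f_t(x) = |x - t|
```

The same `P_n` object accepts any function of the observation, which is the point of indexing by a class $\mathcal{F}$ rather than by $t \in \mathbb{R}$.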
Two central questions for the general theory:

A. For what classes of functions $\mathcal{F}$ does a natural generalization of the Glivenko-Cantelli theorem hold? That is, for what classes $\mathcal{F}$ do we have
$$\| \mathbb{P}_n - P \|_{\mathcal{F}}^* \to_{a.s.} 0 \ ?$$
If this convergence holds, then we say that $\mathcal{F}$ is a $P$-Glivenko-Cantelli class of functions.

B. For what classes of functions $\mathcal{F}$ does a natural generalization of Donsker's theorem hold? That is, for what classes $\mathcal{F}$ do we have
$$\mathbb{G}_n \Rightarrow \mathbb{G}_P \quad \text{in } \ell^\infty(\mathcal{F}) \ ?$$
If this convergence holds, then we say that $\mathcal{F}$ is a $P$-Donsker class of functions.
Here $\mathbb{G}_P$ is a zero-mean $P$-Brownian bridge process with uniformly continuous sample paths with respect to the semi-metric $\rho_P(f, g)$ defined by
$$\rho_P^2(f, g) = Var_P(f(X) - g(X)),$$
$\ell^\infty(\mathcal{F})$ is the space of all bounded, real-valued functions $z$ from $\mathcal{F}$ to $\mathbb{R}$:
$$\ell^\infty(\mathcal{F}) = \Big\{ z : \mathcal{F} \to \mathbb{R} \ \Big| \ \| z \|_{\mathcal{F}} \equiv \sup_{f \in \mathcal{F}} | z(f) | < \infty \Big\},$$
and
$$E\{ \mathbb{G}_P(f) \mathbb{G}_P(g) \} = P(fg) - P(f)P(g).$$
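As a consistency check (a short worked computation, not taken from the slides), the covariance of $\mathbb{G}_P$ reduces to the Brownian bridge covariance in the classical case. Take $f = 1_{(-\infty,s]}$ and $g = 1_{(-\infty,t]}$, and use that $F$ is nondecreasing, so that $F(s \wedge t) = F(s) \wedge F(t)$:

```latex
\begin{aligned}
E\{ \mathbb{G}_P(f)\, \mathbb{G}_P(g) \}
  &= P(1_{(-\infty,s]} 1_{(-\infty,t]}) - P(1_{(-\infty,s]})\, P(1_{(-\infty,t]}) \\
  &= P(1_{(-\infty, s \wedge t]}) - F(s) F(t) \\
  &= F(s) \wedge F(t) - F(s) F(t) \\
  &= E\{ \mathbb{U}(F(s))\, \mathbb{U}(F(t)) \} .
\end{aligned}
```

This agrees with the identification $\mathbb{G}(1_{(-\infty,t]}) = \mathbb{U}(F(t))$ in the classical case.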
3. Some Examples

A commonly occurring problem in statistics: we want to prove consistency or asymptotic normality of some statistic which is not a sum of independent random variables, but which can be related to a natural sum of random functions indexed by a parameter in a suitable (metric) space.

Example 1. Suppose that $X_1, \ldots, X_n$ are i.i.d. real-valued with $E|X_1| < \infty$, and let $\mu = E(X_1)$. Consider the absolute deviations about the sample mean,
$$D_n = \mathbb{P}_n | X - \overline{X}_n | = n^{-1} \sum_{i=1}^n | X_i - \overline{X}_n | .$$
Since $\overline{X}_n \to_{a.s.} \mu$, we know that for any $\delta > 0$ we have $\overline{X}_n \in [\mu - \delta, \mu + \delta]$ for all sufficiently large $n$ almost surely. Thus we see that if we define
$$D_n(t) \equiv \mathbb{P}_n | X - t | = n^{-1} \sum_{i=1}^n | X_i - t | ,$$
then $D_n = D_n(\overline{X}_n)$, and study of $D_n(t)$ for $t \in [\mu - \delta, \mu + \delta]$ is equivalent to study of the empirical measure $\mathbb{P}_n$ indexed by the class of functions
$$\mathcal{F}_\delta = \{ x \mapsto | x - t | \equiv f_t(x) : t \in [\mu - \delta, \mu + \delta] \} .$$
To show that $D_n \to_{a.s.} d \equiv E|X - \mu|$, we write
$$D_n - d = \mathbb{P}_n | X - \overline{X}_n | - P | X - \mu | \qquad (1)$$
$$= (\mathbb{P}_n - P)(| X - \overline{X}_n |) + P | X - \overline{X}_n | - P | X - \mu | \equiv I_n + II_n . \qquad (2)$$
Now, for all sufficiently large $n$,
$$| I_n | = | (\mathbb{P}_n - P)(| X - \overline{X}_n |) | \le \sup_{t : |t - \mu| \le \delta} | (\mathbb{P}_n - P) | X - t | | = \sup_{f \in \mathcal{F}_\delta} | (\mathbb{P}_n - P)(f) | \to_{a.s.} 0 \qquad (3)$$
if $\mathcal{F}_\delta$ is $P$-Glivenko-Cantelli.
But convergence of the second term in (2) is easy: by the triangle inequality
$$| II_n | = | P | X - \overline{X}_n | - P | X - \mu | | \le P | \overline{X}_n - \mu | = | \overline{X}_n - \mu | \to_{a.s.} 0 .$$
How to prove (3)? Consider the functions $f_0, \ldots, f_{2m} \in \mathcal{F}_\delta$ given by
$$f_j(x) = | x - (\mu - \delta(1 - j/m)) | , \qquad j = 0, \ldots, 2m .$$
For this finite set of functions we have
$$\max_{0 \le j \le 2m} | (\mathbb{P}_n - P)(f_j) | \to_{a.s.} 0$$
by the strong law of large numbers applied $2m + 1$ times. Furthermore, for $t \in [\mu - \delta(1 - j/m), \mu - \delta(1 - (j+1)/m)]$, $j = 0, \ldots, 2m - 1$, the functions $f_t(x) = | x - t |$ satisfy (picture!)
$$L_j(x) \equiv f_j(x) \vee f_{j+1}(x) - \delta/m \ \le \ f_t(x) \ \le \ f_j(x) \vee f_{j+1}(x) \equiv U_j(x),$$
where
$$U_j(x) - f_t(x) \le \delta/m, \qquad f_t(x) - L_j(x) \le \delta/m, \qquad U_j(x) - L_j(x) \le \delta/m .$$
The strong law also applies to each of the finitely many $P$-integrable functions $U_j$ and $L_j$. Thus for each $m$
$$\| \mathbb{P}_n - P \|_{\mathcal{F}_\delta} \equiv \sup_{f \in \mathcal{F}_\delta} | (\mathbb{P}_n - P)(f) | \le \max \Big\{ \max_{0 \le j \le 2m-1} | (\mathbb{P}_n - P)(U_j) | , \ \max_{0 \le j \le 2m-1} | (\mathbb{P}_n - P)(L_j) | \Big\} + \delta/m \to_{a.s.} 0 + \delta/m .$$
Taking $m$ large shows that (3) holds.
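The discretization idea behind this argument can be checked numerically. The sketch below is a simplified variant, not the bracketing construction itself: it uses only the Lipschitz property $| f_s(x) - f_t(x) | \le | s - t |$, which gives $\sup_{f \in \mathcal{F}_\delta} | (\mathbb{P}_n - P)(f) | \le \max_j | (\mathbb{P}_n - P)(f_j) | + \delta/m$ because every $t$ lies within $\delta/(2m)$ of a grid point. It assumes $X \sim$ Uniform(0,1), so that $\mu = 1/2$ and $P f_t = (t^2 + (1-t)^2)/2$ for $0 \le t \le 1$; the function names, $\delta = 0.25$, $m = 10$, and the fine grid used as a proxy for the supremum are all choices made for illustration.

```python
import random


def P_f(t):
    """P f_t = E|X - t| for X ~ Uniform(0,1); valid for 0 <= t <= 1."""
    return (t * t + (1.0 - t) ** 2) / 2.0


def deviation(sample, t):
    """|(P_n - P) f_t| for f_t(x) = |x - t|."""
    return abs(sum(abs(x - t) for x in sample) / len(sample) - P_f(t))


def grid_max(sample, mu, delta, m):
    """Max of |(P_n - P) f_j| over t_j = mu - delta * (1 - j/m), j = 0..2m."""
    return max(deviation(sample, mu - delta * (1.0 - j / m))
               for j in range(2 * m + 1))


mu, delta, m = 0.5, 0.25, 10
rng = random.Random(2)  # fixed seed, for reproducibility only
sample = [rng.random() for _ in range(2000)]

coarse = grid_max(sample, mu, delta, m)
fine = grid_max(sample, mu, delta, 50 * m)  # proxy for the sup over F_delta
# The Lipschitz property guarantees fine <= coarse + delta / m for any sample.
print(coarse, fine, coarse + delta / m)
```

The inequality between the last two printed numbers holds for every sample, not just on average; only the smallness of `coarse` itself relies on the strong law.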
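This is a bracketing argument, and generalizes easily to yield a quite general bracketing Glivenko-Cantelli theorem.

How to prove $\sqrt{n}(D_n - d) \to_d \ ?$ We write
$$\begin{aligned}
\sqrt{n}(D_n - d) &= \sqrt{n}\,(\mathbb{P}_n | X - \overline{X}_n | - P | X - \mu |) \\
&= \sqrt{n}\,(\mathbb{P}_n | X - \mu | - P | X - \mu |) + \sqrt{n}\,(P | X - \overline{X}_n | - P | X - \mu |) \\
&\qquad + \sqrt{n}\,(\mathbb{P}_n - P)(| X - \overline{X}_n |) - \sqrt{n}\,(\mathbb{P}_n - P)(| X - \mu |) \\
&= \mathbb{G}_n(| X - \mu |) + \sqrt{n}\,(H(\overline{X}_n) - H(\mu)) + \mathbb{G}_n(| X - \overline{X}_n | - | X - \mu |) \\
&= \mathbb{G}_n(| X - \mu |) + H'(\mu) \sqrt{n}\,(\overline{X}_n - \mu) \\
&\qquad + \sqrt{n}\,(H(\overline{X}_n) - H(\mu) - H'(\mu)(\overline{X}_n - \mu)) + \mathbb{G}_n(| X - \overline{X}_n | - | X - \mu |) \\
&\equiv \mathbb{G}_n(| X - \mu | + H'(\mu)(X - \mu)) + I_n + II_n ,
\end{aligned}$$
where
$$H(t) \equiv P | X - t | ,$$
$$I_n \equiv \sqrt{n}\,(H(\overline{X}_n) - H(\mu) - H'(\mu)(\overline{X}_n - \mu)) ,$$
$$II_n \equiv \mathbb{G}_n(| X - \overline{X}_n |) - \mathbb{G}_n(| X - \mu |) = \mathbb{G}_n(| X - \overline{X}_n | - | X - \mu |) = \mathbb{G}_n(f_{\overline{X}_n} - f_\mu) .$$
The last step uses $H'(\mu) \sqrt{n}\,(\overline{X}_n - \mu) = \mathbb{G}_n(H'(\mu)(X - \mu))$.

Here $I_n \to_p 0$ if $H(t) \equiv P | X - t |$ is differentiable at $\mu$. The second term $II_n \equiv \mathbb{G}_n(f_{\overline{X}_n} - f_\mu) \to_p 0$ if $\mathcal{F}_\delta$ is a Donsker class of functions! This is a consequence of $\rho_P(f_{\overline{X}_n}, f_\mu) \le | \overline{X}_n - \mu | \to_{a.s.} 0$ together with asymptotic equicontinuity of $\mathbb{G}_n$ over the class $\mathcal{F}$: for every $\epsilon > 0$
$$\lim_{\delta \searrow 0} \limsup_{n \to \infty} Pr^* \Big( \sup_{f, g : \rho_P(f, g) \le \delta} | \mathbb{G}_n(f) - \mathbb{G}_n(g) | > \epsilon \Big) = 0 .$$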
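The decomposition of $\sqrt{n}(D_n - d)$ can be evaluated term by term on simulated data. The sketch below assumes $X \sim$ Exponential(1), a choice made only for illustration, for which $\mu = 1$, $H(t) = t - 1 + 2e^{-t}$ for $t \ge 0$, $d = H(1) = 2/e$, and $H'(\mu) = 2F(\mu) - 1 = 1 - 2/e$. The function name `decompose` and the sample size are hypothetical.

```python
import math
import random

MU = 1.0  # mean of the Exponential(1) distribution


def H(t):
    """H(t) = P|X - t| for X ~ Exponential(1); valid for t >= 0."""
    return t - 1.0 + 2.0 * math.exp(-t)


D = H(MU)                               # d = E|X - mu| = 2/e
H_PRIME_MU = 1.0 - 2.0 * math.exp(-MU)  # H'(mu) = 2 F(mu) - 1


def decompose(sample):
    """Return (sqrt(n)(D_n - d), leading term, I_n, II_n) for one sample."""
    n = len(sample)
    root_n = math.sqrt(n)
    x_bar = sum(sample) / n
    D_n = sum(abs(x - x_bar) for x in sample) / n
    # Leading term G_n(g) with g(x) = |x - mu| + H'(mu)(x - mu), where P g = d.
    Pn_g = sum(abs(x - MU) + H_PRIME_MU * (x - MU) for x in sample) / n
    leading = root_n * (Pn_g - D)
    # I_n: first-order Taylor remainder of H at mu, scaled by sqrt(n).
    I_n = root_n * (H(x_bar) - H(MU) - H_PRIME_MU * (x_bar - MU))
    # II_n = G_n(f_{x_bar} - f_mu), with P(f_t - f_mu) = H(t) - H(mu) at t = x_bar.
    Pn_diff = sum(abs(x - x_bar) - abs(x - MU) for x in sample) / n
    II_n = root_n * (Pn_diff - (H(x_bar) - H(MU)))
    return root_n * (D_n - D), leading, I_n, II_n


rng = random.Random(3)  # fixed seed, for reproducibility only
sample = [rng.expovariate(1.0) for _ in range(4000)]
total, leading, I_n, II_n = decompose(sample)
print(total, leading, I_n, II_n)  # total equals leading + I_n + II_n
```

In runs of this kind one would expect `I_n` and `II_n` to be small relative to `leading`, which is the content of the two $\to_p 0$ statements above.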
Example 2. Copula models: the pseudo-MLE.
Let $c_\theta(u_1, \ldots, u_p)$ be a copula density with $\theta \in \Theta \subset \mathbb{R}^q$. Suppose that $X_1, \ldots, X_n$ are i.i.d. with density
$$f(x_1, \ldots, x_p) = c_\theta(F_1(x_1), \ldots, F_p(x_p)) \cdot f_1(x_1) \cdots f_p(x_p),$$
where $F_1, \ldots, F_p$ are absolutely continuous d.f.'s with densities $f_1, \ldots, f_p$. Let
$$\mathbb{F}_{n,j}(x_j) \equiv n^{-1} \sum_{i=1}^n 1\{ X_{i,j} \le x_j \}, \qquad j = 1, \ldots, p,$$
be the marginal empirical d.f.'s of the data. Then a natural pseudo-likelihood function is given by
$$l_n(\theta) \equiv \mathbb{P}_n \log c_\theta(\mathbb{F}_{n,1}(x_1), \ldots, \mathbb{F}_{n,p}(x_p)) .$$
Thus it seems reasonable to define the pseudo-likelihood estimator $\hat{\theta}_n$ of $\theta$ as a solution of the $q$-dimensional system of equations
$$\Psi_n(\hat{\theta}_n) = 0,$$
where
$$\Psi_n(\theta) \equiv \mathbb{P}_n \big( \dot{\ell}_\theta(\theta; \mathbb{F}_{n,1}(x_1), \ldots, \mathbb{F}_{n,p}(x_p)) \big)$$
and where
$$\dot{\ell}_\theta(\theta; u_1, \ldots, u_p) \equiv \nabla_\theta \log c_\theta(u_1, \ldots, u_p) .$$
We also define $\Psi(\theta)$ by
$$\Psi(\theta) \equiv P_0 \big( \dot{\ell}_\theta(\theta; F_1(x_1), \ldots, F_p(x_p)) \big) .$$
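To make the estimating equation concrete, here is a minimal sketch for one specific parametric family. The family is an assumption chosen for illustration and does not appear in the slides: the bivariate Farlie-Gumbel-Morgenstern copula with density $c_\theta(u, v) = 1 + \theta(1 - 2u)(1 - 2v)$, $\theta \in [-1, 1]$ (so $p = 2$, $q = 1$). All function names, the marginal transformations, and the bisection bounds $\pm 0.999$ are hypothetical. The score is $\dot{\ell}_\theta = a / (1 + \theta a)$ with $a = (1 - 2u)(1 - 2v)$, so $\Psi_n$ is nonincreasing in $\theta$ and can be solved by bisection.

```python
import math
import random


def sample_fgm(n, theta, rng):
    """Draw n pairs (U, V) from the FGM copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        a = theta * (1.0 - 2.0 * u)
        # Solve v + a * v * (1 - v) = w for v in [0, 1] (stable quadratic root).
        v = 2.0 * w / ((1.0 + a) + math.sqrt((1.0 + a) ** 2 - 4.0 * a * w))
        pairs.append((u, v))
    return pairs


def pseudo_observations(column):
    """Marginal empirical d.f. at the data: rank / n (assumes no ties)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / n for r in ranks]


def score_equation(theta, us, vs):
    """Psi_n(theta): average of d/dtheta log c_theta at the pseudo-observations."""
    total = 0.0
    for u, v in zip(us, vs):
        a = (1.0 - 2.0 * u) * (1.0 - 2.0 * v)
        total += a / (1.0 + theta * a)
    return total / len(us)


def pseudo_mle(xs, ys, lo=-0.999, hi=0.999, iters=60):
    """Solve Psi_n(theta) = 0 by bisection; Psi_n is nonincreasing in theta."""
    us, vs = pseudo_observations(xs), pseudo_observations(ys)
    if score_equation(lo, us, vs) <= 0.0:
        return lo
    if score_equation(hi, us, vs) >= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score_equation(mid, us, vs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(4)  # fixed seed, for reproducibility only
pairs = sample_fgm(2000, 0.5, rng)
# Unknown margins: exponential for the first coordinate, V**3 for the second.
xs = [-math.log(1.0 - u) for u, _ in pairs]
ys = [v ** 3 for _, v in pairs]
print(pseudo_mle(xs, ys))
```

Because the estimator depends on the data only through marginal ranks, the marginal d.f.'s never need to be specified, which is the appeal of the pseudo-likelihood approach.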