On the asymptotics of the m.l. estimators
Matematiikan päivät, 4.–5.1.2006, Tampere
Esko Valkeila, Teknillinen korkeakoulu
4.1.2006
Outline of the talk
◮ Motivation
◮ Some technical facts
◮ One result of Le Cam
◮ Multi-dimensional parameters
◮ Abstract filtered models
◮ Examples
◮ Conclusions
Motivation
Basic setup
We work with statistical models/experiments $\mathcal{E}_n(\Theta) := (\Omega_n, \mathcal{F}_n, P^\theta_n;\ \theta \in \Theta)$; here $(\Omega_n, \mathcal{F}_n)$ is a model for the observation scheme, and $P^\theta_n$, $\theta \in \Theta$, is a model for the different statistical theories concerning the observations. We are interested in asymptotics: what happens as $n \to \infty$? More precisely, we would like to understand the minimal assumptions that guarantee that the maximum likelihood estimator is asymptotically normal, efficient, ...
Motivation
Textbook information
How are these problems treated in statistics textbooks? I would describe the situation as follows:
◮ Cookbooks on statistics: under some regularity assumptions the m.l.e. is asymptotically normal. Typically, no proof is given.
◮ Mainstream books on statistics: the log-likelihood is smooth ($C^2$ in a neighbourhood of the true parameter), the remainder term is suitably dominated, and the support of the true distribution does not depend on the parameter. A detailed proof is given.
Motivation
Textbook information, cont.
Next I will make some comments:
◮ If we want to understand the minimal conditions under which the good properties of the m.l.e. hold, we should forget cookbooks on statistics.
◮ Mainstream books on statistics are inaccurate: the support can depend on the parameter, provided the dependence is smooth (take $f(x;\theta) = (x-\theta)\,e^{-(x-\theta)}\,\mathbf{1}_{\{x \ge \theta\}}$).
Some technical facts
$L_2$-differentiability
The following definition will be very useful. We work with a statistical model/experiment $(\Omega, \mathcal{F}, P^\theta;\ \theta \in \Theta)$ with the following additional property: there exists a probability measure $Q$ such that $P^\theta \ll Q$ for all $\theta \in \Theta$. Put $f_\theta := \frac{dP^\theta}{dQ}$. Notation: $\Theta \subset \mathbb{R}^d$, and $(u, v)$ is the inner product in $\mathbb{R}^d$. The model is differentiable in $L_2$ if there exists a random variable $w_{\theta,2} \in L_2(Q)$ such that for all $u_n \to 0$ we have
$$ E_Q \left( \frac{\sqrt{f_{\theta+u_n}} - \sqrt{f_\theta} - (u_n, w_{\theta,2})}{|u_n|} \right)^2 \to 0 $$
as $n \to \infty$.
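As a concrete illustration (a standard example added here, not taken from the slides), consider the Gaussian location model $P^\theta = N(\theta, 1)$ with dominating measure $Q = N(0,1)$:
$$ f_\theta(x) = \exp\Big(\theta x - \tfrac{\theta^2}{2}\Big), \qquad \sqrt{f_\theta(x)} = \exp\Big(\tfrac{\theta x}{2} - \tfrac{\theta^2}{4}\Big), $$
so that $\frac{\partial}{\partial\theta}\sqrt{f_\theta(x)} = \tfrac{1}{2}(x-\theta)\sqrt{f_\theta(x)}$, and the model is $L_2$-differentiable with $w_{\theta,2}(x) = \tfrac{1}{2}(x-\theta)\sqrt{f_\theta(x)} \in L_2(Q)$.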
Some technical facts
$L_q$-differentiability
We formally generalize $L_2$-differentiability: take $q > 2$. The model is differentiable in $L_q$ if there exists a random variable $w_{\theta,q} \in L_q(Q)$ such that for all $u_n \to 0$ we have
$$ E_Q \left| \frac{\sqrt[q]{f_{\theta+u_n}} - \sqrt[q]{f_\theta} - (u_n, w_{\theta,q})}{|u_n|} \right|^q \to 0 $$
as $n \to \infty$.
Some technical facts
Score and $L_q$-differentiability
To simplify the discussion, we assume that $P^\theta \sim Q$ and put $L_{\eta,\theta} = \frac{f_\eta}{f_\theta}$; $L_{\eta,\theta}$ is the likelihood (ratio). One can show the following:
$$ E_Q \left| \frac{f_{\theta+u_n} - f_\theta - (u_n, w_{\theta,1})}{|u_n|} \right| \to 0 $$
with some random variable $w_{\theta,1} \in L_1(Q)$ if and only if
$$ E_{P^\theta} \left| \frac{L_{\theta+u_n,\theta} - 1 - (u_n, v_\theta)}{|u_n|} \right| \to 0 $$
with some random variable $v_\theta \in L_1(P^\theta)$. The vector $v_\theta$ is the score vector.
Some technical facts
Score and $L_q$-differentiability, cont.
Moreover, the model is differentiable in $L_q$ if and only if
$$ E_{P^\theta} \left| \frac{\sqrt[q]{L_{\theta+u_n,\theta}} - 1 - \tfrac{1}{q}(u_n, v_\theta)}{|u_n|} \right|^q \to 0 . $$
Hence $w_{\theta,2} = \tfrac{1}{2}\sqrt{f_\theta}\, v_\theta$ and $w_{\theta,q} = \tfrac{1}{q}\sqrt[q]{f_\theta}\, v_\theta$; for an $L_2$-differentiable model the Fisher information matrix $I(\theta)$ automatically exists and $I_{ij}(\theta) = E_{P^\theta}\big[ v^i_\theta v^j_\theta \big]$.
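Continuing the illustrative Gaussian location example introduced above (again an addition, not from the slides), the score and Fisher information are
$$ v_\theta(x) = x - \theta, \qquad w_{\theta,2}(x) = \tfrac{1}{2}\sqrt{f_\theta(x)}\,(x-\theta), \qquad I(\theta) = E_{P^\theta}\big[(x-\theta)^2\big] = 1 . $$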
One result of Le Cam
Consider next the case when the statistical model $\mathcal{E}_n(\Theta)$ is a product experiment $\mathcal{E}_n(\Theta) = (\Omega^n, \otimes_{k=1}^n \mathcal{F}, P^\theta_n;\ \theta \in \Theta)$; here $P^\theta_n$ is the product measure $P^\theta_n = \bigotimes_{k=1}^n P^\theta$. One can show that the experiment $\mathcal{E}_n(\Theta)$ is $L_q$-differentiable if and only if the coordinate experiment $e(\Theta) = (\Omega, \mathcal{F}, P^\theta;\ \theta \in \Theta)$ is $L_q$-differentiable. Let $\hat\theta_n$ be the m.l. estimator of the parameter $\theta$ in the product experiment, i.e. the m.l. estimator based on $n$ independent, identically distributed observations from the model $(\Omega, \mathcal{F}, P^\theta;\ \theta \in \Theta)$.
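In this notation (spelling out what the later slides use, with the observations denoted $X_1, \dots, X_n$ as an added convention), the likelihood $L^n$ in the product experiment factorizes over the observations:
$$ L^n_{\theta+u,\theta} = \prod_{k=1}^n L_{\theta+u,\theta}(X_k) = \prod_{k=1}^n \frac{f_{\theta+u}(X_k)}{f_\theta(X_k)} . $$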
One result of Le Cam, cont.
We can now formulate Le Cam's result for product experiments. Assume that
◮ $\Theta \subset \mathbb{R}$ is open and bounded.
◮ The model $(\Omega, \mathcal{F}, P^\theta;\ \theta \in \Theta)$ is $L_2$-differentiable.
◮ $0 < \inf_\theta I(\theta)$, $\sup_\theta I(\theta) < \infty$, and the map $\theta \mapsto I(\theta)$ is continuous.
Then the following facts hold:
◮ Maximum likelihood estimators $\hat\theta_n$ exist.
◮ The sequence $\sqrt{n}(\hat\theta_n - \theta)$ is asymptotically normal under $P^\theta$ with limit $N(0, I(\theta)^{-1})$.
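As a quick numerical sanity check of this statement (an illustration added here, not part of the slides), the following Python sketch simulates i.i.d. Exponential($\theta$) observations, a model in which the m.l.e. and the Fisher information $I(\theta) = \theta^{-2}$ are explicit:

```python
# Illustrative simulation (assumed model: i.i.d. Exponential(theta) observations):
# check that sqrt(n) * (theta_hat_n - theta) is approximately N(0, I(theta)^{-1}),
# where the m.l.e. is 1 / (sample mean) and I(theta) = 1 / theta^2.
import numpy as np

rng = np.random.default_rng(seed=2006)

theta = 2.0      # true parameter
n = 5000         # sample size per replication
reps = 2000      # number of Monte Carlo replications

# Exponential(theta) has density theta * exp(-theta * x), i.e. mean 1/theta.
samples = rng.exponential(scale=1.0 / theta, size=(reps, n))
theta_hat = 1.0 / samples.mean(axis=1)     # m.l.e. in each replication

z = np.sqrt(n) * (theta_hat - theta)       # normalized estimation errors
asymptotic_var = theta ** 2                # I(theta)^{-1}

print("empirical mean:", z.mean())                       # close to 0
print("empirical var :", z.var(), "vs", asymptotic_var)  # close to theta^2 = 4
```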
One result of Le Cam, discussion
Essentially, the good properties of the m.l.e. follow from $L_2$-differentiability when the parameter is one-dimensional. In mainstream textbooks the proof is based on a two-term Taylor expansion plus a remainder term. This is not possible here, because we do not have a two-term Taylor expansion, only a one-term one. In the proof one must control the terms
$$ \sup_{|u| \le \delta} \big| L^n_{\theta+u,\theta} - 1 \big| $$
by using the Kolmogorov criterion for the modulus of continuity; here $L^n_{\theta+u,\theta}$ is the likelihood in the product experiment. If the parameter is one-dimensional, then $L_2$-differentiability is sufficient for the control we are looking for.
Multi-dimensional parameters
Assume now that $\Theta \subset \mathbb{R}^d$, where $d \ge 2$. We still assume that the model is $L_2$-differentiable, the parameter set $\Theta$ is an open and bounded subset of $\mathbb{R}^d$, and the Fisher information is continuous and strictly non-degenerate; in addition, the score vector $v_\theta$ satisfies $v_\theta \in L_q(P^\theta)$ for some $q > d$. Then
◮ Maximum likelihood estimators $\hat\theta_n$ exist.
◮ The sequence $\sqrt{n}(\hat\theta_n - \theta)$ is asymptotically normal under $P^\theta$ with limit $N(0, I(\theta)^{-1})$.
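For instance (a standard illustration added here, not from the slides), in the two-dimensional Gaussian model with unknown mean and variance, $\theta = (\mu, \sigma^2)$,
$$ v_\theta(x) = \Big( \frac{x-\mu}{\sigma^2},\ \frac{(x-\mu)^2 - \sigma^2}{2\sigma^4} \Big), \qquad I(\theta) = \begin{pmatrix} \sigma^{-2} & 0 \\ 0 & \tfrac{1}{2}\sigma^{-4} \end{pmatrix}, $$
and $v_\theta$ has Gaussian moments of every order, so $v_\theta \in L_q(P^\theta)$ for every $q > d = 2$.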
Multi-dimensional parameters, discussion
As explained earlier, the main problem with this approach is to control the expression
$$ \sup_{|u| \le \delta} \big| L^n_{\theta+u,\theta} - 1 \big| $$
or, equivalently,
$$ \sup_{|u| \le \delta} \big| \sqrt[q]{L^n_{\theta+u,\theta}} - 1 \big| . $$
If $v_\theta \in L_q(P^\theta)$, then the experiment is also $L_q$-differentiable, and this makes the desired control possible. The proof of these results is essentially in Ibragimov and Has'minskii, but the role of $L_q$-differentiability is not made explicit in their arguments.
General observation schemes
Filtered experiments
We now work with filtered models $(\Omega, \mathcal{F}, \mathbb{F}, P^\theta;\ \theta \in \Theta)$. Here $\mathbb{F} = (\mathcal{F}_t)_{0 \le t \le T}$ is an increasing family of sigma-fields, a so-called filtration. Assume that $P^\theta \sim Q$ and define the density processes by
$$ z^\theta_t = \frac{dP^\theta_t}{dQ_t}; $$
here $P^\theta_t = P^\theta|_{\mathcal{F}_t}$ and $Q_t = Q|_{\mathcal{F}_t}$. We get the following for free: the density processes $z^\theta$ are $(\mathbb{F}, Q)$-martingales.
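A concrete illustration (a standard example added here, not from the slides): observing a Poisson process $N$ with intensity $\theta > 0$ on $[0, T]$, with $Q$ the law of a standard Poisson process (intensity 1), the density process is
$$ z^\theta_t = \exp\big\{ N_t \log\theta - (\theta - 1)\, t \big\} = \theta^{N_t} e^{-(\theta-1)t}, \qquad 0 \le t \le T, $$
which is indeed an $(\mathbb{F}, Q)$-martingale; maximizing $z^\theta_T$ in $\theta$ gives the m.l.e. $\hat\theta_T = N_T / T$.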