18.650 Statistics for Applications
Chapter 4: The Method of Moments
Weierstrass Approximation Theorem (WAT)

Theorem
Let $f$ be a continuous function on the interval $[a, b]$. Then, for any $\varepsilon > 0$, there exist $a_0, a_1, \dots, a_d \in \mathbb{R}$ such that
$$\max_{x \in [a,b]} \Big| f(x) - \sum_{k=0}^{d} a_k x^k \Big| < \varepsilon.$$
In words: "continuous functions can be arbitrarily well approximated by polynomials".
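As a quick numerical companion to the theorem (not part of the original slides), the minimal sketch below fits polynomials of increasing degree to a continuous function on an interval and tracks the sup-norm error; the target function, the interval, and the use of NumPy's Chebyshev least-squares fit are all illustrative choices.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Continuous target on [a, b]; f, a, b are illustrative choices
f = lambda x: np.exp(np.sin(3 * x))
a, b = 0.0, 2.0
x = np.linspace(a, b, 2001)   # dense grid to estimate the sup norm

for d in (2, 5, 10, 20):
    coef = C.chebfit(x, f(x), d)              # degree-d least-squares fit
    err = np.max(np.abs(f(x) - C.chebval(x, coef)))
    print(f"degree {d:2d}: max error ≈ {err:.2e}")
```

The printed maximal error should shrink as the degree $d$ grows, which is exactly the statement of the theorem.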
Statistical application of the WAT (1)

◮ Let $X_1, \dots, X_n$ be an i.i.d. sample associated with an (identified) statistical model $(E, \{\mathbb{P}_\theta\}_{\theta \in \Theta})$. Write $\theta^*$ for the true parameter.
◮ Assume that for all $\theta$, the distribution $\mathbb{P}_\theta$ has a density $f_\theta$.
◮ If we find $\theta$ such that
$$\int h(x) f_{\theta^*}(x)\,dx = \int h(x) f_\theta(x)\,dx$$
for all (bounded continuous) functions $h$, then $\theta = \theta^*$.
◮ Replace expectations by averages: find an estimator $\hat\theta$ such that
$$\frac{1}{n} \sum_{i=1}^n h(X_i) = \int h(x) f_{\hat\theta}(x)\,dx$$
for all (bounded continuous) functions $h$. There is an infinity of such functions: not doable!
Statistical application of the WAT (2)

◮ By the WAT, it is enough to consider polynomials:
$$\frac{1}{n} \sum_{i=1}^n \sum_{k=0}^d a_k X_i^k = \int \sum_{k=0}^d a_k x^k f_{\hat\theta}(x)\,dx, \quad \forall\, a_0, \dots, a_d \in \mathbb{R}.$$
Still an infinity of equations!
◮ In turn, it is enough to consider
$$\frac{1}{n} \sum_{i=1}^n X_i^k = \int x^k f_{\hat\theta}(x)\,dx, \quad \forall\, k = 1, \dots, d$$
(only $d$ equations).
◮ The quantity $m_k(\theta) := \int x^k f_\theta(x)\,dx$ is the $k$-th moment of $\mathbb{P}_\theta$. It can also be written as $m_k(\theta) = \mathbb{E}_\theta[X^k]$.
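A minimal sketch of the "replace expectations by averages" step, assuming an Exponential($\lambda$) sample and the standard fact that its $k$-th moment is $k!/\lambda^k$ (the distribution and the parameter values are illustrative choices, not from the slides):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
lam, n = 2.0, 100_000          # rate and sample size, illustrative choices
X = rng.exponential(scale=1 / lam, size=n)

for k in range(1, 4):
    emp = np.mean(X**k)                    # empirical k-th moment
    pop = factorial(k) / lam**k            # m_k(lambda) = k!/lambda^k
    print(f"k={k}: empirical {emp:.4f}  vs  population {pop:.4f}")
```

By the law of large numbers, each empirical moment approaches its population counterpart as $n$ grows.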
Gaussian quadrature (1)

◮ The Weierstrass approximation theorem has limitations:
  1. it works only for continuous functions (not really a problem!);
  2. it works only on intervals $[a, b]$;
  3. it does not tell us what $d$ (the number of moments) should be.
◮ What if $E$ is discrete: no PDF but a PMF $p(\cdot)$?
◮ Assume that $E = \{x_1, x_2, \dots, x_r\}$ is finite with $r$ possible values. The PMF has $r - 1$ parameters: $p(x_1), \dots, p(x_{r-1})$, because the last one is given by $p(x_r) = 1 - \sum_{j=1}^{r-1} p(x_j)$.
◮ Hopefully, we do not need much more than $d = r - 1$ moments to recover the PMF $p(\cdot)$.
Gaussian quadrature (2)

◮ Note that for any $k = 1, \dots, r - 1$,
$$m_k = \mathbb{E}[X^k] = \sum_{j=1}^r p(x_j)\, x_j^k, \qquad \text{and} \qquad \sum_{j=1}^r p(x_j) = 1.$$
This is a system of $r$ linear equations with $r$ unknowns $p(x_1), \dots, p(x_r)$.
◮ We can write it in compact form:
$$\begin{pmatrix} x_1 & x_2 & \cdots & x_r \\ x_1^2 & x_2^2 & \cdots & x_r^2 \\ \vdots & \vdots & & \vdots \\ x_1^{r-1} & x_2^{r-1} & \cdots & x_r^{r-1} \\ 1 & 1 & \cdots & 1 \end{pmatrix} \cdot \begin{pmatrix} p(x_1) \\ p(x_2) \\ \vdots \\ p(x_{r-1}) \\ p(x_r) \end{pmatrix} = \begin{pmatrix} m_1 \\ m_2 \\ \vdots \\ m_{r-1} \\ 1 \end{pmatrix}$$
Gaussian quadrature (3)

◮ Check that the matrix is invertible: its determinant is (up to a row permutation) a Vandermonde determinant,
$$\det \begin{pmatrix} x_1 & x_2 & \cdots & x_r \\ x_1^2 & x_2^2 & \cdots & x_r^2 \\ \vdots & \vdots & & \vdots \\ x_1^{r-1} & x_2^{r-1} & \cdots & x_r^{r-1} \\ 1 & 1 & \cdots & 1 \end{pmatrix} = \pm \prod_{1 \le j < k \le r} (x_j - x_k) \neq 0,$$
since the $x_j$ are distinct.
◮ So given $m_1, \dots, m_{r-1}$, there is a unique PMF that has these moments. It is given by
$$\begin{pmatrix} p(x_1) \\ p(x_2) \\ \vdots \\ p(x_{r-1}) \\ p(x_r) \end{pmatrix} = \begin{pmatrix} x_1 & x_2 & \cdots & x_r \\ x_1^2 & x_2^2 & \cdots & x_r^2 \\ \vdots & \vdots & & \vdots \\ x_1^{r-1} & x_2^{r-1} & \cdots & x_r^{r-1} \\ 1 & 1 & \cdots & 1 \end{pmatrix}^{-1} \begin{pmatrix} m_1 \\ m_2 \\ \vdots \\ m_{r-1} \\ 1 \end{pmatrix}$$
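The linear system above can be solved directly. A minimal sketch, assuming a made-up finite support and PMF: compute the moments $m_1, \dots, m_{r-1}$ of a known PMF, build the matrix from the slide, and recover the probabilities with one linear solve.

```python
import numpy as np

# Made-up finite support and PMF for this sketch
xs = np.array([-1.0, 0.5, 2.0, 3.0])       # distinct support points x_1..x_r
p_true = np.array([0.1, 0.4, 0.3, 0.2])    # probabilities, sum to 1
r = len(xs)

# Moments m_1, ..., m_{r-1} of the true PMF
m = np.array([np.sum(p_true * xs**k) for k in range(1, r)])

# Matrix from the slide: rows x^1, ..., x^{r-1}, then a row of ones
A = np.vstack([xs**k for k in range(1, r)] + [np.ones(r)])
rhs = np.append(m, 1.0)

p_rec = np.linalg.solve(A, rhs)            # unique solution: det(A) != 0
print(p_rec)                               # ≈ [0.1, 0.4, 0.3, 0.2]
```

The recovered vector matches `p_true` up to floating-point error, illustrating that $r - 1$ moments (plus the normalization constraint) pin down the PMF.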
Conclusion from WAT and Gaussian quadrature

◮ Moments contain important information to recover the PDF or the PMF.
◮ If we can estimate these moments accurately, we may be able to recover the distribution.
◮ In a parametric setting, where knowing the distribution $\mathbb{P}_\theta$ amounts to knowing $\theta$, it is often the case that even fewer moments are needed to recover $\theta$. This is on a case-by-case basis.
◮ Rule of thumb: if $\theta \in \Theta \subset \mathbb{R}^d$, we need $d$ moments.
Method of moments (1)

Let $X_1, \dots, X_n$ be an i.i.d. sample associated with a statistical model $(E, (\mathbb{P}_\theta)_{\theta \in \Theta})$. Assume that $\Theta \subseteq \mathbb{R}^d$ for some $d \ge 1$.
◮ Population moments: let $m_k(\theta) = \mathbb{E}_\theta[X_1^k]$, $1 \le k \le d$.
◮ Empirical moments: let $\hat m_k = \overline{X_n^k} = \frac{1}{n} \sum_{i=1}^n X_i^k$, $1 \le k \le d$.
◮ Let
$$\psi \colon \Theta \subset \mathbb{R}^d \to \mathbb{R}^d, \qquad \theta \mapsto (m_1(\theta), \dots, m_d(\theta)).$$
Method of moments (2)

Assume $\psi$ is one-to-one:
$$\theta = \psi^{-1}(m_1(\theta), \dots, m_d(\theta)).$$

Definition
Moments estimator of $\theta$:
$$\hat\theta_n^{MM} = \psi^{-1}(\hat m_1, \dots, \hat m_d),$$
provided it exists.
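A worked instance of the definition (a standard textbook example, not spelled out on the slide): for $\theta = (\mu, \sigma^2)$ and $X \sim \mathcal{N}(\mu, \sigma^2)$, the map is $\psi(\mu, \sigma^2) = (\mu, \mu^2 + \sigma^2)$, so $\psi^{-1}(m_1, m_2) = (m_1, m_2 - m_1^2)$. The sketch below plugs in the empirical moments; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 1.5, 4.0          # true theta = (mu, sigma^2), illustrative
n = 50_000
X = rng.normal(mu, np.sqrt(sigma2), size=n)

# Empirical moments m_hat_1 = mean(X), m_hat_2 = mean(X^2)
m1, m2 = np.mean(X), np.mean(X**2)

# psi(mu, sigma^2) = (mu, mu^2 + sigma^2)  =>  psi^{-1}(m1, m2) = (m1, m2 - m1^2)
mu_hat, sigma2_hat = m1, m2 - m1**2
print(f"theta_hat_MM = ({mu_hat:.3f}, {sigma2_hat:.3f}), true = ({mu}, {sigma2})")
```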
Method of moments (3)

Analysis of $\hat\theta_n^{MM}$
◮ Let $M(\theta) = (m_1(\theta), \dots, m_d(\theta))$;
◮ Let $\hat M = (\hat m_1, \dots, \hat m_d)$.
◮ Let $\Sigma(\theta) = \mathrm{V}_\theta(X, X^2, \dots, X^d)$ be the covariance matrix of the random vector $(X, X^2, \dots, X^d)$, where $X \sim \mathbb{P}_\theta$.
◮ Assume $\psi^{-1}$ is continuously differentiable at $M(\theta)$. Write $\nabla \psi^{-1}\big|_{M(\theta)}$ for the $d \times d$ gradient matrix at this point.
Method of moments (4)

◮ LLN: $\hat\theta_n^{MM}$ is weakly/strongly consistent.
◮ CLT:
$$\sqrt{n}\,\big(\hat M - M(\theta)\big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \Sigma(\theta)) \quad (\text{w.r.t. } \mathbb{P}_\theta).$$
Hence, by the Delta method (see next slide):

Theorem
$$\sqrt{n}\,\big(\hat\theta_n^{MM} - \theta\big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \Gamma(\theta)) \quad (\text{w.r.t. } \mathbb{P}_\theta),$$
where $\Gamma(\theta) = \Big[\nabla \psi^{-1}\big|_{M(\theta)}\Big]^\top \Sigma(\theta) \Big[\nabla \psi^{-1}\big|_{M(\theta)}\Big]$.
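A Monte Carlo sanity check of the theorem in dimension $d = 1$, assuming $X \sim \mathrm{Exp}(\lambda)$ (an illustrative choice): here $m_1(\lambda) = 1/\lambda$, so $\psi^{-1}(m) = 1/m$, and the theorem predicts $\Gamma(\lambda) = \big((\psi^{-1})'(1/\lambda)\big)^2 \cdot \mathrm{Var}(X) = \lambda^4 \cdot \lambda^{-2} = \lambda^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 2.0, 500, 20_000        # illustrative choices

# theta_hat_MM = 1 / mean(X): here psi(lambda) = 1/lambda, psi^{-1}(m) = 1/m
X = rng.exponential(scale=1 / lam, size=(reps, n))
lam_hat = 1.0 / X.mean(axis=1)

# Theory: Gamma(lambda) = lambda^2
print("empirical var of sqrt(n)(lam_hat - lam):",
      np.var(np.sqrt(n) * (lam_hat - lam)))
print("theoretical Gamma(lambda) = lambda^2   :", lam**2)
```

The empirical variance of $\sqrt{n}(\hat\lambda_n^{MM} - \lambda)$ should be close to $\lambda^2 = 4$.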
Multivariate Delta method

Let $(T_n)_{n \ge 1}$ be a sequence of random vectors in $\mathbb{R}^p$ ($p \ge 1$) that satisfies
$$\sqrt{n}\,(T_n - \theta) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \Sigma),$$
for some $\theta \in \mathbb{R}^p$ and some symmetric positive semidefinite matrix $\Sigma \in \mathbb{R}^{p \times p}$.
Let $g \colon \mathbb{R}^p \to \mathbb{R}^k$ ($k \ge 1$) be continuously differentiable at $\theta$. Then,
$$\sqrt{n}\,\big(g(T_n) - g(\theta)\big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}\big(0, \nabla g(\theta)^\top \Sigma\, \nabla g(\theta)\big),$$
where $\nabla g(\theta) = \Big(\dfrac{\partial g_j}{\partial \theta_i}\Big)_{1 \le i \le p,\, 1 \le j \le k} \in \mathbb{R}^{p \times k}$.
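A minimal simulation sketch of the multivariate case with $p = 2$, $k = 1$, under two assumptions not taken from the slides: $T_n$ collects the first two empirical moments of an $\mathcal{N}(\mu, \sigma^2)$ sample, and $g(t_1, t_2) = t_2 - t_1^2$ maps those moments to the variance. A direct computation of $\nabla g(\theta)^\top \Sigma \nabla g(\theta)$ with $\Sigma = \mathrm{Cov}(X, X^2)$ then gives $2\sigma^4$ for the Gaussian, which the simulation should roughly reproduce.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 1.0, 2.0                  # illustrative N(mu, sigma^2)
n, reps = 400, 20_000

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
T1, T2 = X.mean(axis=1), (X**2).mean(axis=1)   # T_n = (m_hat_1, m_hat_2)

# g(t1, t2) = t2 - t1^2 maps the first two moments to the variance
g_Tn = T2 - T1**2

# For N(mu, sigma^2): grad(g)^T Sigma grad(g) = 2 sigma^4
print("empirical var of sqrt(n)(g(T_n) - sigma^2):",
      np.var(np.sqrt(n) * (g_Tn - sigma2)))
print("theoretical 2*sigma^4                     :", 2 * sigma2**2)
```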
MLE vs. Moment estimator

◮ Comparison of the quadratic risks: in general, the MLE is more accurate.
◮ Computational issues: sometimes, the MLE is intractable.
◮ If the likelihood is concave, we can use optimization algorithms (interior point methods, gradient descent, etc.).
◮ If the likelihood is not concave: only heuristics (Expectation-Maximization, etc.), which may get stuck in local maxima.
MIT OpenCourseWare
https://ocw.mit.edu

18.650 / 18.6501 Statistics for Applications
Fall 2016

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.