18.650 Statistics for Applications
Chapter 4: The Method of Moments
Weierstrass Approximation Theorem (WAT)

Theorem
Let $f$ be a continuous function on the interval $[a, b]$. Then, for any $\varepsilon > 0$, there exist $a_0, a_1, \dots, a_d \in \mathbb{R}$ such that
$$\max_{x \in [a,b]} \Big| f(x) - \sum_{k=0}^{d} a_k x^k \Big| < \varepsilon.$$
In words: "continuous functions can be arbitrarily well approximated by polynomials".
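As a quick numerical companion to the theorem (not part of the original slides), the minimal sketch below fits polynomials of increasing degree to a continuous function on an interval and tracks the sup-norm error; the target function, the interval, and the use of NumPy's Chebyshev least-squares fit are all illustrative choices.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Continuous target on [a, b]; f, a, b are illustrative choices
f = lambda x: np.exp(np.sin(3 * x))
a, b = 0.0, 2.0
x = np.linspace(a, b, 2001)   # dense grid to estimate the sup norm

for d in (2, 5, 10, 20):
    coef = C.chebfit(x, f(x), d)              # degree-d least-squares fit
    err = np.max(np.abs(f(x) - C.chebval(x, coef)))
    print(f"degree {d:2d}: max error ≈ {err:.2e}")
```

The printed maximal error should shrink as the degree $d$ grows, which is exactly the statement of the theorem.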
Statistical application of the WAT (1)

◮ Let $X_1, \dots, X_n$ be an i.i.d. sample associated with an (identified) statistical model $(E, \{\mathbb{P}_\theta\}_{\theta \in \Theta})$. Write $\theta^*$ for the true parameter.
◮ Assume that for all $\theta$, the distribution $\mathbb{P}_\theta$ has a density $f_\theta$.
◮ If we find $\theta$ such that
$$\int h(x) f_{\theta^*}(x)\,dx = \int h(x) f_\theta(x)\,dx$$
for all (bounded continuous) functions $h$, then $\theta = \theta^*$.
◮ Replace expectations by averages: find an estimator $\hat\theta$ such that
$$\frac{1}{n} \sum_{i=1}^n h(X_i) = \int h(x) f_{\hat\theta}(x)\,dx$$
for all (bounded continuous) functions $h$. There is an infinity of such functions: not doable!
Statistical application of the WAT (2)

◮ By the WAT, it is enough to consider polynomials:
$$\frac{1}{n} \sum_{i=1}^n \sum_{k=0}^d a_k X_i^k = \int \sum_{k=0}^d a_k x^k f_{\hat\theta}(x)\,dx, \quad \forall\, a_0, \dots, a_d \in \mathbb{R}.$$
Still an infinity of equations!
◮ In turn, it is enough to consider
$$\frac{1}{n} \sum_{i=1}^n X_i^k = \int x^k f_{\hat\theta}(x)\,dx, \quad \forall\, k = 1, \dots, d$$
(only $d$ equations).
◮ The quantity $m_k(\theta) := \int x^k f_\theta(x)\,dx$ is the $k$-th moment of $\mathbb{P}_\theta$. It can also be written as $m_k(\theta) = \mathbb{E}_\theta[X^k]$.
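A minimal sketch of the "replace expectations by averages" step, assuming an Exponential($\lambda$) sample and the standard fact that its $k$-th moment is $k!/\lambda^k$ (the distribution and the parameter values are illustrative choices, not from the slides):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
lam, n = 2.0, 100_000          # rate and sample size, illustrative choices
X = rng.exponential(scale=1 / lam, size=n)

for k in range(1, 4):
    emp = np.mean(X**k)                    # empirical k-th moment
    pop = factorial(k) / lam**k            # m_k(lambda) = k!/lambda^k
    print(f"k={k}: empirical {emp:.4f}  vs  population {pop:.4f}")
```

By the law of large numbers, each empirical moment approaches its population counterpart as $n$ grows.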
Gaussian quadrature (1)

◮ The Weierstrass approximation theorem has limitations:
  1. it works only for continuous functions (not really a problem!);
  2. it works only on intervals $[a, b]$;
  3. it does not tell us what $d$ (the number of moments) should be.
◮ What if $E$ is discrete: no PDF but a PMF $p(\cdot)$?
◮ Assume that $E = \{x_1, x_2, \dots, x_r\}$ is finite with $r$ possible values. The PMF has $r - 1$ parameters: $p(x_1), \dots, p(x_{r-1})$, because the last one is given by $p(x_r) = 1 - \sum_{j=1}^{r-1} p(x_j)$.
◮ Hopefully, we do not need much more than $d = r - 1$ moments to recover the PMF $p(\cdot)$.
Gaussian quadrature (2)

◮ Note that for any $k = 1, \dots, r - 1$,
$$m_k = \mathbb{E}[X^k] = \sum_{j=1}^r p(x_j)\, x_j^k, \qquad \text{and} \qquad \sum_{j=1}^r p(x_j) = 1.$$
This is a system of $r$ linear equations with $r$ unknowns $p(x_1), \dots, p(x_r)$.
◮ We can write it in compact form:
$$\begin{pmatrix} x_1 & x_2 & \cdots & x_r \\ x_1^2 & x_2^2 & \cdots & x_r^2 \\ \vdots & \vdots & & \vdots \\ x_1^{r-1} & x_2^{r-1} & \cdots & x_r^{r-1} \\ 1 & 1 & \cdots & 1 \end{pmatrix} \cdot \begin{pmatrix} p(x_1) \\ p(x_2) \\ \vdots \\ p(x_{r-1}) \\ p(x_r) \end{pmatrix} = \begin{pmatrix} m_1 \\ m_2 \\ \vdots \\ m_{r-1} \\ 1 \end{pmatrix}$$
Gaussian quadrature (3)

◮ Check that the matrix is invertible: its determinant is (up to a row permutation) a Vandermonde determinant,
$$\det \begin{pmatrix} x_1 & x_2 & \cdots & x_r \\ x_1^2 & x_2^2 & \cdots & x_r^2 \\ \vdots & \vdots & & \vdots \\ x_1^{r-1} & x_2^{r-1} & \cdots & x_r^{r-1} \\ 1 & 1 & \cdots & 1 \end{pmatrix} = \pm \prod_{1 \le j < k \le r} (x_j - x_k) \neq 0,$$
since the $x_j$ are distinct.
◮ So given $m_1, \dots, m_{r-1}$, there is a unique PMF that has these moments. It is given by
$$\begin{pmatrix} p(x_1) \\ p(x_2) \\ \vdots \\ p(x_{r-1}) \\ p(x_r) \end{pmatrix} = \begin{pmatrix} x_1 & x_2 & \cdots & x_r \\ x_1^2 & x_2^2 & \cdots & x_r^2 \\ \vdots & \vdots & & \vdots \\ x_1^{r-1} & x_2^{r-1} & \cdots & x_r^{r-1} \\ 1 & 1 & \cdots & 1 \end{pmatrix}^{-1} \begin{pmatrix} m_1 \\ m_2 \\ \vdots \\ m_{r-1} \\ 1 \end{pmatrix}$$
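The linear system above can be solved directly. A minimal sketch, assuming a made-up finite support and PMF: compute the moments $m_1, \dots, m_{r-1}$ of a known PMF, build the matrix from the slide, and recover the probabilities with one linear solve.

```python
import numpy as np

# Made-up finite support and PMF for this sketch
xs = np.array([-1.0, 0.5, 2.0, 3.0])       # distinct support points x_1..x_r
p_true = np.array([0.1, 0.4, 0.3, 0.2])    # probabilities, sum to 1
r = len(xs)

# Moments m_1, ..., m_{r-1} of the true PMF
m = np.array([np.sum(p_true * xs**k) for k in range(1, r)])

# Matrix from the slide: rows x^1, ..., x^{r-1}, then a row of ones
A = np.vstack([xs**k for k in range(1, r)] + [np.ones(r)])
rhs = np.append(m, 1.0)

p_rec = np.linalg.solve(A, rhs)            # unique solution: det(A) != 0
print(p_rec)                               # ≈ [0.1, 0.4, 0.3, 0.2]
```

The recovered vector matches `p_true` up to floating-point error, illustrating that $r - 1$ moments (plus the normalization constraint) pin down the PMF.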
Conclusion from WAT and Gaussian quadrature

◮ Moments contain important information to recover the PDF or the PMF.
◮ If we can estimate these moments accurately, we may be able to recover the distribution.
◮ In a parametric setting, where knowing the distribution $\mathbb{P}_\theta$ amounts to knowing $\theta$, it is often the case that even fewer moments are needed to recover $\theta$. This is on a case-by-case basis.
◮ Rule of thumb: if $\theta \in \Theta \subset \mathbb{R}^d$, we need $d$ moments.
Method of moments (1)

Let $X_1, \dots, X_n$ be an i.i.d. sample associated with a statistical model $(E, (\mathbb{P}_\theta)_{\theta \in \Theta})$. Assume that $\Theta \subseteq \mathbb{R}^d$ for some $d \ge 1$.
◮ Population moments: let $m_k(\theta) = \mathbb{E}_\theta[X_1^k]$, $1 \le k \le d$.
◮ Empirical moments: let $\hat m_k = \overline{X_n^k} = \frac{1}{n} \sum_{i=1}^n X_i^k$, $1 \le k \le d$.
◮ Let
$$\psi \colon \Theta \subset \mathbb{R}^d \to \mathbb{R}^d, \qquad \theta \mapsto (m_1(\theta), \dots, m_d(\theta)).$$
Method of moments (2)

Assume $\psi$ is one-to-one:
$$\theta = \psi^{-1}(m_1(\theta), \dots, m_d(\theta)).$$

Definition
Moments estimator of $\theta$:
$$\hat\theta_n^{MM} = \psi^{-1}(\hat m_1, \dots, \hat m_d),$$
provided it exists.
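A worked instance of the definition (a standard textbook example, not spelled out on the slide): for $\theta = (\mu, \sigma^2)$ and $X \sim \mathcal{N}(\mu, \sigma^2)$, the map is $\psi(\mu, \sigma^2) = (\mu, \mu^2 + \sigma^2)$, so $\psi^{-1}(m_1, m_2) = (m_1, m_2 - m_1^2)$. The sketch below plugs in the empirical moments; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 1.5, 4.0          # true theta = (mu, sigma^2), illustrative
n = 50_000
X = rng.normal(mu, np.sqrt(sigma2), size=n)

# Empirical moments m_hat_1 = mean(X), m_hat_2 = mean(X^2)
m1, m2 = np.mean(X), np.mean(X**2)

# psi(mu, sigma^2) = (mu, mu^2 + sigma^2)  =>  psi^{-1}(m1, m2) = (m1, m2 - m1^2)
mu_hat, sigma2_hat = m1, m2 - m1**2
print(f"theta_hat_MM = ({mu_hat:.3f}, {sigma2_hat:.3f}), true = ({mu}, {sigma2})")
```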
Method of moments (3)

Analysis of $\hat\theta_n^{MM}$
◮ Let $M(\theta) = (m_1(\theta), \dots, m_d(\theta))$;
◮ Let $\hat M = (\hat m_1, \dots, \hat m_d)$.
◮ Let $\Sigma(\theta) = \mathrm{V}_\theta(X, X^2, \dots, X^d)$ be the covariance matrix of the random vector $(X, X^2, \dots, X^d)$, where $X \sim \mathbb{P}_\theta$.
◮ Assume $\psi^{-1}$ is continuously differentiable at $M(\theta)$. Write $\nabla \psi^{-1}\big|_{M(\theta)}$ for the $d \times d$ gradient matrix at this point.
Method of moments (4)

◮ LLN: $\hat\theta_n^{MM}$ is weakly/strongly consistent.
◮ CLT:
$$\sqrt{n}\,\big(\hat M - M(\theta)\big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \Sigma(\theta)) \quad (\text{w.r.t. } \mathbb{P}_\theta).$$
Hence, by the Delta method (see next slide):

Theorem
$$\sqrt{n}\,\big(\hat\theta_n^{MM} - \theta\big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \Gamma(\theta)) \quad (\text{w.r.t. } \mathbb{P}_\theta),$$
where $\Gamma(\theta) = \Big[\nabla \psi^{-1}\big|_{M(\theta)}\Big]^\top \Sigma(\theta) \Big[\nabla \psi^{-1}\big|_{M(\theta)}\Big]$.
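A Monte Carlo sanity check of the theorem in dimension $d = 1$, assuming $X \sim \mathrm{Exp}(\lambda)$ (an illustrative choice): here $m_1(\lambda) = 1/\lambda$, so $\psi^{-1}(m) = 1/m$, and the theorem predicts $\Gamma(\lambda) = \big((\psi^{-1})'(1/\lambda)\big)^2 \cdot \mathrm{Var}(X) = \lambda^4 \cdot \lambda^{-2} = \lambda^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 2.0, 500, 20_000        # illustrative choices

# theta_hat_MM = 1 / mean(X): here psi(lambda) = 1/lambda, psi^{-1}(m) = 1/m
X = rng.exponential(scale=1 / lam, size=(reps, n))
lam_hat = 1.0 / X.mean(axis=1)

# Theory: Gamma(lambda) = lambda^2
print("empirical var of sqrt(n)(lam_hat - lam):",
      np.var(np.sqrt(n) * (lam_hat - lam)))
print("theoretical Gamma(lambda) = lambda^2   :", lam**2)
```

The empirical variance of $\sqrt{n}(\hat\lambda_n^{MM} - \lambda)$ should be close to $\lambda^2 = 4$.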
Multivariate Delta method

Let $(T_n)_{n \ge 1}$ be a sequence of random vectors in $\mathbb{R}^p$ ($p \ge 1$) that satisfies
$$\sqrt{n}\,(T_n - \theta) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \Sigma),$$
for some $\theta \in \mathbb{R}^p$ and some symmetric positive semidefinite matrix $\Sigma \in \mathbb{R}^{p \times p}$.
Let $g \colon \mathbb{R}^p \to \mathbb{R}^k$ ($k \ge 1$) be continuously differentiable at $\theta$. Then,
$$\sqrt{n}\,\big(g(T_n) - g(\theta)\big) \xrightarrow[n \to \infty]{(d)} \mathcal{N}\big(0, \nabla g(\theta)^\top \Sigma\, \nabla g(\theta)\big),$$
where $\nabla g(\theta) = \Big(\dfrac{\partial g_j}{\partial \theta_i}\Big)_{1 \le i \le p,\, 1 \le j \le k} \in \mathbb{R}^{p \times k}$.
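A minimal simulation sketch of the multivariate case with $p = 2$, $k = 1$, under two assumptions not taken from the slides: $T_n$ collects the first two empirical moments of an $\mathcal{N}(\mu, \sigma^2)$ sample, and $g(t_1, t_2) = t_2 - t_1^2$ maps those moments to the variance. A direct computation of $\nabla g(\theta)^\top \Sigma \nabla g(\theta)$ with $\Sigma = \mathrm{Cov}(X, X^2)$ then gives $2\sigma^4$ for the Gaussian, which the simulation should roughly reproduce.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 1.0, 2.0                  # illustrative N(mu, sigma^2)
n, reps = 400, 20_000

X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
T1, T2 = X.mean(axis=1), (X**2).mean(axis=1)   # T_n = (m_hat_1, m_hat_2)

# g(t1, t2) = t2 - t1^2 maps the first two moments to the variance
g_Tn = T2 - T1**2

# For N(mu, sigma^2): grad(g)^T Sigma grad(g) = 2 sigma^4
print("empirical var of sqrt(n)(g(T_n) - sigma^2):",
      np.var(np.sqrt(n) * (g_Tn - sigma2)))
print("theoretical 2*sigma^4                     :", 2 * sigma2**2)
```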
MLE vs. Moment estimator

◮ Comparison of the quadratic risks: in general, the MLE is more accurate.
◮ Computational issues: sometimes, the MLE is intractable.
◮ If the likelihood is concave, we can use optimization algorithms (interior point methods, gradient descent, etc.).
◮ If the likelihood is not concave: only heuristics (Expectation-Maximization, etc.), which may get stuck in local maxima.
MIT OpenCourseWare
https://ocw.mit.edu

18.650 / 18.6501 Statistics for Applications
Fall 2016

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.