ST519/807: Mathematical Statistics
Bent Jørgensen
University of Southern Denmark
November 22, 2014

Abstract: These are additional notes on the course material for ST519/807: Mathematical Statistics. HMC refers to the textbook Hogg et al. (2013).

Key words: Asymptotic theory, consistency, Cramér-Rao inequality, efficiency, exponential family, estimation, Fisher's scoring method, Fisher information, identifiability, likelihood, maximum likelihood, observed information, orthogonality, parameter, score function, statistical model, statistical test, sufficiency.

Fisher (1922), under the heading "The Neglect of Theoretical Statistics", wrote:

    Several reasons have contributed to the prolonged neglect into which the study of statistics, in its theoretical aspects, has fallen. In spite of the immense amount of fruitful labour which has been expended in its practical application, the basic principles of this organ of science are still in a state of obscurity, and it cannot be denied that, during the recent rapid development of practical methods, fundamental problems have been ignored and fundamental paradoxes left unresolved.

Fisher then went on to introduce the main ingredients of likelihood theory, which shaped much of mathematical statistics of the 20th century, including concepts such as statistical model, parameter, identifiability, estimation, consistency, likelihood, score function, maximum likelihood, Fisher information, efficiency, and sufficiency. Here we review the basic elements of likelihood theory in a contemporary setting.

Prerequisites: Sample space; probability distribution; discrete and continuous random variables; PMF and PDF; transformations; independent random variables; mean, variance, covariance and correlation.

Special distributions: Uniform; Bernoulli; binomial; Poisson; geometric; negative binomial; gamma; chi-square; beta; normal; t-distribution; F-distribution.

Contents

1 Stochastic convergence and the Central Limit Theorem
2 The log likelihood function and its derivatives
  2.1 Likelihood and log likelihood
  2.2 The score function and the Fisher information function
  2.3 Observed information
  2.4 The Cramér-Rao inequality

3 Asymptotic likelihood theory
  3.1 Asymptotic normality of the score function
  3.2 The maximum likelihood estimator
  3.3 Exponential families
  3.4 Consistency of the maximum likelihood estimator
  3.5 Efficiency and asymptotic normality
  3.6 The Weibull distribution
  3.7 Location models
4 Vector parameters
  4.1 The score vector and the Fisher information matrix
  4.2 Cramér-Rao inequality (generalized)
  4.3 Consistency and asymptotic normality of the maximum likelihood estimator
  4.4 Parameter orthogonality
  4.5 Exponential dispersion models
  4.6 Linear regression
  4.7 Exercises
5 Sufficiency
  5.1 Definition
  5.2 The Fisher-Neyman factorization criterion
  5.3 The Rao-Blackwell theorem
  5.4 The Lehmann-Scheffé theorem
6 The likelihood ratio test and other large-sample tests
  6.1 Standard errors
  6.2 The likelihood ratio test
  6.3 Wald and score tests
7 Maximum likelihood computation
  7.1 Assumptions
  7.2 Stabilized Newton methods
  7.3 The Newton-Raphson method
  7.4 Fisher's scoring method
  7.5 Step length calculation
  7.6 Convergence and starting values

1 Stochastic convergence and the Central Limit Theorem

• Setup: Let $X$ denote a random variable (r.v.) and let $\{X_n\}_{n=1}^{\infty}$ denote a sequence of r.v.s, all defined on a suitable probability space $(\mathcal{C}, \mathcal{B}, P)$ (sample space, $\sigma$-algebra, probability measure).

• Definition: Convergence in probability. We say that $X_n \xrightarrow{P} X$ as $n \to \infty$ ($X_n$ converges to $X$ in probability) if
$$\lim_{n \to \infty} P(|X_n - X| \geq \varepsilon) = 0 \quad \forall \, \varepsilon > 0.$$

• Definition: Convergence in distribution. If $F$ is a distribution function (CDF) we say that $X_n \xrightarrow{D} F$ as $n \to \infty$ ($X_n$ converges to $F$ in distribution) if
$$P(X_n \leq x) \to F(x) \text{ as } n \to \infty \text{ for all } x \in C(F),$$
where $C(F)$ denotes the set of continuity points of $F$. If $X$ has distribution function $F$, we also write $X_n \xrightarrow{D} X$ as $n \to \infty$.

• Properties: As $n \to \infty$,
1. $X_n \xrightarrow{P} X \Rightarrow a X_n \xrightarrow{P} a X$
2. $X_n \xrightarrow{P} X \Rightarrow g(X_n) \xrightarrow{P} g(X)$ if $g$ is continuous
3. $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{D} X$
4. If $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$, then
$$X_n + Y_n \xrightarrow{P} X + Y \quad \text{and} \quad X_n Y_n \xrightarrow{P} XY. \tag{1.1}$$

• Example: Let $X$ be symmetric, i.e. $-X \sim X$, and define $X_n = (-1)^n X$. Then $X_n \xrightarrow{D} X$ (meaning that $X_n$ converges to the distribution of $X$), since $F_{X_n} = F_X$ for all $n$; but unless $X$ is constant, $X_n \nrightarrow X$ in probability. A simulation of this example is sketched below.
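The distinction can be checked numerically. The following is a minimal sketch, assuming Python with NumPy; the choice $X \sim \mathrm{N}(0,1)$ as the symmetric distribution, the seed, and the Monte Carlo sample size are illustrative, not part of the notes. Each $X_n = (-1)^n X$ has the same empirical CDF as $X$, yet $P(|X_n - X| \geq \varepsilon)$ does not vanish along odd $n$, where $|X_n - X| = 2|X|$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # draws of X ~ N(0, 1), a symmetric distribution

for n in (10, 11):                 # one even and one odd index
    x_n = (-1) ** n * x            # X_n = (-1)^n X

    # Convergence in distribution: empirical CDF of X_n matches that of X
    for t in (-1.0, 0.0, 1.0):
        print(f"n={n}, t={t}: F_Xn(t)={np.mean(x_n <= t):.3f}, F_X(t)={np.mean(x <= t):.3f}")

    # No convergence in probability: for odd n, |X_n - X| = 2|X|,
    # so P(|X_n - X| >= eps) stays bounded away from 0
    eps = 0.5
    print(f"n={n}: P(|X_n - X| >= {eps}) ~ {np.mean(np.abs(x_n - x) >= eps):.3f}")
```

For even $n$ the last probability is exactly 0 (since $X_n = X$), while for odd $n$ it stabilizes near $P(2|X| \geq 0.5) \approx 0.80$, which is the point of the example.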

• However, we have the following properties:
1. $X_n \xrightarrow{D} c \Rightarrow X_n \xrightarrow{P} c$
2. If $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} 0$, then $X_n + Y_n \xrightarrow{D} X$
3. $X_n \xrightarrow{D} X \Rightarrow g(X_n) \xrightarrow{D} g(X)$ if $g$ is continuous
4. Slutsky's Theorem: If $X_n \xrightarrow{D} X$ and $A_n \xrightarrow{P} a$, $B_n \xrightarrow{P} b$, then
$$A_n + B_n X_n \xrightarrow{D} a + bX.$$

• Example: Let $X_n$ and $Y_n$ be two sequences such that $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} X$. The following examples show that we do not in general have a result similar to (1.1) for convergence in distribution.
1. Suppose that $X$ is symmetric (see above), and let $X_n = X$ and $Y_n = -X$ for all $n$. Then $X_n + Y_n = X - X = 0$, so clearly $X_n + Y_n$ converges in distribution to $0$ as $n \to \infty$.
2. Now suppose that for each $n$, $X_n$ and $Y_n$ are independent and identically distributed with CDF $F(x) = P(X \leq x)$ for all $x$. Now
$$X_n + Y_n \xrightarrow{D} F_{X_1 + Y_1},$$
where $F_{X_1 + Y_1}(x) = P(X_1 + Y_1 \leq x)$ for all $x$, corresponding to the convolution of $X_1$ and $Y_1$. Hence, the assumption that $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} X$ is not enough to determine the limiting distribution of $X_n + Y_n$, which in fact depends on the sequence of joint distributions of $X_n$ and $Y_n$; the two cases are contrasted numerically in the sketch below.

• Statistical setup: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. variables. Assume
$$\mu = \mathrm{E}(X_i) \quad \text{and} \quad \sigma^2 = \mathrm{Var}(X_i),$$
and define for $n = 1, 2, \ldots$
$$T_n = \sum_{i=1}^{n} X_i \quad \text{and} \quad \bar{X}_n = \frac{1}{n} T_n.$$
Then
$$\mathrm{E}(\bar{X}_n) = \mu \quad \text{and} \quad \mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n}.$$
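Here is a minimal sketch of the two cases in the example above, again assuming Python with NumPy and taking $X \sim \mathrm{N}(0,1)$ as an illustrative symmetric choice. Both sequences have the same marginal limits, yet the sums behave completely differently, because the joint distributions differ.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100_000                       # Monte Carlo sample size
x = rng.standard_normal(m)        # X ~ N(0, 1), symmetric

# Case 1: X_n = X, Y_n = -X. Marginally Y_n ~ X, but X_n + Y_n is identically 0.
sum_case1 = x + (-x)

# Case 2: X_n and Y_n independent, each distributed as X.
# The sum then converges to the convolution, here N(0, 2).
y = rng.standard_normal(m)
sum_case2 = x + y

print("case 1: Var(X_n + Y_n) ~", round(sum_case1.var(), 3))   # ~ 0.0
print("case 2: Var(X_n + Y_n) ~", round(sum_case2.var(), 3))   # ~ 2.0
```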

• The (Weak) Law of Large Numbers (LLN) says
$$\bar{X}_n \xrightarrow{P} \mu.$$
Proof: Use Chebyshev's inequality:
$$P(|\bar{X}_n - \mu| \geq \varepsilon) \leq \frac{\sigma^2 / n}{\varepsilon^2} \to 0 \text{ as } n \to \infty.$$

• Convergence to the standard normal distribution means
$$P(X_n \leq x) \to \Phi(x) \text{ as } n \to \infty, \text{ for all } x \in \mathbb{R},$$
where
$$\Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-\frac{1}{2} t^2} \, dt.$$

• Now we define
$$Z_n = \sqrt{n} \, (\bar{X}_n - \mu),$$
for which
$$\mathrm{E}(Z_n) = 0 \quad \text{and} \quad \mathrm{Var}(Z_n) = \sigma^2.$$

• The Central Limit Theorem (CLT) (see James, p. 265 or HMC p. 307) says
$$Z_n \xrightarrow{D} \mathrm{N}(0, \sigma^2) \text{ as } n \to \infty.$$
Practical use:
$$\sqrt{n} \, (\bar{X}_n - \mu) \sim \mathrm{N}(0, \sigma^2) \text{ approx.},$$
which implies
$$\bar{X}_n \sim \mathrm{N}\!\left(\mu, \frac{\sigma^2}{n}\right) \text{ approx.}$$
Rule: the approximate normal distribution shares with $\bar{X}_n$ its mean and variance.

• Example: Bernoulli trials. Assume that the $X_i$ are i.i.d. Bernoulli variables,
$$P(X_i = 1) = \mu = 1 - P(X_i = 0).$$
Hence we use $\mu$ as probability parameter, which is also the mean of $X_i$:
$$\mu = \mathrm{E}(X_i) \quad \text{and} \quad \sigma^2 = \mathrm{Var}(X_i) = \mu(1 - \mu).$$
Then
$$T_n = \sum_{i=1}^{n} X_i = \text{number of 1s in a sample of } n.$$
In fact $T_n \sim \mathrm{Bi}(n, \mu)$ (binomial distribution). Then, by the LLN,
$$\bar{X}_n \xrightarrow{P} \mu.$$
A simulation of the CLT approximation for this example is sketched below.
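As a quick numerical check of the CLT approximation in the Bernoulli example, here is a minimal sketch assuming Python with NumPy; the values $n = 100$, $\mu = 0.3$, the seed, and the replication count are arbitrary illustrative choices. It compares the simulated distribution of $Z_n = \sqrt{n}(\bar{X}_n - \mu)$ with $\mathrm{N}(0, \mu(1-\mu))$, using the error function to evaluate $\Phi$.

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF, Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(2)
n, mu = 100, 0.3                 # sample size and Bernoulli mean (illustrative)
reps = 50_000                    # number of simulated samples
sigma2 = mu * (1 - mu)           # Var(X_i) = mu(1 - mu)

# Each row is one sample of n Bernoulli(mu) trials; xbar holds the row means.
xbar = rng.binomial(1, mu, size=(reps, n)).mean(axis=1)
z = sqrt(n) * (xbar - mu)        # Z_n = sqrt(n) (Xbar_n - mu)

print("E(Z_n)   ~", round(z.mean(), 4))       # ~ 0
print("Var(Z_n) ~", round(z.var(), 4))        # ~ mu(1 - mu) = 0.21
# Compare P(Z_n <= 0.5) with the N(0, sigma^2) approximation Phi(0.5 / sigma)
print("P(Z_n <= 0.5) ~", round(np.mean(z <= 0.5), 4),
      " vs Phi(0.5/sigma) =", round(Phi(0.5 / sqrt(sigma2)), 4))
```

The two probabilities in the last line agree to about two decimal places at $n = 100$, in line with the rule that the approximate normal distribution shares its mean and variance with $\bar{X}_n$.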
