Statistics Asymptotic Theory Shiu-Sheng Chen Department of Economics National Taiwan University Fall 2019 Shiu-Sheng Chen (NTU Econ) Statistics Fall 2019 1 / 28
Asymptotic Theory: Motivation
Asymptotic theory (or large-sample theory) aims to answer the question: what happens as we gather more and more data?
In particular, given a random sample $\{X_1, X_2, X_3, \ldots, X_n\}$ and a statistic
$$T_n = t(X_1, X_2, \ldots, X_n),$$
what is the limiting behavior of $T_n$ as $n \to \infty$?
Asymptotic Theory: Motivation
Why ask such a question? For instance, given a random sample $\{X_i\}_{i=1}^n \sim \text{i.i.d. } N(\mu, \sigma^2)$, we know that
$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
However, if $\{X_i\}_{i=1}^n \sim \text{i.i.d. } (\mu, \sigma^2)$ without the normality assumption, what is the distribution of $\bar{X}_n$? In general, we do not know.
Is it possible to find a good approximation to the distribution of $\bar{X}_n$ as $n \to \infty$? Yes! This is where asymptotic theory kicks in.
Section 1: Preliminary Knowledge
Preliminary Knowledge: Limit; Markov Inequality; Chebyshev Inequality.
Limit of a Real Sequence
Definition (Limit). If for every $\varepsilon > 0$ there exists an integer $N(\varepsilon)$ such that
$$|b_n - b| < \varepsilon, \quad \forall\, n > N(\varepsilon),$$
then we say that the sequence of real numbers $\{b_1, b_2, \ldots\}$ converges to the limit $b$. It is denoted by
$$\lim_{n \to \infty} b_n = b.$$
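To make the definition concrete, here is a minimal sketch for one hypothetical sequence, $b_n = 1/n$ with limit $b = 0$ (the sequence and the helper names `b` and `N_of` are illustrative assumptions, not part of the slides). For this sequence any integer $N(\varepsilon) \ge 1/\varepsilon$ works.

```python
import math


def b(n: int) -> float:
    """Hypothetical example sequence b_n = 1/n, whose limit is b = 0."""
    return 1.0 / n


def N_of(eps: float) -> int:
    """One valid N(eps) for b_n = 1/n: if n > ceil(1/eps), then 1/n < eps."""
    return math.ceil(1.0 / eps)


for eps in (0.5, 0.1, 0.01):
    N = N_of(eps)
    # Spot-check the definition on a finite stretch of indices beyond N(eps).
    ok = all(abs(b(n) - 0.0) < eps for n in range(N + 1, N + 1001))
    print(f"eps={eps}: N(eps)={N}, |b_n - b| < eps on the checked range: {ok}")
```

The check covers only finitely many indices, so it illustrates the definition rather than proving convergence.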
Markov Inequality
Theorem (Markov Inequality). Suppose that $X$ is a random variable such that $P(X \ge 0) = 1$. Then for every real number $m > 0$,
$$P(X \ge m) \le \frac{E(X)}{m}.$$
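A quick numerical check of the Markov Inequality, as a sketch under an assumed distribution: here $X$ is taken to be Exponential with rate 1 (so $X \ge 0$ and $E(X) = 1$), which is an illustrative choice and not from the slides. Both sides are estimated from the same simulated draws.

```python
import random

random.seed(0)

# Assumption: X ~ Exponential(rate=1), so P(X >= 0) = 1 and E(X) = 1.
draws = [random.expovariate(1.0) for _ in range(100_000)]
mean_x = sum(draws) / len(draws)


def tail_prob(sample, m):
    """Empirical estimate of P(X >= m)."""
    return sum(x >= m for x in sample) / len(sample)


for m in (0.5, 1.0, 2.0, 4.0):
    lhs = tail_prob(draws, m)
    rhs = mean_x / m
    print(f"m={m}: P(X >= m) ~ {lhs:.4f}, E(X)/m ~ {rhs:.4f}")
```

The bound is often loose (for small $m$ it can exceed 1), which is expected: it uses only the mean of $X$.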
Chebyshev Inequality
Theorem (Chebyshev Inequality). Let $Y \sim (E(Y), \operatorname{Var}(Y))$. Then for every number $\varepsilon > 0$,
$$P(|Y - E(Y)| \ge \varepsilon) \le \frac{\operatorname{Var}(Y)}{\varepsilon^2}.$$
Proof: Let $X = [Y - E(Y)]^2$; then $P(X \ge 0) = 1$ and $E(X) = \operatorname{Var}(Y)$. Since $P(|Y - E(Y)| \ge \varepsilon) = P(X \ge \varepsilon^2)$, the result follows by applying the Markov Inequality with $m = \varepsilon^2$.
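The same kind of check can be sketched for the Chebyshev Inequality. The distribution is an assumption made for illustration: $Y \sim \text{Uniform}(0, 1)$, with $E(Y) = 1/2$ and $\operatorname{Var}(Y) = 1/12$. The mean and variance are estimated from the simulated draws themselves.

```python
import random

random.seed(1)

# Assumption: Y ~ Uniform(0, 1), so E(Y) = 1/2 and Var(Y) = 1/12.
draws = [random.random() for _ in range(100_000)]
n = len(draws)
mean_y = sum(draws) / n
var_y = sum((y - mean_y) ** 2 for y in draws) / n  # divide by n, not n - 1


def deviation_prob(sample, center, eps):
    """Empirical estimate of P(|Y - E(Y)| >= eps)."""
    return sum(abs(y - center) >= eps for y in sample) / len(sample)


for eps in (0.2, 0.3, 0.4, 0.45):
    lhs = deviation_prob(draws, mean_y, eps)
    rhs = var_y / eps**2
    print(f"eps={eps}: P(|Y - E(Y)| >= eps) ~ {lhs:.4f}, Var(Y)/eps^2 ~ {rhs:.4f}")
```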
Section 2: Modes of Convergence
Types of Convergence
For a sequence of random variables, we consider three modes of convergence: Converge in Probability; Converge in Distribution; Converge in Mean Square.
Converge in Probability
Definition (Converge in Probability). Let $\{Y_n\}$ be a sequence of random variables and let $Y$ be another random variable. If, for any $\varepsilon > 0$,
$$P(|Y_n - Y| < \varepsilon) \to 1 \quad \text{as } n \to \infty,$$
then we say that $Y_n$ converges in probability to $Y$, and denote it by
$$Y_n \xrightarrow{p} Y.$$
Equivalently,
$$P(|Y_n - Y| \ge \varepsilon) \to 0 \quad \text{as } n \to \infty.$$
Converge in Probability
Let $\{X_i\}_{i=1}^n \sim \text{i.i.d. Bernoulli}(0.5)$ and compute
$$Y_n = \bar{X}_n = \frac{\sum_i X_i}{n}.$$
In this case, $Y_n \xrightarrow{p} 0.5$.
[Figure: plot with vertical axis "z" (0.2 to 1.0) against horizontal axis "toss" (0 to 1000).]
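The coin-tossing example can be reproduced with a short simulation. This is a sketch, not the code behind the original figure: the seed, the tolerance $\varepsilon = 0.05$, the number of replications, and the function names are all assumptions. It records one path of running sample means and then estimates $P(|Y_n - 0.5| \ge \varepsilon)$ for several $n$.

```python
import random

random.seed(2019)


def running_means(n_tosses: int, p: float = 0.5):
    """Sample means Y_1, ..., Y_n along one sequence of Bernoulli(p) tosses."""
    total, path = 0, []
    for n in range(1, n_tosses + 1):
        total += random.random() < p  # adds 1 for a success, 0 otherwise
        path.append(total / n)
    return path


def deviation_freq(n: int, eps: float, reps: int = 1000, p: float = 0.5) -> float:
    """Monte Carlo estimate of P(|Y_n - p| >= eps)."""
    hits = 0
    for _ in range(reps):
        y_n = sum(random.random() < p for _ in range(n)) / n
        hits += abs(y_n - p) >= eps
    return hits / reps


path = running_means(1000)
freqs = {n: deviation_freq(n, eps=0.05) for n in (10, 100, 1000)}

print("Y_n after 10, 100, 1000 tosses:", path[9], path[99], path[999])
for n, f in freqs.items():
    print(f"n={n}: estimated P(|Y_n - 0.5| >= 0.05) = {f:.3f}")
```

The estimated deviation probability should shrink toward 0 as $n$ grows, which is what convergence in probability requires.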
Converge in Distribution
Definition (Converge in Distribution). Let $\{Y_n\}$ be a sequence of random variables with distribution functions $F_{Y_n}(y)$ (denoted by $F_n(y)$ for simplicity). Let $Y$ be another random variable with distribution function $F_Y(y)$. If
$$\lim_{n \to \infty} F_n(y) = F_Y(y) \quad \text{at all } y \text{ for which } F_Y(y) \text{ is continuous},$$
then we say that $Y_n$ converges in distribution to $Y$. It is denoted by
$$Y_n \xrightarrow{d} Y.$$
$F_Y(y)$ is called the limiting distribution of $Y_n$.
Converge in Mean Square
Definition (Converge in Mean Square). Let $\{Y_n\}$ be a sequence of random variables and let $Y$ be another random variable. If
$$E(Y_n - Y)^2 \to 0 \quad \text{as } n \to \infty,$$
then we say that $Y_n$ converges in mean square to $Y$. It is denoted by
$$Y_n \xrightarrow{ms} Y.$$
It is also called convergence in quadratic mean.
Section 3: Important Theorems
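Theorems
Theorem. $Y_n \xrightarrow{ms} c$ if and only if
$$\lim_{n \to \infty} E(Y_n) = c \quad \text{and} \quad \lim_{n \to \infty} \operatorname{Var}(Y_n) = 0.$$
Proof. It can be shown that
$$E(Y_n - c)^2 = E\big([Y_n - E(Y_n)]^2\big) + [E(Y_n) - c]^2 = \operatorname{Var}(Y_n) + [E(Y_n) - c]^2.$$
Both terms on the right are nonnegative, so the left-hand side tends to 0 if and only if each term does.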
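The decomposition used in the proof can be verified by adding and subtracting $E(Y_n)$ inside the square. The following is a worked derivation using only the symbols on the slide:

```latex
\begin{align*}
E(Y_n - c)^2
  &= E\big([Y_n - E(Y_n)] + [E(Y_n) - c]\big)^2 \\
  &= E\big([Y_n - E(Y_n)]^2\big)
     + 2\,[E(Y_n) - c]\,E\big(Y_n - E(Y_n)\big)
     + [E(Y_n) - c]^2 \\
  &= \operatorname{Var}(Y_n) + [E(Y_n) - c]^2 .
\end{align*}
```

The cross term vanishes because $E(Y_n) - c$ is a constant and $E\big(Y_n - E(Y_n)\big) = 0$.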
Theorems
Theorem. If $Y_n \xrightarrow{ms} Y$, then $Y_n \xrightarrow{p} Y$.
Proof: Note that $P(|Y_n - Y|^2 \ge 0) = 1$, and by the Markov Inequality, for any $k > 0$,
$$P(|Y_n - Y| \ge k) = P(|Y_n - Y|^2 \ge k^2) \le \frac{E(|Y_n - Y|^2)}{k^2}.$$
The right-hand side tends to 0 as $n \to \infty$ by mean-square convergence.
Weak Law of Large Numbers, WLLN
Theorem (WLLN). Let $\{X_i\}_{i=1}^n$ be a random sample with $\sigma^2 = \operatorname{Var}(X_1) < \infty$. Let $\bar{X}_n$ denote the sample mean, and note that $E(\bar{X}_n) = E(X_1) = \mu$. Then
$$\bar{X}_n \xrightarrow{p} \mu.$$
Proof: either (1) by the Chebyshev Inequality, or (2) by convergence in mean square.
The sample mean $\bar{X}_n$ gets closer (in the probability sense) to the population mean $\mu$ as the sample size increases. That is, if we use $\bar{X}_n$ as a guess of the unknown $\mu$, we can be quite happy that the sample mean makes a good guess in large samples.
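Proof route (1) gives an explicit bound: since $\operatorname{Var}(\bar{X}_n) = \sigma^2/n$ for a random sample, the Chebyshev Inequality yields $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \sigma^2/(n\varepsilon^2) \to 0$. The sketch below compares that bound with simulated frequencies. The data distribution (Uniform(0, 1), so $\mu = 1/2$ and $\sigma^2 = 1/12$), $\varepsilon = 0.1$, and the helper names are assumptions for illustration.

```python
import random

random.seed(18)

# Assumption: X_i ~ Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12.
MU, SIGMA2 = 0.5, 1.0 / 12.0


def chebyshev_bound(n: int, eps: float) -> float:
    """Upper bound sigma^2 / (n * eps^2) on P(|Xbar_n - mu| >= eps)."""
    return SIGMA2 / (n * eps**2)


def simulated_freq(n: int, eps: float, reps: int = 500) -> float:
    """Monte Carlo estimate of P(|Xbar_n - mu| >= eps)."""
    hits = 0
    for _ in range(reps):
        xbar = sum(random.random() for _ in range(n)) / n
        hits += abs(xbar - MU) >= eps
    return hits / reps


results = {n: (simulated_freq(n, 0.1), chebyshev_bound(n, 0.1)) for n in (10, 100, 1000)}
for n, (freq, bound) in results.items():
    print(f"n={n}: simulated frequency {freq:.4f}, Chebyshev bound {bound:.4f}")
```

The bound is conservative, but it is enough for the proof because it goes to 0 for every fixed $\varepsilon > 0$.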
WLLN for Other Moments
Note that the WLLN can be thought of as
$$\frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^n X_i}{n} \xrightarrow{p} E(X_1).$$
Let $Y = X^2$; then by the WLLN,
$$\frac{Y_1 + Y_2 + \cdots + Y_n}{n} = \frac{\sum_{i=1}^n Y_i}{n} \xrightarrow{p} E(Y_1).$$
Hence,
$$\frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n} = \frac{\sum_{i=1}^n X_i^2}{n} \xrightarrow{p} E(X_1^2).$$
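A small sketch of the second-moment case, assuming $X_i \sim \text{Uniform}(0, 1)$, for which $E(X_1^2) = 1/3$ (the distribution and the function name are illustrative assumptions):

```python
import random

random.seed(19)


def sample_second_moment(n: int) -> float:
    """(X_1^2 + ... + X_n^2) / n for X_i ~ Uniform(0, 1) (assumed distribution)."""
    return sum(random.random() ** 2 for _ in range(n)) / n


moments = {n: sample_second_moment(n) for n in (100, 10_000, 200_000)}
for n, m2 in moments.items():
    print(f"n={n}: sample second moment {m2:.4f} (population value 1/3)")
```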
Example: An Application of WLLN
Assume $W_n \sim \text{Binomial}(n, \mu)$, and let $Y_n = \frac{W_n}{n}$. Then
$$Y_n \xrightarrow{p} \mu.$$
Why? Since $W_n = \sum_i X_i$ with $X_i \sim \text{i.i.d. Bernoulli}(\mu)$, $E(X_1) = \mu$, and $\operatorname{Var}(X_1) = \mu(1 - \mu)$, the result follows by the WLLN.
Central Limit Theorem, CLT
Theorem (CLT). Let $\{X_i\}_{i=1}^n$ be a random sample, where $E(X_1) = \mu < \infty$ and $\operatorname{Var}(X_1) = \sigma^2 < \infty$. Then
$$Z_n = \frac{\bar{X}_n - E(\bar{X}_n)}{\sqrt{\operatorname{Var}(\bar{X}_n)}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1).$$
If a random sample is taken from any distribution with mean $\mu$ and variance $\sigma^2$, regardless of whether this distribution is discrete or continuous, then the distribution of the random variable $Z_n$ will be approximately standard normal in large samples.
CLT
Using the notation of asymptotic distributions,
$$\frac{\bar{X}_n - \mu}{\sqrt{\sigma^2 / n}} \overset{A}{\sim} N(0, 1),$$
or
$$\bar{X}_n \overset{A}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right),$$
where $\overset{A}{\sim}$ denotes the asymptotic distribution (A stands for "asymptotically").
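In practice, the asymptotic distribution is used to approximate probabilities about $\bar{X}_n$ at a finite $n$. The sketch below compares the normal approximation $N(\mu, \sigma^2/n)$ with an exact calculation in a case where the exact answer is available: Bernoulli($\mu$) data, where $n\bar{X}_n$ is Binomial($n$, $\mu$). The values $n = 100$, $\mu = 0.5$, the cutoff 0.55, and the function names are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def exact_cdf_of_mean(x: float, n: int, mu: float) -> float:
    """Exact P(Xbar_n <= x) for i.i.d. Bernoulli(mu) data, via the Binomial(n, mu) pmf."""
    # Largest count k with k/n <= x; the small offset guards against float rounding.
    k_max = min(n, math.floor(x * n + 1e-9))
    return sum(math.comb(n, k) * mu**k * (1 - mu) ** (n - k) for k in range(k_max + 1))


def asymptotic_cdf_of_mean(x: float, n: int, mu: float, sigma2: float) -> float:
    """Approximate P(Xbar_n <= x) using the asymptotic distribution N(mu, sigma^2 / n)."""
    return NormalDist(mu, math.sqrt(sigma2 / n)).cdf(x)


n, mu = 100, 0.5
exact = exact_cdf_of_mean(0.55, n, mu)
approx = asymptotic_cdf_of_mean(0.55, n, mu, sigma2=mu * (1 - mu))
print(f"exact P(Xbar <= 0.55) = {exact:.4f}, normal approximation = {approx:.4f}")
```

Part of the gap comes from approximating a discrete distribution with a continuous one. It narrows as $n$ grows.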
An Application of CLT
Example: Assume $\{X_i\} \sim \text{i.i.d. Bernoulli}(\mu)$; then
$$\frac{\bar{X}_n - \mu}{\sqrt{\frac{\mu(1 - \mu)}{n}}} \xrightarrow{d} N(0, 1).$$
Why? Since $E(\bar{X}_n) = \mu$ and $\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n} = \frac{\mu(1 - \mu)}{n}$.
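The Bernoulli example can be checked by simulation: draw many samples, standardize each sample mean, and compare the empirical distribution function of the standardized means with the standard normal one. This is a sketch under assumed settings ($\mu = 0.3$, $n = 400$, 3000 replications, all hypothetical choices).

```python
import math
import random
from statistics import NormalDist

random.seed(23)

# Hypothetical simulation settings, not taken from the slides.
MU, N, REPS = 0.3, 400, 3000


def standardized_mean(n: int, mu: float) -> float:
    """(Xbar_n - mu) / sqrt(mu (1 - mu) / n) for one Bernoulli(mu) sample of size n."""
    xbar = sum(random.random() < mu for _ in range(n)) / n
    return (xbar - mu) / math.sqrt(mu * (1 - mu) / n)


def empirical_cdf(sample, z: float) -> float:
    """Share of simulated values that are <= z."""
    return sum(v <= z for v in sample) / len(sample)


z_draws = [standardized_mean(N, MU) for _ in range(REPS)]
std_normal = NormalDist()  # N(0, 1)

gaps = {}
for z in (-1.96, -1.0, 0.0, 1.0, 1.96):
    emp, lim = empirical_cdf(z_draws, z), std_normal.cdf(z)
    gaps[z] = abs(emp - lim)
    print(f"z={z:+.2f}: empirical CDF {emp:.3f}, N(0,1) CDF {lim:.3f}")
```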
Continuous Mapping Theorem
Theorem (CMT). Given $Y_n \xrightarrow{p} Y$, and $g(\cdot)$ is continuous, then
$$g(Y_n) \xrightarrow{p} g(Y).$$
Proof: omitted here.
Examples: if $Y_n \xrightarrow{p} Y$, then
$$\frac{1}{Y_n} \xrightarrow{p} \frac{1}{Y} \ \ (\text{provided } Y \ne 0), \qquad Y_n^2 \xrightarrow{p} Y^2, \qquad \sqrt{Y_n} \xrightarrow{p} \sqrt{Y} \ \ (\text{provided } Y_n, Y \ge 0).$$
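A minimal numerical sketch of the three examples, assuming $Y_n$ is the mean of $n$ Uniform(0, 1) draws, so that $Y_n \xrightarrow{p} 0.5$ by the WLLN (the distribution and the names below are illustrative assumptions):

```python
import math
import random

random.seed(24)

LIMIT = 0.5  # probability limit of Y_n under the assumed Uniform(0, 1) data


def y_n(n: int) -> float:
    """Mean of n Uniform(0, 1) draws (assumed data-generating process)."""
    return sum(random.random() for _ in range(n)) / n


y_large = y_n(100_000)

# Each entry pairs g(Y_n) with its claimed limit g(0.5).
transforms = {
    "1/y": (1.0 / y_large, 1.0 / LIMIT),
    "y^2": (y_large**2, LIMIT**2),
    "sqrt(y)": (math.sqrt(y_large), math.sqrt(LIMIT)),
}
for name, (value, target) in transforms.items():
    print(f"{name}: g(Y_n) = {value:.4f}, g(limit) = {target:.4f}")
```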
Theorem
Theorem. Given $W_n \xrightarrow{p} W$ and $Y_n \xrightarrow{p} Y$, then
$$W_n + Y_n \xrightarrow{p} W + Y,$$
$$W_n Y_n \xrightarrow{p} WY.$$
Proof: omitted here.
Slutsky Theorem
Theorem. Given $W_n \xrightarrow{d} W$ and $Y_n \xrightarrow{p} c$, where $c$ is a constant. Then
$$W_n + Y_n \xrightarrow{d} W + c,$$
$$W_n Y_n \xrightarrow{d} cW,$$
$$\frac{W_n}{Y_n} \xrightarrow{d} \frac{W}{c} \quad \text{for } c \ne 0.$$
Proof: omitted here.
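A common use of the Slutsky Theorem is the t-ratio with an estimated standard deviation. Take $W_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma \xrightarrow{d} N(0, 1)$ by the CLT and $Y_n = S_n/\sigma \xrightarrow{p} 1$, where $S_n$ is the sample standard deviation; then $W_n / Y_n = \sqrt{n}(\bar{X}_n - \mu)/S_n \xrightarrow{d} N(0, 1)$. The sketch below checks this by simulation. The Uniform(0, 1) data, $n = 200$, 3000 replications, and the function name are assumptions for illustration.

```python
import math
import random

random.seed(26)

MU = 0.5  # mean of the assumed Uniform(0, 1) data


def t_ratio(n: int) -> float:
    """sqrt(n) (Xbar_n - mu) / S_n for one Uniform(0, 1) sample of size n."""
    xs = [random.random() for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sample variance
    return math.sqrt(n) * (xbar - MU) / math.sqrt(s2)


t_draws = [t_ratio(200) for _ in range(3000)]

# Under the N(0, 1) limit, about 95% of draws should fall in [-1.96, 1.96].
coverage = sum(abs(t) <= 1.96 for t in t_draws) / len(t_draws)
print(f"share of t-ratios within +/-1.96: {coverage:.3f}")
```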
The Delta Method
Theorem. Given $\sqrt{n}\,(Y_n - \theta) \xrightarrow{d} N(0, \sigma^2)$. Let $g(\cdot)$ be differentiable at $\theta$ with $g'(\theta) \ne 0$. Then
$$\sqrt{n}\,\big(g(Y_n) - g(\theta)\big) \xrightarrow{d} N\big(0, [g'(\theta)]^2 \sigma^2\big).$$
Proof (sketch): Given the first-order Taylor approximation
$$g(Y_n) \approx g(\theta) + g'(\theta)(Y_n - \theta),$$
we have
$$\frac{\sqrt{n}\,\big(g(Y_n) - g(\theta)\big)}{g'(\theta)} \approx \sqrt{n}\,(Y_n - \theta) \xrightarrow{d} N(0, \sigma^2).$$
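The variance formula $[g'(\theta)]^2 \sigma^2$ can be checked by simulation. This sketch assumes $Y_n$ is the mean of $n$ Exponential(rate 1) draws (so $\theta = 1$, $\sigma^2 = 1$) and takes $g(x) = x^2$, which gives $g'(\theta) = 2$ and an asymptotic variance of 4. These choices, along with $n = 400$ and 3000 replications, are illustrative assumptions.

```python
import math
import random

random.seed(27)

# Assumption: X_i ~ Exponential(rate=1), so theta = E(X) = 1 and sigma^2 = Var(X) = 1.
THETA, SIGMA2 = 1.0, 1.0


def g(x: float) -> float:
    """Hypothetical smooth transformation g(x) = x^2."""
    return x**2


def g_prime(x: float) -> float:
    """Derivative of g."""
    return 2.0 * x


def scaled_error(n: int) -> float:
    """sqrt(n) (g(Y_n) - g(theta)) for one sample of size n."""
    y_n = sum(random.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (g(y_n) - g(THETA))


draws = [scaled_error(400) for _ in range(3000)]
center = sum(draws) / len(draws)
sim_var = sum((d - center) ** 2 for d in draws) / (len(draws) - 1)
delta_var = g_prime(THETA) ** 2 * SIGMA2

print(f"simulated variance {sim_var:.3f}, delta-method variance {delta_var:.3f}")
```

The two numbers should be close but not identical, since the delta method is a large-sample approximation and the simulation has Monte Carlo error.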