Gaussian approximations and multiplier bootstrap for maxima of sums - PowerPoint PPT Presentation

Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors Victor Chernozhukov (MIT), Denis Chetverikov (UCLA), and Kengo Kato (U. of Tokyo) Sep. 3. 2013 Chernozhukov Chetverikov K. (MIT, UCLA, UT) GAR and MB for Maxima of Sums of High-Dimensional Vectors Sep. 3. 2013 1 / 24

This talk is based upon the paper: Chernozhukov, V., Chetverikov, D. and Kato, K. (2012). Central limit theorems and multiplier bootstrap when p is much larger than n. arXiv:1212.6906. [A revised version is to appear in Ann. Statist.] The title was changed during the revision process. Applications to moment inequality models (if time allows) are based on an ongoing paper.

Introduction
Let $x_1, \dots, x_n$ be independent random vectors in $\mathbb{R}^p$, $p \ge 2$. $\mathrm{E}[x_i] = 0$ and $\mathrm{E}[x_i x_i']$ exists. $\mathrm{E}[x_i x_i']$ may be degenerate. (Important!) Possibly $p \gg n$. Keep in mind $p = p_n$.
This paper is about approximating the distribution of
\[ T_0 = \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \sum_{i=1}^n x_{ij}. \]
By making $x_{i,p+1} = -x_{i1}, \dots, x_{i,2p} = -x_{ip}$, we have
\[ \max_{1 \le j \le p} \left| \frac{1}{\sqrt{n}} \sum_{i=1}^n x_{ij} \right| = \max_{1 \le j \le 2p} \frac{1}{\sqrt{n}} \sum_{i=1}^n x_{ij}. \]
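
The statistic and the sign-augmentation trick on this slide can be checked numerically. The following is a minimal sketch in plain Python, not code from the talk; the function names, the uniform toy data, and the sizes n = 50, p = 200 are illustrative assumptions.

```python
import math
import random

def max_normalized_sum(x):
    """T_0 = max_j n^{-1/2} * sum_i x[i][j] for an n-by-p list of rows."""
    n, p = len(x), len(x[0])
    return max(sum(row[j] for row in x) for j in range(p)) / math.sqrt(n)

def max_abs_normalized_sum(x):
    """max_j | n^{-1/2} * sum_i x[i][j] |, the two-sided version."""
    n, p = len(x), len(x[0])
    return max(abs(sum(row[j] for row in x)) for j in range(p)) / math.sqrt(n)

def augment_with_negatives(x):
    """Append -x_i1, ..., -x_ip to every row, giving vectors in R^{2p}."""
    return [row + [-v for v in row] for row in x]

# Toy data (assumption): mean-zero uniform entries with p larger than n.
rng = random.Random(0)
n, p = 50, 200
x = [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(n)]

t_abs = max_abs_normalized_sum(x)
t_aug = max_normalized_sum(augment_with_negatives(x))
```

Here `t_abs` and `t_aug` agree, which is why results stated for the one-sided maximum also cover the maximum of absolute values at the cost of replacing p by 2p.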

Introduction
Let $y_1, \dots, y_n$ be independent normal random vectors with $y_i \sim N(0, \mathrm{E}[x_i x_i'])$. Define
\[ Z_0 = \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \sum_{i=1}^n y_{ij}. \]
When $p$ is fixed, the central limit theorem (subject to the Lindeberg condition) guarantees that
\[ \sup_{t \in \mathbb{R}} \left| \mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t) \right| \to 0. \]

Introduction
Basic question: How large can $p = p_n$ be while still having
\[ \sup_{t \in \mathbb{R}} \left| \mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t) \right| \to 0? \]
Related to multivariate CLT with growing dimension (Portnoy, 1986, PTRF; Götze, 1991, AoP; Bentkus, 2003, JSPI, etc.). Write
\[ X = \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i, \qquad Y = \frac{1}{\sqrt{n}} \sum_{i=1}^n y_i. \]
They are concerned with conditions under which
\[ \sup_{A \in \mathcal{A}} \left| \mathrm{P}(X \in A) - \mathrm{P}(Y \in A) \right| \to 0, \]
while allowing for $p = p_n \to \infty$.

Introduction
Bentkus (2003) proved that (in the i.i.d. case with $\mathrm{E}[x_i x_i'] = I$),
\[ \sup_{A:\ \text{convex}} \left| \mathrm{P}(X \in A) - \mathrm{P}(Y \in A) \right| = O\!\left( p^{1/4}\, \mathrm{E}[|x_1|^3]\, n^{-1/2} \right). \]
Typically $\mathrm{E}[|x_1|^3] = O(p^{3/2})$, so that the RHS is $o(1)$ provided that $p = o(n^{2/7})$.
The main message of the paper: to make
\[ \sup_{t \in \mathbb{R}} \left| \mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t) \right| \to 0, \]
$p$ can be much larger. Subject to some conditions, $\log p = o(n^{1/7})$ will suffice.
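
The rate $p = o(n^{2/7})$ is a matter of exponent arithmetic. A short worked version, assuming the typical order $\mathrm{E}[|x_1|^3] = O(p^{3/2})$ stated on the slide:

```latex
\[
p^{1/4}\,\mathrm{E}[|x_1|^3]\,n^{-1/2}
  = O\!\left(p^{1/4}\cdot p^{3/2}\cdot n^{-1/2}\right)
  = O\!\left(p^{7/4}\,n^{-1/2}\right),
\]
\[
p^{7/4}\,n^{-1/2} \to 0
  \iff p^{7/4} = o\!\left(n^{1/2}\right)
  \iff p = o\!\left(n^{2/7}\right).
\]
```

So the bound over convex sets requires $p$ to grow slower than a small power of $n$, whereas the condition $\log p = o(n^{1/7})$ for maxima lets $p$ grow faster than any polynomial in $n$.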

Introduction
Still, the above approximation results are not directly usable unless the covariance structure between the coordinates of $X$ is known.
In some cases we do know the covariance structure. E.g., think of $x_i = \varepsilon_i z_i$, where $\varepsilon_i$ is a scalar (error) random variable with mean zero and common variance, and $z_i$ is a vector of non-stochastic covariates. Then $T_0$ is the maximum of $t$-statistics.
But usually we do not. In such cases the distribution of $Z_0$ is unknown.
⇒ We propose a Gaussian multiplier bootstrap for approximating the distribution of $T_0$ when the covariance structure between the coordinates of $X$ is unknown. Its validity is established through the Gaussian approximation results. Still, $p$ can be much larger than $n$.
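
This section names the Gaussian multiplier bootstrap but does not write out the bootstrap statistic. The sketch below assumes the form given in the cited arXiv paper as I understand it: $W_0 = \max_{1 \le j \le p} n^{-1/2} \sum_{i=1}^n e_i x_{ij}$, with $e_1, \dots, e_n$ i.i.d. $N(0,1)$ multipliers drawn independently of the data. The function names, the toy Gaussian data, and the number of bootstrap draws are illustrative assumptions.

```python
import math
import random

def multiplier_bootstrap_draws(x, n_boot, rng):
    """Draw n_boot copies of W_0 = max_j n^{-1/2} * sum_i e_i * x[i][j].

    The data x (an n-by-p list of rows) is held fixed; only the
    i.i.d. N(0, 1) multipliers e_1, ..., e_n are redrawn each time.
    """
    n, p = len(x), len(x[0])
    draws = []
    for _ in range(n_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        col_sums = [sum(e[i] * x[i][j] for i in range(n)) for j in range(p)]
        draws.append(max(col_sums) / math.sqrt(n))
    return draws

def empirical_quantile(values, level):
    """Smallest order statistic whose empirical CDF is at least `level`."""
    ordered = sorted(values)
    k = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[k]

# Toy data (assumption): independent standard normal entries, p > n.
rng = random.Random(1)
n, p = 40, 100
x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

draws = multiplier_bootstrap_draws(x, n_boot=200, rng=rng)
crit_95 = empirical_quantile(draws, 0.95)  # bootstrap critical value for T_0
```

The conditional quantile `crit_95` plays the role of the unknown quantile of $Z_0$; no estimate of the full $p \times p$ covariance matrix is formed.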

Applications
Selecting design-adaptive tuning parameters for Lasso (Tibshirani, 1996, JRSSB) and Dantzig selector (Candès and Tao, 2007, AoS).
Multiple hypotheses testing (too many references).
Adaptive specification testing.
These three applications are examined in the arXiv paper.
Testing many moment inequalities. Will be treated if time allows.

Literature
Classical CLTs with $p = p_n \to \infty$: Portnoy (1986, PTRF), Götze (1991, AoP), Bentkus (2003, JSPI), among many others.
Modern approaches to multivariate CLTs: Chatterjee (2005, arXiv), Chatterjee and Meckes (2008, ALEA), Reinert and Röllin (2009, AoP), Röllin (2011, AIHP). Developing Stein's methods for normal approximation.
Harsha, Klivans, and Meka (2012, J.ACM).
Bootstrap in high dimensions: Mammen (1993, AoS), Arlot, Blanchard, and Roquain (2010a,b, AoS).

Main Thm.
Theorem. Suppose that there exist constants $0 < c_1 < C_1$ such that $c_1 \le n^{-1} \sum_{i=1}^n \mathrm{E}[x_{ij}^2] \le C_1$ for all $1 \le j \le p$. Then
\[ \sup_{t \in \mathbb{R}} \left| \mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t) \right| \le C \inf_{\gamma \in (0,1)} \left\{ n^{-1/8} \left( M_3^{3/4} \vee M_4^{1/2} \right) \log^{7/8}(pn/\gamma) + n^{-1/2}\, Q(1-\gamma) \log^{3/2}(pn/\gamma) + \gamma \right\}, \]
where $C = C(c_1, C_1) > 0$. Here
\[ Q(1-\gamma) = \left\{ (1-\gamma)\text{-quantile of } \max_{i,j} |x_{ij}| \right\} \vee \left\{ (1-\gamma)\text{-quantile of } \max_{i,j} |y_{ij}| \right\}, \]
and
\[ M_k = \max_{1 \le j \le p} \left( n^{-1} \sum_{i=1}^n \mathrm{E}[|x_{ij}|^k] \right)^{1/k}. \]

Comments
No restriction on correlation structure.
The extra parameter $\gamma$ appears essentially to avoid the appearance of a term of the form
\[ \mathrm{E}\Big[ \max_{1 \le j \le p} |x_{ij}|^k \Big] \]
in the bound. Notice the difference from $M_k$.
To avoid this, we use a suitable truncation, and $\gamma$ controls the level of truncation.

Techniques
There are a lot of techniques used to prove the main theorem.
Directly bounding the probability difference $\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t)$ is difficult.
Transform the problem into bounding
\[ \mathrm{E}[g(X) - g(Y)], \qquad g \text{ smooth}, \]
where $X = n^{-1/2} \sum_{i=1}^n x_i$ and $Y = n^{-1/2} \sum_{i=1}^n y_i$.
How? Approximate $z = (z_1, \dots, z_p)' \mapsto \max_{1 \le j \le p} z_j$ by
\[ F_\beta(z) = \beta^{-1} \log\Big( \sum_{j=1}^p e^{\beta z_j} \Big). \]
Then
\[ 0 \le F_\beta(z) - \max_{1 \le j \le p} z_j \le \beta^{-1} \log p. \]
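
The smooth-max approximation and its two-sided bound are easy to check numerically. The following is a minimal sketch; the function name `smooth_max`, the test vector, and the values of β are illustrative assumptions.

```python
import math

def smooth_max(z, beta):
    """F_beta(z) = beta^{-1} * log(sum_j exp(beta * z_j)), computed stably."""
    m = max(z)
    # Subtracting the maximum before exponentiating avoids overflow
    # and does not change the value of F_beta.
    return m + math.log(sum(math.exp(beta * (v - m)) for v in z)) / beta

z = [0.3, -1.2, 2.5, 2.4, 0.0]
p = len(z)

# Approximation error F_beta(z) - max_j z_j for increasing beta.
gaps = {beta: smooth_max(z, beta) - max(z) for beta in (1.0, 10.0, 100.0)}
```

Each gap lies between 0 and log(p)/β, so a larger β gives a tighter approximation to the maximum. The price is larger derivatives of $F_\beta$, which is the trade-off the proof has to balance.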

Techniques
Approximate the indicator function $1(\cdot \le t)$ by a smooth function $h$ (standard). Then take $g = h \circ F_\beta$.
Use a variant of Stein's method to bound
\[ \mathrm{E}[g(X) - g(Y)]. \qquad (*) \]
Truncation and some fine properties of $F_\beta$ are used here.
To obtain a bound on the probability difference from (*), we need an anti-concentration inequality for maxima of normal random vectors.
Intuition: from (*), we will have a bound on $\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t + \text{error})$. We want to replace $\mathrm{P}(Z_0 \le t + \text{error})$ by $\mathrm{P}(Z_0 \le t)$.

Simplified anti-concentration ineq.
Lemma (Simplified form). Let $(Y_1, \dots, Y_p)'$ be a normal random vector with $\mathrm{E}[Y_j] = 0$ and $\mathrm{E}[Y_j^2] = 1$ for all $1 \le j \le p$. Then for all $\epsilon > 0$,
\[ \sup_{t \in \mathbb{R}} \mathrm{P}\Big( \Big| \max_{1 \le j \le p} Y_j - t \Big| \le \epsilon \Big) \le 4 \epsilon \Big( \mathrm{E}\Big[ \max_{1 \le j \le p} Y_j \Big] + 1 \Big). \]
This bound is universally tight (up to a constant).
Note 1: $\mathrm{E}[\max_{1 \le j \le p} Y_j] \le \sqrt{2 \log p}$.
Note 2: The inequality is dimension-free: it is easy to extend it to separable Gaussian processes.
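
Note 1 can be obtained by a standard argument, sketched here; this is one common route and not necessarily the one used in the talk. It uses Jensen's inequality, the bound of a maximum by a sum, and the moment generating function $\mathrm{E}[e^{\beta Y_j}] = e^{\beta^2/2}$ of a standard normal variable. No assumption on the correlation between the $Y_j$ is needed.

```latex
\[
\mathrm{E}\Big[\max_{1\le j\le p} Y_j\Big]
  \le \frac{1}{\beta}\log \mathrm{E}\Big[\exp\Big(\beta \max_{1\le j\le p} Y_j\Big)\Big]
  \le \frac{1}{\beta}\log \sum_{j=1}^{p} \mathrm{E}\big[e^{\beta Y_j}\big]
  = \frac{\log p}{\beta} + \frac{\beta}{2},
  \qquad \beta > 0.
\]
\[
\text{Taking } \beta = \sqrt{2\log p}: \qquad
\frac{\log p}{\sqrt{2\log p}} + \frac{\sqrt{2\log p}}{2}
  = \frac{\sqrt{2\log p}}{2} + \frac{\sqrt{2\log p}}{2}
  = \sqrt{2\log p}.
\]
```

Combined with the lemma, this gives an anti-concentration bound of order $\epsilon \sqrt{\log p}$, which depends on the dimension only logarithmically.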

Some consequences
Assumption: either
(E.1) $\mathrm{E}[\exp(|x_{ij}| / B_n)] \le 2$, $\forall i, j$; or
(E.2) $(\mathrm{E}[\max_{1 \le j \le p} x_{ij}^4])^{1/4} \le B_n$, $\forall i$.
Moreover, assume both
(M.1) $c_1 \le n^{-1} \sum_{i=1}^n \mathrm{E}[x_{ij}^2] \le C_1$, $\forall j$; and
(M.2) $n^{-1} \sum_{i=1}^n \mathrm{E}[|x_{ij}|^{2+k}] \le B_n^k$, $k = 1, 2$, $\forall j$.
Here $B_n \to \infty$ is allowed.
E.g., consider the case where $x_i = \varepsilon_i z_i$, with $\varepsilon_i$ a mean-zero scalar error and $z_i$ a vector of non-stochastic covariates normalized such that $n^{-1} \sum_{i=1}^n z_{ij}^2 = 1$, $\forall j$. Then (E.2), (M.1), (M.2) are satisfied if
\[ \mathrm{E}[\varepsilon_i^2] \ge c_1, \quad \mathrm{E}[\varepsilon_i^4] \le C_1, \quad |z_{ij}| \le B_n, \quad \forall i, j, \]
after adjusting constants.

Corollary
Corollary. Suppose that one of the following conditions is satisfied:
(i) (E.1) and $B_n^2 \log^7(pn) \le C_1 n^{1 - c_1}$; or
(ii) (E.2) and $B_n^4 \log^7(pn) \le C_1 n^{1 - c_1}$.
Moreover, suppose that (M.1) and (M.2) are satisfied. Then
\[ \sup_{t \in \mathbb{R}} \left| \mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t) \right| \le C n^{-c}, \]
where $c, C$ depend only on $c_1, C_1$.
