The Factor-Lasso and K-Step Bootstrap Approach for Inference in High-Dimensional Economic Applications

Christian Hansen and Yuan Liao

May 2017, Montreal
Introduction

◮ Observe many control variables
◮ Two popular (formal) dimension reduction techniques:
  - Variable/model selection, e.g. the lasso
  - Factor models
Variable Selection Review (α is the parameter of interest):

y_i = α d_i + x_i′β + ε_i
d_i = x_i′γ + u_i

1. Allow MANY control variables
2. Impose SPARSITY on β, γ

◮ Literature: Belloni, Chernozhukov and Hansen (2012, REStud), etc.
◮ Key conditions: weak dependence among the x
◮ Just a few x have an impact on y, d
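As a point of reference, a minimal sketch of the post-double-selection idea from this literature: select controls with the lasso in both the y- and d-equations, take the union, and refit by OLS. The function name and the use of scikit-learn's Lasso are illustrative choices, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import Lasso

def double_selection_alpha(y, d, x, lam):
    """Post-double-selection sketch: union the lasso-selected controls
    from the y- and d-equations, then estimate alpha by OLS."""
    sel_y = Lasso(alpha=lam).fit(x, y).coef_ != 0
    sel_d = Lasso(alpha=lam).fit(x, d).coef_ != 0
    keep = sel_y | sel_d
    Z = np.column_stack([d, x[:, keep], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0]  # coefficient on d
```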
Large Factor Model Review (α is the parameter of interest):

y_i = α d_i + f_i′β + ε_i
d_i = f_i′γ + v_i
x_i = Λf_i + U_i

1. Most of the x have an impact on y, d
2. The dimension of f_i is small

◮ Literature: factor-augmented regressions, diffusion index forecasts (e.g. Bai and Ng (2003), Stock and Watson (2002))
◮ Generally results in strong dependence among the x
◮ Regressing directly on x will generally NOT produce sparse coefficients
◮ Does not worry about the "remaining information" in U_i
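For later reference, a minimal sketch of the standard principal-components estimator of the factors, loadings, and idiosyncratic components, using the common normalization f̂′f̂/n = I_K; names are illustrative.

```python
import numpy as np

def pca_factors(x, K):
    """PCA factor estimation: f_hat = sqrt(n) * top-K eigenvectors of x x',
    Lam_hat = x' f_hat / n, U_hat = x - f_hat Lam_hat'."""
    n = x.shape[0]
    _, eigvec = np.linalg.eigh(x @ x.T)    # eigenvalues in ascending order
    f_hat = np.sqrt(n) * eigvec[:, -K:]    # n x K estimated factors
    Lam_hat = x.T @ f_hat / n              # p x K estimated loadings
    U_hat = x - f_hat @ Lam_hat.T          # estimated idiosyncratic part
    return f_hat, Lam_hat, U_hat
```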
What we aim to do

A model that nests large factor models and variable selection:

y_i = α d_i + f_i′β + U_i′θ_y + ε_i
d_i = f_i′γ + U_i′θ_d + v_i
x_i = Λf_i + U_i

1. U_i represents variation in the observables not captured by the factors
2. Estimation method: lasso on U_i
3. Justification of the key assumptions for the lasso:
◮ Weak dependence among regressors: most of the variation in x is driven by the factors
◮ Sparsity of θ: only a few x carry "useful remaining information" after the factors are controlled for
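A toy data-generating sketch of this nesting model, showing how the sparse θ's sit on top of the factor structure in x; all dimensions and parameter values are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K, s = 200, 100, 3, 5        # s = number of nonzero entries in theta

f = rng.standard_normal((n, K))    # latent factors
Lam = rng.standard_normal((p, K))  # factor loadings
U = rng.standard_normal((n, p))    # idiosyncratic components
x = f @ Lam.T + U                  # x_i = Lambda f_i + U_i

theta_y = np.zeros(p); theta_y[:s] = 0.5   # sparse: only s coords of U matter
theta_d = np.zeros(p); theta_d[:s] = 0.5
beta, gamma = rng.standard_normal(K), rng.standard_normal(K)
alpha = 1.0

d = f @ gamma + U @ theta_d + rng.standard_normal(n)             # treatment
y = alpha * d + f @ beta + U @ theta_y + rng.standard_normal(n)  # outcome
```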
Some "why not" questions we had...

1. Control for (f_i, x_i) instead of (f_i, U_i):

y_i = α d_i + f_i′β + x_i′θ_y + ε_i
d_i = f_i′γ + x_i′θ_d + v_i
x_i = Λf_i + U_i

◮ Within x_i: strongly correlated
◮ Between x_i and f_i: strongly correlated

2. Use lots of factors:

y_i = α d_i + f_i′β + ε_i
d_i = f_i′γ + v_i
x_i = Λf_i + U_i

◮ Allow dim(f_i) to increase fast with p = dim(x_i)
◮ Assume (β, γ) sparse, then "lasso" them
◮ Not a sufficient amount of "cross-sectional" information for the factors
◮ Estimating the factors is either inconsistent or has a slow rate, impacting inference on α
Some "why not" questions we had...

3. Sparse PCA:

x_{i,l} = λ_l′f_i + U_{i,l},  l = 1, ..., p,  i = 1, ..., n

◮ Most of (λ_1, ..., λ_p) are zero
◮ Most of the x do not depend on the factors
◮ Becomes a sparse model:

y_i = α d_i + x_i′β + ε_i
d_i = x_i′γ + u_i
What we do

y_i = α d_i + f_i′β + U_i′θ_y + ε_i
d_i = f_i′γ + U_i′θ_d + v_i
x_i = Λf_i + U_i,  i = 1, ..., n

◮ Do not directly observe (f, U); (θ_y, θ_d) are sparse
◮ dim(f_i), dim(α) are small

1. Estimate (f, U) from the third equation
2. Lasso on
   y_i − Ê(y_i | f_i) = Û_i′θ_new + ε_i^new,  where ε_i^new = α v_i + ε_i
   d_i − Ê(d_i | f_i) = Û_i′θ_d + v_i
3. OLS on ε̂_i^new = α v̂_i + ε_i
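Putting the three steps together, a minimal sketch: PCA for (f̂, Û), lasso residuals for the partialled-out y and d, then OLS. scikit-learn's Lasso stands in for any penalized solver (its penalty scaling differs from the display on the slide by a constant factor), and the penalty level lam is left to the user; this is a sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def factor_lasso_alpha(y, d, x, K, lam):
    """Three-step factor-lasso sketch for alpha."""
    n = x.shape[0]
    # Step 1: PCA on x gives f_hat and U_hat
    _, eigvec = np.linalg.eigh(x @ x.T)
    f_hat = np.sqrt(n) * eigvec[:, -K:]
    U_hat = x - f_hat @ (x.T @ f_hat / n).T
    # Partial f_hat out of y and d by OLS projection
    P = f_hat @ np.linalg.solve(f_hat.T @ f_hat, f_hat.T)
    y_t, d_t = y - P @ y, d - P @ d
    # Step 2: lasso of the partialled-out y and d on U_hat; keep residuals
    eps_new = y_t - Lasso(alpha=lam).fit(U_hat, y_t).predict(U_hat)
    v_hat = d_t - Lasso(alpha=lam).fit(U_hat, d_t).predict(U_hat)
    # Step 3: OLS of eps_new on v_hat gives alpha_hat
    return (v_hat @ eps_new) / (v_hat @ v_hat)
```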
Extensions: I, II

I: endogenous treatment

y_i = α d_i + f_i′β + U_i′θ_y + ε_i
d_i = π z_i + f_i′γ + U_i′θ_d + v_i
z_i = f_i′ψ + U_i′θ_z + u_i
x_i = Λf_i + U_i,  i = 1, ..., n

II: diffusion index forecast

y_{t+h} = α y_t + f_t′β + U_t′θ + ε_{t+h}
x_t = Λf_t + U_t,  t = 1, ..., T

Include U_t to capture the idiosyncratic information in x_t.
Extensions: III

Panel data. What we focused on in this paper:

y_it = α d_it + (λ_t^y)′f_i + U_it′θ_y + μ_i^y + δ_t^y + ε_it
d_it = (λ_t^d)′f_i + U_it′θ_d + μ_i^d + δ_t^d + η_it
X_it = Λ_t f_i + μ_i^X + δ_t^X + U_it,  i ≤ n, t ≤ T, dim(X_it) = p

◮ μ_i and δ_t are unrestricted individual and time effects
◮ p → ∞, n → ∞
◮ T is either fixed or growing, but satisfies T = o(n): we need accurate estimation of U_it, which relies on estimating Λ_t
◮ n = o(p²), because we need accurate estimation of f_i
Asymptotic Normality

Define

σ_ηε = Var( (1/√(nT)) Σ_{i,t} (η_it − η̄_i)(ε_it − ε̄_i) ),  σ̂_ηε = (1/(nT)) Σ_i ( Σ_t η̂_it ε̂_it )²

σ²_η = E( (1/(nT)) Σ_{i,t} (η_it − η̄_i)² ),  σ̂²_η = (1/(nT)) Σ_{i,t} η̂²_it

Then

√(nT) σ²_η σ_ηε^{−1/2} (α̂ − α) →d N(0, 1)
√(nT) σ̂²_η σ̂_ηε^{−1/2} (α̂ − α) →d N(0, 1)

Additional comments:
◮ It is not clear that you could get these results even if λ_t^y = 0 were known, due to the strong dependence in X resulting from the presence of factors
◮ First taking care of the factor structure in X seems potentially important
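A sketch of the implied plug-in confidence interval, reading the variance estimators off the display above (including the cluster-by-i form of σ̂_ηε, which is a reconstruction from the slide); the residual array shapes and names are assumptions.

```python
import numpy as np
from scipy.stats import norm

def plug_in_ci(alpha_hat, eta_hat, eps_hat, tau=0.05):
    """eta_hat, eps_hat: (n, T) arrays of estimated (within-demeaned)
    residuals. Plug-in CI from the asymptotic normality result above."""
    n, T = eta_hat.shape
    nT = n * T
    sigma2_eta = (eta_hat ** 2).sum() / nT
    # cluster-by-i estimator of the score variance
    sigma_etaeps = ((eta_hat * eps_hat).sum(axis=1) ** 2).sum() / nT
    se = np.sqrt(sigma_etaeps) / (sigma2_eta * np.sqrt(nT))
    z = norm.ppf(1 - tau / 2)
    return alpha_hat - z * se, alpha_hat + z * se
```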
Extensions of Inference I: K-Step Bootstrap

An alternative to inference from the plug-in asymptotic distribution is bootstrap inference.

Full bootstrap lasso:
◮ Generate bootstrap data (X_i*, Y_i*)
◮ Solve

β̂* = argmin_β (1/n) Σ_{i=1}^n (Y_i* − X_i*′β)² + λ‖β‖₁

◮ Repeat B times

The full bootstrap lasso is potentially burdensome.
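In code, the naive full bootstrap re-solves the lasso from scratch on every resample; this is the costly benchmark the k-step bootstrap is designed to avoid. A sketch, with scikit-learn's Lasso as an illustrative solver:

```python
import numpy as np
from sklearn.linear_model import Lasso

def full_bootstrap_lasso(X, Y, lam, B=500, seed=0):
    """Naive full bootstrap: one fresh lasso solve per replication."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, n)   # nonparametric resample
        draws.append(Lasso(alpha=lam).fit(X[idx], Y[idx]).coef_)
    return np.array(draws)            # B x p bootstrap coefficients
```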
K-Step Bootstrap

Consider a K-step bootstrap in the spirit of Andrews (2002):
◮ Start the lasso at the full-sample solution β̂_lasso
◮ For each bootstrap dataset, initialize at β̂_0* = β̂_lasso
◮ Employ iterative algorithms to obtain β̂_0* = β̂_lasso ⇒ β̂_1* ⇒ ... ⇒ β̂_k*
◮ As in Andrews (2002), each step is in closed form, so the procedure is fast even in large problems
◮ Unlike Andrews (2002), each step is still an ℓ₁-penalized problem
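A sketch of the warm-started k-step loop; one_step stands for any single closed-form update, e.g. one coordinate-descent sweep as sketched after the next slide. All names here are illustrative.

```python
import numpy as np

def k_step_bootstrap(X, Y, beta_lasso, one_step, K=3, B=500, seed=0):
    """K-step bootstrap sketch: start every replication at the
    full-sample lasso solution and apply only K cheap updates."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, n)
        beta = beta_lasso.copy()          # warm start at full-sample fit
        for _ in range(K):
            beta = one_step(X[idx], Y[idx], beta)
        draws.append(beta)
    return np.array(draws)
```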
Coordinate descent (Fu 1998)

◮ Update one component at a time, fixing the remaining components:

min_{β_j} (1/n) Σ_i ( Y_i* − X_{i,−j}*′β̂*_{ℓ,−j} − X_ij β_j )² + λ|ψ_j β_j| = min_{β_j} L_ℓ(β_j) + λ|ψ_j β_j|

(the term X_{i,−j}*′β̂*_{ℓ,−j} collects the other components, which are known at this step), and set

β̂*_{ℓ+1,j} = argmin_{β_j} L_ℓ(β_j) + λ|ψ_j β_j|,  for j = 1, ..., p

◮ Each β̂*_{ℓ+1,j} is in closed form: soft-thresholding,

argmin_{β∈R} (1/2)(z − β)² + λ|β| = sgn(z) max(|z| − λ, 0)
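A minimal coordinate-descent sweep with the soft-thresholding update, for the objective (1/n)‖Y − Xβ‖² + λ Σ_j ψ_j|β_j|; variable names are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """argmin_b 0.5*(z - b)^2 + t*|b| = sgn(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_sweep(X, Y, beta, lam, psi=None):
    """One full coordinate-descent sweep (Fu 1998): each coordinate
    update is the closed-form soft-thresholding step."""
    n, p = X.shape
    psi = np.ones(p) if psi is None else psi
    beta = beta.copy()
    r = Y - X @ beta                   # current full residual
    for j in range(p):
        r += X[:, j] * beta[j]         # remove j's contribution
        zj = 2.0 * (X[:, j] @ r) / n   # linear term at beta_j = 0
        cj = 2.0 * (X[:, j] @ X[:, j]) / n
        beta[j] = soft_threshold(zj, lam * psi[j]) / cj
        r -= X[:, j] * beta[j]         # add back updated contribution
    return beta
```

A function like this can play the role of one_step in the k-step bootstrap sketch above, e.g. one_step = lambda X, Y, b: cd_sweep(X, Y, b, lam).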
Faster methods

◮ "Composite gradient descent" (Nesterov 07; Agarwal et al. 12, Ann. Statist.): update the entire vector at once,

β̂*_{l+1} = argmin_β (β − β̂*_l)′V(β − β̂*_l) + b′(β − β̂*_l) + λ‖ψβ‖₁

◮ Replace the original V by (h/2) × identity ⇒ the entire vector update is in closed form: soft-thresholding
◮ Choice of h: if the dimension is small, use h = 2λ_max(V) to "majorize" V; if the dimension is large, 2λ_max(V) is unbounded (Johnstone 01)
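A sketch of the corresponding full-vector update: with the curvature matrix majorized by (h/2) × identity, the minimizer is a soft-thresholded gradient step (an ISTA-type iteration); names are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def composite_gradient_step(X, Y, beta, lam, h, psi=None):
    """One composite-gradient step for (1/n)||Y - X beta||^2 + lam||psi*beta||_1:
    majorizing V by (h/2)*I makes the whole-vector update closed form."""
    n, p = X.shape
    psi = np.ones(p) if psi is None else psi
    grad = -2.0 * X.T @ (Y - X @ beta) / n   # gradient of the smooth part
    return soft_threshold(beta - grad / h, lam * psi / h)
```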
General Conditions for Iterative Algorithms

Q(β) = (1/n)‖Y* − X*β‖²₂ + λ‖Ψβ‖₁

Suppose β̂*_k satisfies:
1. The minimization error is smaller than the statistical error:

Q(β̂*_k) ≤ min_β Q(β) + o_{P*}(|β̂ − β_0|)

2. Sparsity: |β̂*_k|_0 = O_{P*}(|J|_0)

◮ Can be directly verified using the KKT condition
◮ We verified both conditions for coordinate descent (Fu 98)
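The sparsity condition can be checked numerically via the KKT conditions of Q; a sketch that returns a validity flag and the size of the active set (tolerance and names are illustrative).

```python
import numpy as np

def kkt_check(X, Y, beta, lam, psi=None, tol=1e-6):
    """KKT check for Q = (1/n)||Y - X b||^2 + lam||Psi b||_1:
    active coords satisfy (2/n)X_j'(Y-Xb) = lam*psi_j*sgn(b_j),
    inactive coords satisfy |(2/n)X_j'(Y-Xb)| <= lam*psi_j."""
    n, p = X.shape
    psi = np.ones(p) if psi is None else psi
    g = 2.0 * X.T @ (Y - X @ beta) / n
    active = np.abs(beta) > tol
    ok_active = np.all(
        np.abs(g[active] - lam * psi[active] * np.sign(beta[active])) <= tol)
    ok_inactive = np.all(np.abs(g[~active]) <= lam * psi[~active] + tol)
    return ok_active and ok_inactive, int(active.sum())  # flag, support size
```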
Bootstrap Confidence Interval

Let q*_{τ/2} be the τ/2-th upper quantile of { √(nT)|α̂_b − α̂| : b = 1, ..., B }.

The k-step bootstrap does not affect the first-order asymptotics (proved for the linear model):

◮ P( α ∈ α̂ ± q*_{τ/2}/√(nT) ) → 1 − τ
◮ Extendable to nonlinear models with orthogonality conditions
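The interval construction is a few lines over the bootstrap draws; a sketch following the slide's quantile convention, with names assumed.

```python
import numpy as np

def bootstrap_ci(alpha_hat, alpha_boot, nT, tau=0.05):
    """CI from the k-step bootstrap draws: q is the tau/2-th upper
    quantile of sqrt(nT)|alpha_b - alpha_hat| (the slide's convention)."""
    stats = np.sqrt(nT) * np.abs(np.asarray(alpha_boot) - alpha_hat)
    q = np.quantile(stats, 1 - tau / 2)
    return alpha_hat - q / np.sqrt(nT), alpha_hat + q / np.sqrt(nT)
```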
Technical remarks

◮ We spent most of the time proving that the effect of estimating (f, U) is first-order negligible under the weakest possible conditions on (n, T, p)
◮ This requires bounding weighted errors of the form

max_{d≤p} | (1/n) Σ_i (f̂_i − f_i) w_id |,  max_{d≤p} | (1/(nT)) Σ_{i,t} (f̂_i − f_i) z_{it,d} |

◮ These are easy to bound using Cauchy-Schwarz and (1/n) Σ_i ‖f̂_i − f_i‖², but that bound is very crude, leading to stronger than necessary conditions
◮ Need to use the expansion of f̂_i − f_i (f̂_i = the PCA estimator)
◮ If f̂_i has no closed form (e.g., MLE), need its Bahadur expansion
Extensions of Inference: II, III

II: factor-augmented regression

y_t = α d_t + f_t′β + U_t′θ_y + ε_t
d_t = f_t′γ + U_t′θ_d + v_t
x_t = Λf_t + U_t,  t = 1, ..., T

◮ α ⊥ E(y_t | f_t, U_t), E(d_t | f_t, U_t): the lasso does NOT affect the first-order asymptotics (Robinson 88, Andrews 94, Chernozhukov et al 16)
◮ Apply HAC (Newey-West) standard errors

III: out-of-sample forecast interval

y_{t+h} = y_{t+h|t} + ε_{t+h},  where y_{t+h|t} = α y_t + f_t′β + U_t′θ
x_t = Λf_t + U_t,  t = 1, ..., T

◮ y_{T+h|T} is NOT orthogonal to U_t′θ: lasso estimation of U_t′θ DOES affect the confidence interval for y_{T+h|T}