feature selection risk

  1. Feature Selection Risk. Alex Chinco, University of Illinois at Urbana-Champaign. September 15, 2014.

  2. “Our model allows us to identify and interpret events faster than more traditional methods used by other investors.” (Quant. fund pitch book)

  4. Imagine you’re a trader. Each stock can have Y/N exposure to 7 features, namely whether or not:
     1. It’s involved in a crowded trade
     2. It’s mentioned in M&A rumors
     3. Its major supplier closed down
     4. Its labor force unionized
     5. It belongs to the alcohol/tobacco/gaming (ATG) industry
     6. It’s referenced in a scientific article
     7. It’s been added to the S&P 500
     One of the 7 features might have realized a shock. Having the mystery feature raises demand by α > 0 shares.
     Question: How many observations do you need to see in order to decide which (if any) of the 7 features has realized a shock?

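  6. Answer: Only 3!
     ◮ Stock 1: crowded trade, supplier closed, ATG industry, S&P 500 addition.
     ◮ Stock 2: M&A rumor, supplier closed, scientific article, S&P 500 addition.
     ◮ Stock 3: labor unionization, ATG industry, scientific article, S&P 500 addition.
     The $3 \times 7$ data matrix $X$ tells you whether stock $n$ has attribute $q$: $x_{n,q} = 1$ if yes and $x_{n,q} = 0$ if no, with $\epsilon_n \overset{\text{iid}}{\sim} \mathrm{N}(0, \sigma_\epsilon^2)$ and $\alpha \gg \sigma_\epsilon$:
     $$\underbrace{\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix}}_{(\mathbf{d})_{3 \times 1}} = \underbrace{\begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}}_{(X)_{3 \times 7}} \underbrace{\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_7 \end{pmatrix}}_{(\boldsymbol{\alpha})_{7 \times 1}} + \underbrace{\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{pmatrix}}_{(\boldsymbol{\epsilon})_{3 \times 1}}$$
     E.g., if only $d_1 \approx \alpha$, then the crowded trade was shocked, since $(\alpha, 0, 0)^\top \approx X (\alpha, 0, \ldots, 0)^\top + \boldsymbol{\epsilon}$; if $d_1 \approx d_2 \approx d_3 \approx \alpha$, then the S&P 500 addition was shocked. Each column of $X$ is a distinct nonzero 0/1 pattern, so the set of stocks with $d_n \approx \alpha$ singles out exactly one feature.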
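The decoding step can be sketched in Python. This is a minimal illustration, assuming at most one shocked feature and noise small enough ($\alpha \gg \sigma_\epsilon$) that thresholding each $d_n$ at $\alpha/2$ classifies it correctly; the names `FEATURES`, `X`, and `decode` are hypothetical. Column $q$ of $X$ is the binary expansion of $q$ (stock 1 as the lowest bit), which is why 3 observations cover 7 features plus the no-shock case ($2^3 = 8$ patterns).

```python
# Feature labels in slide order (columns of X).
FEATURES = [
    "crowded trade", "M&A rumor", "supplier closed", "labor unionized",
    "ATG industry", "scientific article", "S&P 500 addition",
]

# Rows are stocks 1-3, columns are features 1-7; X[n][q] = 1 if stock n has feature q.
X = [
    [1, 0, 1, 0, 1, 0, 1],  # stock 1
    [0, 1, 1, 0, 0, 1, 1],  # stock 2
    [0, 0, 0, 1, 1, 1, 1],  # stock 3
]


def decode(d, alpha):
    """Return the 1-based index of the shocked feature, or None if no shock.

    Assumes at most one feature is shocked and alpha >> sigma_eps, so each
    demand d_n is either close to alpha or close to 0.
    """
    # Treat d_n as a "hit" if it is closer to alpha than to 0.
    bits = [1 if dn > alpha / 2 else 0 for dn in d]
    # Column q encodes q in binary with stock 1 as the lowest bit,
    # so reading the hit pattern as a binary number recovers q directly.
    q = sum(bit << n for n, bit in enumerate(bits))
    return q if q > 0 else None


# Usage: only stock 1 reacts, so feature 1 (crowded trade) was shocked.
q = decode([0.97, 0.04, -0.02], alpha=1.0)
print(q, FEATURES[q - 1])  # prints: 1 crowded trade
```

The binary reading is just a shortcut; matching the hit pattern against the columns of `X` would give the same answer because every column is distinct.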

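  9. Key Insight: Inference problem changes character at $N^\star = 3$.
     First, imagine you’ve seen $N = 4$ observations after a crowded-trade shock:
     $$\underbrace{\begin{pmatrix} \alpha \\ 0 \\ 0 \\ \alpha \end{pmatrix}}_{(\mathbf{d})_{4 \times 1}} \approx \underbrace{\begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 \end{pmatrix}}_{(X)_{4 \times 7}} \underbrace{\begin{pmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{pmatrix}}_{(\boldsymbol{\alpha})_{7 \times 1}} + \underbrace{\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{pmatrix}}_{(\boldsymbol{\epsilon})_{4 \times 1}}$$
     The estimate of $\alpha$ is now $(d_1 + d_4)/2 \approx \alpha \pm \sigma_\epsilon / \sqrt{2}$.
     Now, imagine you’ve instead seen only $N = 2$ observations:
     $$\underbrace{\begin{pmatrix} \alpha \\ 0 \end{pmatrix}}_{(\mathbf{d})_{2 \times 1}} \approx \underbrace{\begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \end{pmatrix}}_{(X)_{2 \times 7}} \underbrace{\begin{pmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{pmatrix}}_{(\boldsymbol{\alpha})_{7 \times 1}} + \underbrace{\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \end{pmatrix}}_{(\boldsymbol{\epsilon})_{2 \times 1}}$$
     The shock could be either the crowded trade or the ATG industry, since both columns equal $(1, 0)^\top$. How should you value the 3rd asset, $x_3 = (0, 0, 0, 1, 1, 1, 1)$? It is exposed to the ATG industry but not to the crowded trade, so its value hinges on which of the two features was actually shocked.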
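A short Python sketch makes the contrast concrete: with 2 observations two features remain consistent with the data, while 3 or 4 observations pin the shock down. It assumes a single shocked feature and thresholds each $d_n$ at $\alpha/2$ (reasonable when $\alpha \gg \sigma_\epsilon$); `consistent_features` and `std_error_of_mean` are hypothetical helper names.

```python
import math

# Rows are stocks, columns are the 7 features in slide order.
# Stocks 1-3 come from the running example; stock 4 is the extra row on the N = 4 slide.
X = [
    [1, 0, 1, 0, 1, 0, 1],  # stock 1
    [0, 1, 1, 0, 0, 1, 1],  # stock 2
    [0, 0, 0, 1, 1, 1, 1],  # stock 3
    [1, 1, 0, 0, 1, 1, 0],  # stock 4
]


def consistent_features(d, alpha):
    """1-based features whose exposures on the first len(d) stocks match d.

    Assumes one shocked feature and alpha >> sigma_eps, so each d_n is
    classified as alpha or 0 by thresholding at alpha / 2.
    """
    pattern = [1 if dn > alpha / 2 else 0 for dn in d]
    return [
        q + 1
        for q in range(7)
        if [X[n][q] for n in range(len(d))] == pattern
    ]


def std_error_of_mean(sigma_eps, m):
    """Standard error of alpha when estimated by averaging m demands alpha + eps."""
    return sigma_eps / math.sqrt(m)


print(consistent_features([1.02, -0.03], alpha=1.0))            # N = 2: two candidates
print(consistent_features([1.02, -0.03, 0.01, 0.97], alpha=1.0))  # N = 4: one candidate
print(std_error_of_mean(1.0, 2))
```

With $N = 2$ the candidates are features 1 and 5 (crowded trade and ATG industry); stock 3 loads on feature 5 but not feature 1, which is exactly why its valuation is ambiguous.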

  10. This is a stylized example, but... the problem scales! Suppose $Q = 400$, $K = 5$, and $x_{n,q} \overset{\text{iid}}{\sim} \mathrm{N}(0, 1)$:
      $$d_n = \tilde{d}_n - \mathrm{E}[\tilde{d}_n \mid f] = \sum_{q=1}^{400} \alpha_q \cdot x_{n,q} + \epsilon_n$$
      [Figure: three panels (Bonferroni threshold, FDR threshold, LASSO) plotting a normalized count of misclassified features, $\sum_{q=1}^{400} 1\{\hat{\alpha}_q \neq \alpha_q\}$, against $\log(N)$; each panel marks $N^\star \approx 22$.]
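The simulation design can be sketched in Python with NumPy. This is a hypothetical reconstruction, not the author's code: it assumes the $K = 5$ shocked features are drawn at random and share a common shock `alpha`, and it implements only a Bonferroni-style rule (feature-by-feature regressions whose t-statistics are compared with a two-sided normal cutoff at level $0.05/Q$). `bonferroni_errors` and its parameters are illustrative names.

```python
import numpy as np
from statistics import NormalDist


def bonferroni_errors(N, Q=400, K=5, alpha=1.0, sigma_eps=1.0, level=0.05, seed=0):
    """Number of misclassified features for one simulated draw (requires N >= 2).

    Hypothetical design: K features drawn at random each get shock `alpha`,
    x_{n,q} iid N(0, 1), eps_n iid N(0, sigma_eps^2).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, Q))
    a = np.zeros(Q)
    a[rng.choice(Q, size=K, replace=False)] = alpha
    d = X @ a + sigma_eps * rng.standard_normal(N)

    # Regress d on each feature separately (no intercept) and form t-statistics.
    ss = np.sum(X ** 2, axis=0)                    # shape (Q,)
    coef = (X.T @ d) / ss                          # shape (Q,)
    resid = d[:, None] - X * coef                  # shape (N, Q)
    sigma2 = np.sum(resid ** 2, axis=0) / (N - 1)
    t_stat = coef / np.sqrt(sigma2 / ss)

    # Bonferroni: each of the Q two-sided tests is run at level / Q.
    cutoff = NormalDist().inv_cdf(1 - level / (2 * Q))
    selected = np.abs(t_stat) > cutoff

    # Misclassification count: sum_q 1{selected_q != (alpha_q != 0)}.
    return int(np.sum(selected != (a != 0)))


for N in (10, 20, 40, 80, 160):
    print(N, bonferroni_errors(N, seed=1))
```

Averaging the count over many seeds and normalizing it would trace out one curve of the kind shown in the panels; the FDR and LASSO panels would swap in different selection rules.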

  11. 1) Derive the feature selection bound.
      2) Embed it in an equilibrium asset-pricing model.
      3) Outline empirical predictions:
         ◮ Noise trader and feature selection risks are substitutes.
         ◮ Derivatives are more informative than Arrow securities.
      Slogan: There are fundamental limits on how quickly even the most sophisticated trader can interpret market signals.
      Sparse bounded rationality: Gabaix (2012); Compressed sensing: Candes, Romberg, and Tao (2004); Candes and Tao (2005); Donoho (2006); Cognitive control: Chinco (2014); High-dimensional inference: Chinco and Clark-Joseph (2014); Information-based asset pricing: Grossman and Stiglitz (1980); Kyle (1985); Veldkamp (2006); Behavioral finance: Barberis, Shleifer, and Wurgler (2005); Garleanu and Pedersen (2012).

  12. Consider sequences of Kyle (1985)-type markets where:
      $$\lim_{N \to \infty} Q_N, K_N = \infty, \qquad N \geq K_N, \qquad \lim_{N \to \infty} K_N / Q_N = 0$$
      Agents must use a feature selection rule, $\phi(\mathbf{d}, X)$, to identify shocks, $\phi : \mathbb{R}^N \times \mathbb{R}^{N \times Q} \to \mathbb{R}^Q$, where $\mathrm{FSE}[\phi]$ is the probability that $\phi$ identifies the wrong features.
      Proposition (Feature Selection Bound): If there exists some constant $C > 0$ such that $N < C \cdot K_N \cdot \log(Q_N / K_N)$ as $N \to \infty$, then there exists some constant $c > 0$ such that $\min_{\phi \in \Phi} \mathrm{FSE}[\phi] > c$.
      $N^\star(Q, K) \asymp K \cdot \log(Q / K)$ is the feature selection bound.
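Plugging the simulation's parameters into the bound is a quick sanity check. The proposition only pins $N^\star$ down up to constants, so taking $C = 1$ and the natural log below is an assumption; with it, $Q = 400$ and $K = 5$ give $5 \cdot \log(80) \approx 21.9$, in line with the $N^\star \approx 22$ marked in the simulation panels. The function name `feature_selection_bound` is hypothetical.

```python
import math


def feature_selection_bound(Q, K, C=1.0):
    """N*(Q, K) = C * K * log(Q / K).

    The proposition only fixes N* up to constants; C = 1 and the natural
    log are illustrative assumptions.
    """
    return C * K * math.log(Q / K)


for Q in (400, 4_000, 40_000):
    n_star = feature_selection_bound(Q, K=5)
    # Arrow securities need Q assets; the bound grows only like log(Q).
    print(f"Q={Q:>6}  N*={n_star:6.1f}  N*/Q={n_star / Q:.4f}")
```

Each tenfold increase in $Q$ adds only $K \cdot \log(10) \approx 11.5$ observations to $N^\star$, and the ratio $N^\star / Q$ is the $(K/Q) \cdot \log(Q/K)$ factor that reappears in the seemingly-redundant-assets result.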

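  13. Static Kyle (1985)-type model with $N$ assets. $N$ informed traders each get a private signal about the value of a single asset. A single market maker (MM) views aggregate demand for the $N$ assets and estimates
      $$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha} \in \mathbb{R}^Q} \left\{ \tfrac{1}{2} \cdot \big\| X \boldsymbol{\alpha} - (1/\theta) \cdot \mathbf{d} \big\|_2^2 + \gamma \cdot \|\boldsymbol{\alpha}\|_1 \right\}$$
      ◮ Informed trader demand rule: $y_n = \theta \cdot v_n$
      ◮ Market maker pricing rule: $p_n = \lambda \cdot d_n$
      Proposition (Equilibrium Using the LASSO): If the MM uses the LASSO and $N > N^\star$, then there exists an equilibrium:
      $$\lambda = \frac{1}{2 \cdot \theta} \qquad \text{and} \qquad \theta = \sqrt{C \cdot \log(Q) \times \frac{K}{N}} \cdot \frac{\sigma_z}{\sigma_v}$$
      for $C > 0$ and $\gamma = 2 \cdot (\sigma_z / \theta) \cdot \sqrt{2 \cdot \log(Q)}$.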
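The MM's estimator can be sketched with ISTA (proximal gradient descent), one standard way to solve a LASSO problem. The slide does not say which solver the MM uses, so the algorithm, the iteration count, and the names `lasso_ista` and `market_maker_estimate` are assumptions; the sketch takes the objective to be $\tfrac{1}{2}\|X\alpha - (1/\theta)\,d\|_2^2 + \gamma\|\alpha\|_1$ with $\gamma = 2 (\sigma_z/\theta)\sqrt{2 \log Q}$.

```python
import numpy as np


def lasso_ista(X, y, gamma, n_iter=500):
    """Minimize 0.5 * ||X a - y||_2^2 + gamma * ||a||_1 with ISTA."""
    # Step size 1/L, where L is the squared largest singular value of X
    # (the Lipschitz constant of the gradient of the quadratic term).
    L = np.linalg.norm(X, 2) ** 2
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = a - X.T @ (X @ a - y) / L                             # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - gamma / L, 0.0)   # soft-threshold
    return a


def market_maker_estimate(X, d, theta, sigma_z):
    """alpha_hat from a LASSO on d / theta with gamma = 2 (sigma_z / theta) sqrt(2 log Q)."""
    Q = X.shape[1]
    gamma = 2.0 * (sigma_z / theta) * np.sqrt(2.0 * np.log(Q))
    return lasso_ista(X, d / theta, gamma)


# Sanity check: with an orthonormal design the LASSO is plain soft-thresholding.
print(lasso_ista(np.eye(3), np.array([3.0, -0.5, -2.0]), gamma=1.0))
```

The soft-threshold step is what sets small coefficients exactly to zero, which is how the LASSO acts as a feature selection rule $\phi(\mathbf{d}, X)$.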

  15. Informed trader expected profit: $\sqrt{C/2 \cdot K/N \cdot \log(Q)} \times \sigma_v \cdot \sigma_z$
      Question: What exchange rate between the feature count, $Q$, and noise trader demand volatility, $\sigma_z$, leaves informed traders indifferent?
      Consider the transformations $Q \mapsto Q' = Q \cdot (1 + \Delta Q)$ and $\sigma_z \mapsto \sigma_z' = \sigma_z \cdot (1 + \Delta \sigma_z)$.
      Proposition (Substituting Risks): If $\sigma_z$ decreases by $\Delta \sigma_z < 0$, then informed trader expected profits are unchanged if $Q$ increases by $\Delta Q > 0$:
      $$\Delta Q = 2 \cdot \log(Q) \cdot \left( \frac{Q}{\sigma_z} \right) \times (-\Delta \sigma_z)$$

  18. Question: What kind of asset reveals shocks using the fewest observations?
      Could look at Arrow securities:
      $$\begin{pmatrix} d_1^{(A)} \\ d_2^{(A)} \\ d_3^{(A)} \\ \vdots \\ d_Q^{(A)} \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}}_{X^{(A)}} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_Q \end{pmatrix} + \text{“Noise”}$$
      ...but this is overkill!
      Could also look at $N$ derivatives constructed by financial engineering from the $Q$ Arrow securities:
      $$X_{N \times Q} = D_{N \times Q} \, X^{(A)}_{Q \times Q}$$
      The derivatives can’t have independent exposures to all $Q$ features since $N \ll Q$; e.g., all derivatives must have similar exposure to, say, the crowded trade and S&P 500 inclusion.
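A quick NumPy check illustrates why $N \ll Q$ derivatives cannot have independent exposures to all $Q$ features: $X = D X^{(A)}$ has rank at most $N$, so some nonzero shock vector leaves every derivative's demand unchanged. The random payoff matrix `D` and the sizes `N = 5`, `Q = 20` are hypothetical stand-ins for the financial engineering on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
N, Q = 5, 20

X_arrow = np.eye(Q)                  # Arrow securities: one asset per feature
D = rng.standard_normal((N, Q))      # hypothetical derivative payoffs on the Arrow securities
X = D @ X_arrow                      # N x Q exposure matrix of the derivatives

print("rank of X:", np.linalg.matrix_rank(X), "out of", Q, "features")

# With rank <= N < Q, X has a nontrivial null space: the last right-singular
# vector v satisfies X v = 0, so shocks alpha and alpha + v produce the same
# demands. Only a restriction such as sparsity can tell them apart.
_, _, Vt = np.linalg.svd(X)
v = Vt[-1]
print("max |X v|:", np.abs(X @ v).max())
```

The null vector `v` is generally dense, which is why sparsity of the true shock is the assumption that restores identification.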

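  19. Key insight: Don’t need complete independence! If any $2 \cdot K$ columns of $X$ are linearly independent, then any $K$-sparse signal $\boldsymbol{\alpha} \in \mathbb{R}^Q$ can be reconstructed uniquely from $X \boldsymbol{\alpha}$.
      Why? Suppose not, i.e., there exist $K$-sparse $\boldsymbol{\alpha} \neq \boldsymbol{\alpha}' \in \mathbb{R}^Q$ with $X \boldsymbol{\alpha} = X \boldsymbol{\alpha}'$. This implies $X (\boldsymbol{\alpha} - \boldsymbol{\alpha}') = 0$, where $\boldsymbol{\alpha} - \boldsymbol{\alpha}'$ is nonzero and at most $(2 \cdot K)$-sparse. That is a linear dependence among at most $2 \cdot K$ columns of $X$, which contradicts the assumption.
      Proposition (Seemingly Redundant Assets): If $N \geq N^\star(Q, K)$, then an MM studying derivatives using the LASSO can identify $K$-sparse shocks with probability greater than $1 - C_1 \cdot e^{-C_2 \cdot K}$ using only a $\Theta[K/Q \cdot \log(Q/K)]$ fraction of the assets needed by an MM studying Arrow securities, with $C_1, C_2 > 0$.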
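The "$2 \cdot K$ columns" condition can be checked by brute force on small matrices. The Python sketch below uses exact rational arithmetic, and the helper names `rank` and `every_k_columns_independent` are hypothetical. On the $3 \times 7$ matrix from the trader example every pair of columns is independent, so any 1-sparse shock is recoverable from 3 observations; some triples are dependent (column 1 + column 2 = column 3), so a joint shock to features 1 and 2 looks exactly like a shock to feature 3.

```python
from fractions import Fraction
from itertools import combinations


def rank(columns):
    """Exact rank of the matrix whose columns are given, via Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in zip(*columns)]
    r = 0
    for c in range(len(columns)):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def every_k_columns_independent(X, k):
    """True if every set of k columns of X (given as a list of rows) is linearly independent."""
    columns = list(zip(*X))
    return all(
        rank([columns[j] for j in subset]) == k
        for subset in combinations(range(len(columns)), k)
    )


X = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
print(every_k_columns_independent(X, 2))  # True: K = 1 shocks are identifiable
print(every_k_columns_independent(X, 3))  # False: column 1 + column 2 = column 3
```

Brute force over all $\binom{Q}{2K}$ subsets is only feasible for tiny examples; the point of the proposition is that random or engineered derivative exposures satisfy the condition with high probability once $N \geq N^\star(Q, K)$.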

  20. Thanks!
