Text Selection

Bryan Kelly (Yale University), Asaf Manela (Washington University in St. Louis), Alan Moreira (University of Rochester)

October 2018
Motivation

◮ Digital text is increasingly available to social scientists
  ◮ Newspapers, blogs, regulatory filings, congressional records, ...
◮ Unlike data often used by economists
  ◮ Text is ultra high-dimensional
  ◮ Phrase counts are sparse
◮ Statistical learning from text requires
  ◮ Machine learning techniques
  ◮ Scalable algorithms
This paper

◮ Text is often selected by journalists, speechwriters, and others who cater to an audience with limited attention
◮ Hurdle Distributed Multiple Regression (HDMR)
  ◮ Highly scalable approach to inference from big count data
  ◮ Includes an economically motivated selection equation
  ◮ Especially useful when the cover/no-cover choice is separate from, or more interesting than, the quantity of coverage
◮ Applications using newspaper coverage for prediction
  1. Backcast the intermediary capital ratio (He-Kelly-Manela 2017 JFE)
  2. Forecast macroeconomic series (Stock-Watson 2012 JBES)
Related literature

◮ We extend machinery developed by Taddy (2012, 2015, 2016) to text selection
  ◮ Layer an economically motivated hurdle/selection equation on his Distributed Multinomial Regression (DMR)
  ◮ Find that the advantage of HDMR over DMR increases with sparsity
◮ Provide new tools to literatures in economics and finance
  ◮ Finance and media: Antweiler-Frank (2004), Tetlock (2007, 2011), Fang-Peress (2009), Engelberg-Parsons (2011), Dougal et al (2012), Peress (2014), Manela (2014), Fedyk (2018)
  ◮ Text-based uncertainty: Baker-Bloom-Davis (2016), Manela-Moreira (2017), Hassan et al (2017)
  ◮ Polarization: Gentzkow-Shapiro (2006), Gentzkow-Shapiro-Taddy
  ◮ Can better control for and learn from high-dimensional content
Text data is inherently high-dimensional

Documents:
  1: Digital text is available.
  2: Text is selected!
  ...

⇒ Document-term matrix c (phrase counts per document):

        digital text   text is   is available   is selected   ...
  1:         1             1           1              0
  2:         0             1           0              1
  ...
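A minimal sketch of how such a document-term count matrix can be built, assuming two-word phrases as in the toy example above; scikit-learn's CountVectorizer is one convenient choice, and the two documents are the ones from the slide.

```python
# Build a small document-term (two-word phrase) count matrix c from raw text.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Digital text is available.", "Text is selected!"]
vec = CountVectorizer(ngram_range=(2, 2))   # count two-word phrases
C = vec.fit_transform(docs)                 # sparse n x d count matrix

print(vec.get_feature_names_out())          # phrase labels for the d columns
print(C.toarray())                          # dense view of the counts
```

With realistic corpora, the number of distinct phrases d quickly dwarfs the number of documents n, and most entries of c are zero.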
Text regression is prone to overfit

◮ c_i: vector of counts in d categories for observation i
  ◮ e.g., c_ij is date i's newspaper mentions of phrase j ("world war")
◮ v_i: vector of p covariates
  ◮ e.g., intermediary capital ratio, realized variance on date i
◮ Let v_iy ∈ v_i be a target variable
  ◮ e.g., intermediary capital ratio
◮ Because d ≫ n, we cannot run the OLS regression

    v_iy = β_0 + [c_i, v_{i,−y}]′ β + ε_i
Text inverse regression

◮ A text inverse regression approach would instead
  1. Regress word counts on covariates (backward regression)

       c_ij = λ(α_j + v_i′ ϕ_j) + υ_ij

  2. Construct a low-dimensional projection in the v_iy direction (sufficient reduction projection)

       z_iy ≡ Σ_j ϕ̂_jy c_ij

  3. Regress the target variable on z_iy and the other covariates (forward regression)

       v_iy = β_0 + [z_iy, v_{i,−y}]′ β + ε_i

◮ A (d + p − 1)-dimensional regression is reduced to (p + 1)-dimensional!
◮ z_iy summarizes all textual information relevant for prediction
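A minimal sketch of these three steps, assuming a count matrix C (n × d), a covariate matrix V (n × p), and that column y_col of V holds the target. The per-phrase Poisson fit stands in for the backward regression and omits refinements such as document-length offsets, regularization paths, and the distributed estimation used in DMR; all names are illustrative.

```python
# Three-step text inverse regression: backward regression per phrase,
# sufficient-reduction projection, then a low-dimensional forward regression.
import numpy as np
from sklearn.linear_model import PoissonRegressor, LinearRegression

def inverse_regression(C, V, y_col):
    n, d = C.shape
    others = [k for k in range(V.shape[1]) if k != y_col]

    # 1. Backward regression: one count regression per phrase j,
    #    c_ij on covariates v_i (embarrassingly parallel across phrases).
    phi_y = np.zeros(d)
    for j in range(d):
        m = PoissonRegressor(alpha=1e-4).fit(V, C[:, j])
        phi_y[j] = m.coef_[y_col]          # loading of phrase j on the target direction

    # 2. Sufficient-reduction projection: z_iy = sum_j phi_hat_jy * c_ij.
    z = C @ phi_y

    # 3. Forward regression: target on z and the remaining covariates.
    X = np.column_stack([z, V[:, others]])
    fwd = LinearRegression().fit(X, V[:, y_col])
    return fwd, phi_y, z
```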
Why would we need a hurdle?

[Figure: average across phrases of the distribution of phrase-j counts across documents; left panel includes zero counts, right panel shows positive counts only. Wall Street Journal, monthly front page text, July 1926 to February 2016.]

◮ Statistics: a hurdle better describes text data
  ◮ Text data often has many more zeros than a Poisson predicts
◮ Economics: text is selected
  ◮ Publishers cater to a boundedly rational reader (Gabaix, 2014)
  ◮ Politicians select phrases that resonate with voters (Gentzkow-Shapiro-Taddy, 2017)
  ◮ Censored or socially taboo words (Michel et al, 2011)
  ◮ Fixed cost of introducing new terms, low marginal cost of repeating them
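A quick way to check the excess-zeros claim on one's own corpus, assuming only a document-term count matrix C (n documents × d phrases); for each phrase it compares the observed share of zero counts with the share implied by a Poisson with the same mean. The function name is illustrative.

```python
# Compare observed zero shares with Poisson-implied zero shares, phrase by phrase.
import numpy as np

def excess_zeros(C):
    means = C.mean(axis=0)                       # lambda_j: mean count of phrase j
    observed_zeros = (C == 0).mean(axis=0)       # empirical share of zero counts
    poisson_zeros = np.exp(-means)               # Poisson P(count = 0) = exp(-lambda_j)
    return observed_zeros - poisson_zeros        # positive gap = excess zeros

# Phrases with large positive gaps are better described by a hurdle:
# a separate inclusion decision layered on top of the count distribution.
```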
Text selection model

With sparse text, the extensive margin may be more informative than the intensive margin

◮ We suggest a text selection model instead
  1. Two-part text selection model for counts

       h*_ij = f(κ_j + w_i′ δ_j) + ω_ij                        (inclusion)
       c*_ij = λ(α_j + v_i′ ϕ_j) + υ_ij                        (repetition)
       c_ij = c*_ij × 1(h*_ij > 0) = c*_ij × h_ij              (observation)

  2. Construct two low-dimensional projections in the v_iy (= w_iy) direction (SR projections)

       z0_iy ≡ Σ_j δ̂_jy h_ij   (inclusion),    z+_iy ≡ Σ_j ϕ̂_jy c_ij   (repetition)

  3. Regress the target variable on z0_iy, z+_iy, and the other covariates (forward regression)

       v_iy = β_0 + [z0_iy, z+_iy, w_{i,−y}, v_{i,−y}]′ β + ε_i

◮ A (d + p − 1)-dimensional regression is reduced to (p + 2)-dimensional!
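A minimal sketch of the two-part estimation and forward regression, under the same illustrative setup as before (counts C, covariates V with the target in column y_col, and W = V). It uses a plain logistic regression for the inclusion equation and a Poisson fit on the positive counts for repetition, and it ignores the regularization, document-length adjustments, and distributed computation of the actual HDMR procedure; all names are illustrative.

```python
# Two-part (hurdle) text selection model: inclusion and repetition equations
# per phrase, two sufficient-reduction indices, then a forward regression.
import numpy as np
from sklearn.linear_model import LogisticRegression, PoissonRegressor, LinearRegression

def hdmr_sketch(C, V, y_col):
    n, d = C.shape
    H = (C > 0).astype(float)                 # inclusion indicators h_ij
    others = [k for k in range(V.shape[1]) if k != y_col]

    delta_y, phi_y = np.zeros(d), np.zeros(d)
    for j in range(d):
        # Inclusion equation: does phrase j appear at all?
        if 0 < H[:, j].sum() < n:
            inc = LogisticRegression(C=1e2).fit(V, H[:, j])
            delta_y[j] = inc.coef_[0, y_col]
        # Repetition equation: how often does it appear, given that it appears?
        pos = H[:, j] > 0
        if pos.sum() > 1:
            rep = PoissonRegressor(alpha=1e-4).fit(V[pos], C[pos, j])
            phi_y[j] = rep.coef_[y_col]

    # Sufficient-reduction projections from each part of the model.
    z0 = H @ delta_y                          # inclusion index
    zpos = C @ phi_y                          # repetition index

    # Forward regression of the target on both indices and the other covariates.
    X = np.column_stack([z0, zpos, V[:, others]])
    fwd = LinearRegression().fit(X, V[:, y_col])
    return fwd, delta_y, phi_y, z0, zpos
```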