
Bilinear Text Regression and Applications, Vasileios Lampos (presentation transcript)



  1. Bilinear Text Regression and Applications. Vasileios Lampos, Department of Computer Science, University College London. May 2014.

  2. Outline: Linear Regression Methods; Bilinear Regression Methods; Applications; Conclusions.

  3. Recap on regression methods.

  4. Regression basics: Ordinary Least Squares (1/2)
  • observations x_i ∈ R^m, forming X, i ∈ {1, ..., n}
  • responses y_i ∈ R, forming y, i ∈ {1, ..., n}
  • weights, bias w_j, β ∈ R, with w* = [w; β], j ∈ {1, ..., m}
  Ordinary Least Squares (OLS):
  $$\operatorname*{argmin}_{w,\beta} \sum_{i=1}^{n} \Big( y_i - \beta - \sum_{j=1}^{m} x_{ij} w_j \Big)^{2}$$
  or, in matrix form,
  $$\operatorname*{argmin}_{w^*} \| X^* w^* - y \|_{\ell_2}^{2}, \qquad X^* = [\, X \;\; \mathbf{1} \,]$$
  (X augmented with a column of ones, so that the bias β is absorbed into w* = [w; β]), giving
  $$w^* = (X^{*\top} X^*)^{-1} X^{*\top} y$$

  5. Regression basics: Ordinary Least Squares (2/2)
  (same setup: observations X, responses y, weights and bias w* = [w; β])
  $$\operatorname*{argmin}_{w^*} \| X^* w^* - y \|_{\ell_2}^{2} \;\Rightarrow\; w^* = (X^{*\top} X^*)^{-1} X^{*\top} y$$
  Why not?
  − $X^{*\top} X^*$ may be singular (thus difficult to invert)
  − high-dimensional models are difficult to interpret
  − unsatisfactory prediction accuracy (the estimates have large variance)
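A minimal numpy sketch of the closed-form OLS fit above (not part of the original deck; synthetic data throughout). It builds the augmented matrix X* = [X 1] and solves for w* = [w; β], using `lstsq` rather than an explicit inverse, since X*ᵀX* may be singular, which is exactly the first drawback listed:

```python
import numpy as np

def ols_fit(X, y):
    """OLS with an explicit bias: solve argmin ||X* w* - y||^2
    for w* = [w; beta], where X* = [X 1]."""
    X_star = np.hstack([X, np.ones((X.shape[0], 1))])  # append ones column for beta
    # lstsq handles (near-)singular X*^T X* more gracefully than inv()
    w_star, *_ = np.linalg.lstsq(X_star, y, rcond=None)
    return w_star[:-1], w_star[-1]  # (weights w, bias beta)

# toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.7 + rng.normal(scale=0.1, size=100)
w, beta = ols_fit(X, y)
```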

  6. Regression basics: Ridge Regression (1/2)
  (same setup)
  Ridge Regression (RR) (Hoerl & Kennard, 1970):
  $$w^* = (\underbrace{X^{*\top} X^* + \lambda I}_{\text{non-singular}})^{-1} X^{*\top} y$$
  $$\operatorname*{argmin}_{w,\beta} \sum_{i=1}^{n} \Big( y_i - \beta - \sum_{j=1}^{m} x_{ij} w_j \Big)^{2} + \lambda \sum_{j=1}^{m} w_j^{2}$$
  or
  $$\operatorname*{argmin}_{w^*} \| X^* w^* - y \|_{\ell_2}^{2} + \lambda \| w \|_{\ell_2}^{2}$$

  7. Regression basics: Ridge Regression (2/2)
  $$\operatorname*{argmin}_{w^*} \| X^* w^* - y \|_{\ell_2}^{2} + \lambda \| w \|_{\ell_2}^{2}$$
  + a size constraint on the weight coefficients (regularisation) resolves the problems caused by collinear variables
  + fewer degrees of freedom, better predictive accuracy than OLS
  − does not perform feature selection (all coefficients remain nonzero)
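For completeness, a closed-form ridge sketch under the same conventions (again not from the slides). One caveat, flagged loudly: for brevity this version penalises the bias entry of w* as well, whereas the objective above regularises only the weights w_j:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: w* = (X*^T X* + lam I)^(-1) X*^T y.
    NOTE: this sketch penalises the bias entry too; the slide's
    objective regularises only the weights w_j, not beta."""
    n, m = X.shape
    X_star = np.hstack([X, np.ones((n, 1))])     # X* = [X 1]
    A = X_star.T @ X_star + lam * np.eye(m + 1)  # lam * I makes A non-singular
    w_star = np.linalg.solve(A, X_star.T @ y)
    return w_star[:-1], w_star[-1]
```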

  8. Regression basics: Lasso
  (same setup)
  ℓ1-norm regularisation, or lasso (Tibshirani, 1996):
  $$\operatorname*{argmin}_{w,\beta} \sum_{i=1}^{n} \Big( y_i - \beta - \sum_{j=1}^{m} x_{ij} w_j \Big)^{2} + \lambda \sum_{j=1}^{m} |w_j|$$
  or
  $$\operatorname*{argmin}_{w^*} \| X^* w^* - y \|_{\ell_2}^{2} + \lambda \| w \|_{\ell_1}$$
  − no closed-form solution; solving it is a quadratic programming problem
  + Least Angle Regression explores the entire regularisation path (Efron et al., 2004)
  + sparse w: interpretability and better performance (Hastie et al., 2009)
  − if m > n, at most n variables can be selected
  − strongly correlated predictors make it model-inconsistent (Zhao & Yu, 2009)
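Since the lasso has no closed form, in practice one reaches for a solver. The sketch below uses scikit-learn's coordinate-descent `Lasso` (my choice for illustration, not a tool named in the deck; its `alpha` corresponds to the slide's λ only up to scikit-learn's 1/(2n) loss scaling). In the m > n regime it recovers a sparse weight vector:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))           # m > n: more features than samples
w_true = np.zeros(200)
w_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ w_true + rng.normal(scale=0.1, size=50)

model = Lasso(alpha=0.1).fit(X, y)       # l1 penalty => sparse coefficients
print(np.count_nonzero(model.coef_))     # only a handful of nonzero weights
```

For the LAR path mentioned above, scikit-learn's `lars_path` function traces the full regularisation path.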

  9. Regression basics: Lasso for Text Regression
  • n-gram frequencies x_i ∈ R^m, forming X, i ∈ {1, ..., n}
  • flu rates y_i ∈ R, forming y, i ∈ {1, ..., n}
  • weights, bias w_j, β ∈ R, with w* = [w; β], j ∈ {1, ..., m}
  ℓ1-norm regularisation, or lasso:
  $$\operatorname*{argmin}_{w^*} \| X^* w^* - y \|_{\ell_2}^{2} + \lambda \| w \|_{\ell_1}$$
  Selected (stemmed) features: 'unwel', 'temperatur', 'headach', 'appetit', 'symptom', 'diarrhoea', 'muscl', 'feel', ...
  [Figure 1: Flu rate predictions for the UK obtained by applying the lasso to Twitter data; official HPA flu rates plotted against inferred rates across days in 2009 (Lampos & Cristianini, 2010)]
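To make the text-regression setup concrete, here is a toy version of such a pipeline. It is entirely illustrative: dummy tweets and flu rates, plain unigram term frequencies instead of the paper's stemmed n-grams, and in no way the actual Lampos & Cristianini code:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Lasso

# dummy stand-ins: one pooled "document" of tweets per day, plus a flu rate per day
daily_tweets = [
    "feeling unwell headache fever sore throat",
    "lovely weather out cycling all day",
    "flu symptoms everywhere no appetite muscle ache",
    "football tonight cannot wait",
]
flu_rates = [42.0, 3.0, 55.0, 2.0]

vec = CountVectorizer()
X = vec.fit_transform(daily_tweets).toarray().astype(float)
X /= X.sum(axis=1, keepdims=True)        # term frequencies per day

model = Lasso(alpha=0.01).fit(X, flu_rates)
selected = [t for t, c in zip(vec.get_feature_names_out(), model.coef_) if c != 0]
```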

  10. Regression basics: Elastic Net
  (same setup)
  [Linear] Elastic Net (LEN) (Zou & Hastie, 2005):
  $$\operatorname*{argmin}_{w^*} \underbrace{\| X^* w^* - y \|_{\ell_2}^{2}}_{\text{OLS}} + \underbrace{\lambda_1 \| w \|_{\ell_1}}_{\text{lasso reg.}} + \underbrace{\lambda_2 \| w \|_{\ell_2}^{2}}_{\text{RR reg.}}$$
  + a 'compromise' between ridge regression (handles collinear predictors) and lasso (favours sparsity)
  + the entire regularisation path can be explored by modifying LAR
  + if m > n, the number of selected variables is not limited to n
  − may select redundant variables!
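scikit-learn's `ElasticNet` implements this combined penalty, though with an (alpha, l1_ratio) parameterisation: roughly λ1 corresponds to alpha · l1_ratio and λ2 to alpha · (1 − l1_ratio), up to constant scaling factors in its loss. A short illustrative sketch:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
w_true = np.zeros(200)
w_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ w_true + rng.normal(scale=0.1, size=50)

# l1_ratio balances the lasso (sparsity) and ridge (grouping) penalties
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
```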

  11. Would a slightly different text regression approach be more suitable for Social Media content?

  12. About Twitter (1/2)
  Tweet examples:
  @PaulLondon: I would strongly support a coalition government. It is the best thing for our country right now. #electionsUK2010
  @JohnsonMP: Socialism is something forgotten in our country #supportLabour
  @FarageNOT: Far-right 'movements' come along with crises in capitalism #UKIP
  @JohnK_1999: RT @HannahB: Stop talking about politics and listen to Justin!! Bieber rules, peace and love ♥ ♥ ♥
  The Twitter basics:
  • 140 characters per status (tweet)
  • users follow and are followed
  • embedded usage of topics (#elections)
  • retweets (RT), @replies, @mentions, favourites
  • real-time nature
  • biased user demographics

  13. About Twitter (2/2)
  (tweet examples as on the previous slide)
  • Twitter contains a vast amount of information about various topics
  • this information (X) can be used to assist predictions (y) (Lampos & Cristianini, 2012; Sakaki et al., 2010; Bollen et al., 2011)
  − f: X → y, where f usually formulates a linear regression task
  − X represents word frequencies only...
  + is it possible to incorporate a user contribution somehow? word selection + user selection

  14. Bilinear Text Regression

  15. Bilinear Text Regression: the general idea (1/2)
  Linear regression: $f(x_i) = x_i^{\top} w + \beta$
  • observations x_i ∈ R^m, forming X, i ∈ {1, ..., n}
  • responses y_i ∈ R, forming y, i ∈ {1, ..., n}
  • weights, bias w_j, β ∈ R, with w* = [w; β], j ∈ {1, ..., m}
  Bilinear regression: $f(Q_i) = u^{\top} Q_i w + \beta$
  • users: p ∈ Z+
  • observations Q_i ∈ R^{p×m}, i ∈ {1, ..., n}
  • responses y_i ∈ R, i ∈ {1, ..., n}
  • weights, bias u_k, w_j, β ∈ R, k ∈ {1, ..., p}, j ∈ {1, ..., m}
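A useful observation, which the sketch below exploits: for fixed u, f(Q_i) = uᵀQ_i w + β is linear in w with feature vectors uᵀQ_i ∈ R^m, and for fixed w it is linear in u with feature vectors Q_i w ∈ R^p. Alternating two ridge solves is therefore one generic way to fit the bilinear form; this is my illustrative sketch, not necessarily the optimisation procedure presented later in the talk:

```python
import numpy as np

def bilinear_fit(Q, y, lam=1.0, iters=20):
    """Alternating least squares for f(Q_i) = u^T Q_i w + beta.
    Q has shape (n, p, m): n observations, p users, m words.
    Each half-step is a plain ridge regression (for brevity the
    bias column is penalised along with the weights)."""
    n, p, m = Q.shape
    u = np.ones(p) / p                      # start with uniform user weights
    w, beta = np.zeros(m), 0.0

    def ridge(F, t):
        F1 = np.hstack([F, np.ones((len(t), 1))])   # append bias column
        v = np.linalg.solve(F1.T @ F1 + lam * np.eye(F1.shape[1]), F1.T @ t)
        return v[:-1], v[-1]

    for _ in range(iters):
        Fw = np.einsum('k,nkj->nj', u, Q)   # fixed u: features u^T Q_i in R^m
        w, beta = ridge(Fw, y)
        Fu = np.einsum('nkj,j->nk', Q, w)   # fixed w: features Q_i w in R^p
        u, beta = ridge(Fu, y)
    return u, w, beta

# toy usage on synthetic data
rng = np.random.default_rng(0)
Q = rng.normal(size=(200, 10, 30))
u_true, w_true = rng.normal(size=10), rng.normal(size=30)
y = np.einsum('k,nkj,j->n', u_true, Q, w_true) + 0.5
u, w, beta = bilinear_fit(Q, y, lam=0.1)
```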
