On Robustness of Principal Component Regression, Anish Agarwal



  1. On Robustness of Principal Component Regression. Anish Agarwal, Devavrat Shah, Dennis Shen, Dogyoon Song (MIT)

  2. What is PCR?

  3. What is PCR?

  4. What is PCR? Step 1: PCA

  5. What is PCR? Step 1: PCA (k components)

  6. What is PCR? Step 2: Regression (minimize)

  7. What is PCR? Step 3: Prediction
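The three steps on slides 4-7 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `pcr_fit` and `pcr_predict`, the synthetic data, and the choice not to center the covariates are all assumptions made for the example.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Fit PCR: keep the top-k principal directions of X, then run least squares."""
    # Step 1: PCA via SVD (no centering here, an assumption of this sketch).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                      # (p, k) top-k right singular vectors
    # Step 2: regression of y on the k-dimensional scores Z = X V_k.
    Z = X @ V_k
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    # Map the k coefficients back to the original p-dimensional space.
    return V_k @ gamma

def pcr_predict(X_new, beta):
    """Step 3: prediction with the fitted coefficients."""
    return X_new @ beta

# Synthetic example: exactly rank-r covariates and a noiseless linear response.
rng = np.random.default_rng(0)
n, p, r = 200, 50, 3
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
beta_true = rng.normal(size=p)
y = X @ beta_true

beta_hat = pcr_fit(X, y, k=r)
train_mse = float(np.mean((pcr_predict(X, beta_hat) - y) ** 2))
```

Because the covariates here have rank exactly `r` and the response is noiseless, choosing `k = r` recovers the training responses up to floating-point error.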

  8. When & Why Use PCR

  9. Data Science Folklore: “IF DATA IS (APPROXIMATELY) LOW-DIMENSIONAL, USE PCR!” -- Anonymous Data Scientists. When exactly should we be using PCR?

  10. Key Questions We Answer: Theoretical properties of PCR? Is dimension reduction the only benefit of PCR?

  11. Our theoretical analysis of PCR helps answer the following questions: How low-rank do covariates need to be? How many principal components to pick? How well does PCR perform on test data (i.e. generalization properties)?

  12. Is Dimension Reduction the Only Benefit? NO!

  13. PCR (as is) works for a wide variety of settings! Noisy, Missing, Mixed valued, Sensitive

  14. Main Contribution of this Work: We show PCR is surprisingly robust to problems that plague large-scale modern datasets

  15. Error-In-Variable Regression (Setting We Consider)

  16. Classical (high-dimensional) Regression

  17. Error-in-Variable (EIV) Regression: Representative of modern datasets
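To make the error-in-variable setting concrete, the sketch below simulates covariates that are observed with additive noise and missing entries. It is illustrative only: the noise levels, the observation probability `rho`, and the zero-fill-and-rescale preprocessing are assumptions of this example, not details taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, r = 300, 40, 4

# Latent low-rank covariates X; responses depend on X, not on what we observe.
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
beta = rng.normal(size=p)
y = X @ beta + 0.1 * rng.normal(size=n)

# Observed covariates: additive noise, then each entry kept with probability rho.
rho = 0.8
noisy = X + 0.5 * rng.normal(size=(n, p))
mask = rng.random(size=(n, p)) < rho
Z = np.where(mask, noisy, np.nan)       # NaN marks a missing entry

# One simple preprocessing choice (assumed here): fill missing entries with
# zero and rescale by the observed fraction so the matrix is unbiased on average.
rho_hat = mask.mean()
Z_filled = np.nan_to_num(Z, nan=0.0) / rho_hat
```

The analyst only ever sees `Z` (or `Z_filled`) and `y`; the clean matrix `X` exists in the simulation purely so that the corruption can be generated.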

  18. EIV, Surprising Number of Applications: Time Series Analysis (measurement noise); Causal Inference (Synthetic Control) (measurement noise); Differentially-private Regression (noise by design); Mixed Valued Regression (structural noise)

  19. EIV, Surprising Number of Applications: Time Series Analysis (measurement noise); Causal Inference (Synthetic Control) (measurement noise); Differentially-private Regression (noise by design); Mixed Valued Regression (structural noise)

  20. Formal Results

  21. Theorem (Informal): Training Error. If principal components are chosen correctly (k = r): PCR implicitly denoises covariates! Terms annotated in the bound: number of covariates; fraction of observations; OLS minimax error rate (low-dimensional, noiseless, fully observed covariates)

  22. Theorem (Informal): Testing Error. If principal components are not chosen correctly (k ≠ r): terms shown are Test Error and Train Error with PCR (k). PCR implicitly de-noises covariates and implicitly performs regularization. Choose k that minimizes above
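The advice to choose k by minimizing an error quantity can be approximated in practice with a plain hold-out split. The sketch below is a stand-in under stated assumptions: it minimizes held-out mean squared error rather than the bound on the slide, and `pcr_coef`, `choose_k`, the data, and the split sizes are all hypothetical names and values chosen for illustration.

```python
import numpy as np

def pcr_coef(X, y, k):
    """PCR coefficients: least squares restricted to the top-k principal directions."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T
    gamma, *_ = np.linalg.lstsq(X @ V_k, y, rcond=None)
    return V_k @ gamma

def choose_k(X_train, y_train, X_val, y_val, k_max):
    """Return the k in 1..k_max with the smallest held-out MSE, plus all errors."""
    errors = []
    for k in range(1, k_max + 1):
        beta = pcr_coef(X_train, y_train, k)
        errors.append(float(np.mean((X_val @ beta - y_val) ** 2)))
    best_k = int(np.argmin(errors)) + 1
    return best_k, errors

# Noisy observations Z of rank-r covariates X (synthetic, for illustration).
rng = np.random.default_rng(2)
n, p, r = 400, 30, 3
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)
Z = X + 0.5 * rng.normal(size=(n, p))

best_k, errors = choose_k(Z[:300], y[:300], Z[300:], y[300:], k_max=10)
```

Plotting `errors` against k is a quick way to see whether the held-out error has a clear minimum or is flat over a range of k.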

  23. When to Use and Not to Use PCR? Look at the Spectrum. [Figure: four singular-value spectra, Cases 1-4, labeled “Use PCR!” and “Don’t Use PCR!”; x-axis: singular values (ordered by magnitude); y-axis: magnitude of singular values]

  24. Exponentially decaying spectrum is ubiquitous in real-world data: GDP Trajectories (Macroeconomics)

  25. Exponentially decaying spectrum is ubiquitous in real-world data: Avito Ad-Click Dataset (E-Commerce)

  26. Exponentially decaying spectrum is ubiquitous in real-world data: Cricket Trajectories (Sports)
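Looking at the spectrum of a covariate matrix takes only a singular value decomposition. The following sketch computes the ordered singular values and the cumulative share of squared singular values; the 90% energy threshold and the helper name `spectrum_summary` are assumptions for illustration, not a rule from the talk.

```python
import numpy as np

def spectrum_summary(Z, energy=0.9):
    """Singular values (descending), cumulative energy, and the smallest k
    whose top-k singular values capture at least `energy` of the total."""
    s = np.linalg.svd(Z, compute_uv=False)
    cum_energy = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cum_energy, energy) + 1)
    return s, cum_energy, k

# Synthetic covariates: rank-3 signal plus small noise.
rng = np.random.default_rng(4)
n, p, r = 200, 50, 3
Z = rng.normal(size=(n, r)) @ rng.normal(size=(r, p)) + 0.1 * rng.normal(size=(n, p))

s, cum_energy, k = spectrum_summary(Z)
```

Plotting `s` on a log scale makes a fast-decaying spectrum easy to tell apart from a flat one.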

  27. Surprising Applications of PCR

  28. Applications of Error-In-Variable Regression: Time Series Analysis (measurement noise); Causal Inference (Synthetic Control) (measurement noise); Differentially-private Regression (noise by design); Mixed Valued Regression (structural noise)

  29. Data privacy is top-of-mind as we increasingly apply ML on sensitive user data (genetic data, purchase history etc.)

  30. Standard Notion of Privacy in ML: ε-Differential Privacy. Intuitively, an algorithm is ε-differentially private if the outcome of a statistical query on a database cannot change by more than ε due to the presence/absence of any user data record. Example of Statistical Query: “Average Income of all users between ages 25 and 30”
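For reference, the intuition on slide 30 corresponds to the standard formal definition of ε-differential privacy, written out below. The symbols are notation chosen for this note rather than taken from the slides: a randomized algorithm \(\mathcal{A}\), databases \(D\) and \(D'\) that differ in a single record, and a set of outcomes \(S\).

```latex
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{A}(D') \in S]
\quad \text{for all measurable } S \text{ and all } D, D' \text{ differing in one record.}
```

For small ε, \(e^{\varepsilon} \approx 1 + \varepsilon\), which is the sense in which the outcome distribution "cannot change by more than ε".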

  31. How to achieve ε-differential privacy? Laplace Mechanism: add Laplacian noise to the database

  32. Predictive Accuracy vs. Privacy Tradeoff: Can we achieve good prediction error and still maintain privacy? Yes!

  33. Predictive Accuracy vs. Privacy Tradeoff: Can we achieve good prediction error and still maintain privacy? Step 1: Data Owner adds Laplacian Noise. Step 2: Analyst Performs PCR. Done!
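The two-step recipe on slide 33 might look roughly like the sketch below. It is only an illustration under loud assumptions: the per-entry noise scale `sensitivity / epsilon`, the value of `sensitivity`, and the helper names `laplace_release` and `pcr_coef` are hypothetical, and the code does not verify any privacy guarantee.

```python
import numpy as np

def laplace_release(X, epsilon, sensitivity, rng):
    """Step 1 (data owner): add i.i.d. Laplace noise to every entry.
    The scale sensitivity / epsilon is an assumed calibration for this sketch."""
    scale = sensitivity / epsilon
    return X + rng.laplace(loc=0.0, scale=scale, size=X.shape)

def pcr_coef(Z, y, k):
    """Step 2 (analyst): PCR on the released, noisy covariates."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V_k = Vt[:k].T
    gamma, *_ = np.linalg.lstsq(Z @ V_k, y, rcond=None)
    return V_k @ gamma

# Synthetic rank-r covariates held by the data owner.
rng = np.random.default_rng(3)
n, p, r = 500, 40, 3
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

Z_private = laplace_release(X, epsilon=1.0, sensitivity=1.0, rng=rng)
beta_hat = pcr_coef(Z_private, y, k=r)
mse = float(np.mean((Z_private @ beta_hat - y) ** 2))
```

The analyst never touches `X`; everything after the release step operates on `Z_private`, which is exactly an error-in-variable regression problem.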

  34. What is the sample complexity cost for ε-differential privacy? Prediction Error. Does the de-noising step (PCA) break privacy? No, PCA only de-noises covariates on average with respect to the - norm

  35. Conclusion

  36. Inspect the spectrum of your covariate matrix. Use PCR! Case 1: de-noises. Case 2: regularizes. [Figure: magnitude of singular values vs. singular values (ordered by magnitude)]

  37. Possible Implications for Modern ML. Step 1: Dimension Reduction. Linear Case: PCA. Linear low-dimensional covariate pre-processing has many implicit benefits (e.g. de-noising, regularizing). Non-Linear Case: GANs? Does non-linear covariate pre-processing (e.g. GANs) have similar benefits for unstructured data?

  38. Come Meet Us At Our Poster: Poster #3, East Exhibition Hall B + C, 5-7pm, Thursday. Shameless Plug :) PCR for Time Series Analysis: tspdb.mit.edu. PCR for Causal Inference: github.com/Romcos/SC_demo
