Low-rank Interaction with Sparse Additive Effects Model for Large Data Frames

1. Low-rank Interaction with Sparse Additive Effects Model for Large Data Frames. Geneviève Robin¹, Hoi-To Wai², Julie Josse¹, Olga Klopp³, Éric Moulines¹. ¹École Polytechnique, ²University of Hong Kong, ³ESSEC Business School. Thirty-second Conference on Neural Information Processing Systems, December 6, 2018. Poster #87, Room 210 & 230 AB, 5-7pm.

2. Motivation: species monitoring
[Data frame: waterbird counts per survey site for the years 2008-2010, with many missing values (NA), plus site covariates (surface, country, latitude) and year covariates as side information.]
White-headed duck, endangered: lead poisoning, wetland loss. Eurasian curlew, declining: lead poisoning, habitat destruction, disturbances.
1) Characteristics of the data:
• Mixed: categorical, real and discrete
• Large scale: 25,000+ survey sites
• Incomplete: missing values
• Side information: row & column covariates
2) Goal: estimate
• Main effects: effect of covariates
• Interactions: the remaining effects
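To make the data layout concrete, here is a minimal sketch in Python/NumPy, with made-up toy dimensions and covariates (not the actual waterbird data), of a count matrix with missing entries and of the expansion of row and column covariates into n × p dictionary matrices U^(k) used by the model on the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_years = 5, 3  # toy dimensions; the real data has 25,000+ survey sites

# Toy count matrix with missing entries (NaN), standing in for the waterbird counts
Y = rng.poisson(10.0, size=(n_sites, n_years)).astype(float)
Y[rng.random(Y.shape) < 0.3] = np.nan
mask = ~np.isnan(Y)  # observed entries

# One hypothetical row covariate (e.g. site surface) and one column covariate (e.g. year index)
surface = rng.random(n_sites)
year_idx = np.arange(n_years, dtype=float)

# Each covariate is expanded into an n x p "dictionary" matrix U^(k):
# row covariates are constant across columns, column covariates across rows.
U = [
    np.repeat(surface[:, None], n_years, axis=1),   # U^(1): site-level effect
    np.repeat(year_idx[None, :], n_sites, axis=0),  # U^(2): year-level effect
]
```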

3. Low-rank Interaction and Sparse main effects
Heterogeneous exponential-family parametric model: $f_{Y_{ij}}(y) = f_{ij}(y, X_{ij})$, where the family may depend on the entry and $X$ is the unknown parameter matrix.
Main effects and interactions in parameter space:
$$X = \sum_{k=1}^{q} \alpha_k U^{(k)} + L, \qquad X_{ij} = \langle u_{ij}, \alpha \rangle + L_{ij},$$
i.e. a sparse regression of the main effects on a dictionary of known design matrices $U^{(k)}$, plus a low-rank "residual" term $L$ for the interactions.
Estimation:
$$(\hat{\alpha}, \hat{L}) \in \operatorname*{argmin}_{\alpha, L} \; \mathcal{L}(Y; X) + \lambda_1 \|L\|_{*} + \lambda_2 \|\alpha\|_{1}.$$
A two-fold generalisation of "sparse plus low-rank" matrix recovery: 1. general sparsity pattern; 2. exponential-family noise.
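As a concrete illustration of the estimation criterion, here is a minimal sketch assuming a Gaussian least-squares log-likelihood on the observed entries rather than the general per-entry exponential-family loss of the model; the function and variable names are illustrative, not the authors' code:

```python
import numpy as np

def loris_objective(alpha, L, Y, U, mask, lam1, lam2):
    """Penalized loss L(Y; X) + lam1 * ||L||_* + lam2 * ||alpha||_1.

    A Gaussian (least-squares) loss on the observed entries is used here for
    simplicity; the model allows any entry-wise exponential-family likelihood.
    """
    # Parameter matrix: sparse main effects on the dictionary plus low-rank interactions
    X = sum(a * Uk for a, Uk in zip(alpha, U)) + L
    resid = np.where(mask, Y - X, 0.0)       # only observed entries contribute
    loss = 0.5 * np.sum(resid ** 2)
    nuclear = np.linalg.norm(L, ord="nuc")   # nuclear norm ||L||_*
    l1 = np.sum(np.abs(alpha))               # l1 norm ||alpha||_1
    return loss + lam1 * nuclear + lam2 * l1
```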

4. Statistical guarantees
Mixed Coordinate Gradient Descent (MCGD) algorithm for $(\hat{\alpha}, \hat{L}) \in \operatorname*{argmin}_{\alpha, L} \; \mathcal{L}(Y; X) + \lambda_1 \|L\|_{*} + \lambda_2 \|\alpha\|_{1}$:
• proximal update for $\alpha$
• conditional gradient / Frank-Wolfe update for $L$
Theorem 1 (near-optimal error bounds for main effects and interactions), with bounds of the form
$$\|\hat{\alpha} - \alpha^0\|_1^2 \lesssim \frac{\|\alpha^0\|_0 \big(\max_k \|U^{(k)}\|_\infty\big)^2}{\kappa^2 \pi} + D_\alpha, \qquad \|\hat{L} - L^0\|_F^2 \lesssim \frac{\operatorname{rank}(L^0)\,\max(n, p)}{\pi} + D_L.$$
Theorem 2 (sublinear convergence, computationally efficient): the MCGD method converges to an $\epsilon$-solution in $O(1/\epsilon)$ iterations.
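A minimal sketch of one MCGD iteration under the same Gaussian-loss simplification: soft-thresholding as the proximal update for alpha, and a rank-one Frank-Wolfe step for L over a nuclear-norm ball. The step sizes and the ball radius are placeholder choices, not the paper's schedule:

```python
import numpy as np

def mcgd_step(alpha, L, Y, U, mask, lam2, radius, step_alpha, step_fw):
    """One Mixed Coordinate Gradient Descent iteration (Gaussian-loss sketch)."""
    X = sum(a * Uk for a, Uk in zip(alpha, U)) + L
    G = np.where(mask, X - Y, 0.0)           # gradient of the loss w.r.t. X

    # Proximal (ISTA-style) update for the sparse main effects alpha
    grad_alpha = np.array([np.sum(G * Uk) for Uk in U])
    alpha = alpha - step_alpha * grad_alpha
    alpha = np.sign(alpha) * np.maximum(np.abs(alpha) - step_alpha * lam2, 0.0)

    # Conditional gradient / Frank-Wolfe update for the low-rank interactions L:
    # the linear minimizer over the nuclear-norm ball {||L||_* <= radius}
    # is a scaled top singular pair of -G.
    u_vecs, _, vt = np.linalg.svd(-G, full_matrices=False)
    S = radius * np.outer(u_vecs[:, 0], vt[0])
    L = (1.0 - step_fw) * L + step_fw * S
    return alpha, L
```

The Frank-Wolfe step increases the rank of the iterate by at most one per iteration, which is what keeps the update cheap on very large data frames.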

5. Numerical results
[Figure: estimation errors $\|\hat{\alpha} - \alpha^0\|_2^2$ and $\|\hat{L} - L^0\|_F^2$, and running time (s), versus the size of the data frame (up to 4e+07 entries), comparing LORIS, MIEL, Two-step, and group mean + SVD.]
[Figure: imputation error (relative RMSE) versus the percentage of missing values (10% to 80%), comparing CA, GLMM, LORIS, MEAN and TRIM.]
• Fast in large dimensions
• Estimation of main effects constant with dimensions
• Robust to large proportions of missing values
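The imputation comparison relies on a relative RMSE computed on held-out entries; one common way to compute such a metric (a sketch with illustrative names, not the authors' evaluation code) is:

```python
import numpy as np

def relative_rmse(X_hat, X_true, test_mask):
    """Relative root-mean-square error restricted to the held-out entries in test_mask."""
    diff = np.where(test_mask, X_hat - X_true, 0.0)
    ref = np.where(test_mask, X_true, 0.0)
    return np.sqrt(np.sum(diff ** 2)) / np.sqrt(np.sum(ref ** 2))
```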

6. Poster #87, Room 210 & 230 AB, 5-7pm.
