Median Matrix Completion: from Embarrassment to Optimality


  1. Median Matrix Completion: from Embarrassment to Optimality. Xiaojun Mao, School of Data Science, Fudan University, China. June 15, 2020. Joint work with Dr. Weidong Liu (Shanghai Jiao Tong University, China) and Dr. Raymond K. W. Wong (Texas A&M University, U.S.A.).

  2. Outline: 1. Introduction  2. Estimations  3. Theoretical Guarantee  4. Experiments

  3. Our Goal and Contributions. Robust matrix completion (MC) that allows heavy-tailed noise. We develop a robust and scalable estimator for median MC in large-scale problems: a fast and simple initial estimate via embarrassingly parallel computing, followed by a refinement stage based on pseudo data. Theoretically, we show that this refinement stage improves the convergence rate of the sub-optimal initial estimator to near-optimal order, as good as the computationally expensive median MC estimator.

  4. Background: The Netflix Problem. The rating matrix Y has n_1 ≈ 480K viewers and n_2 ≈ 18K movies. On average each viewer rated about 200 movies; only 1.2% of the entries were observed. Goal: recover the true rating matrix A⋆.

  5. Robust Matrix Completion. One line of work assumes a low-rank-plus-sparse structure, A⋆ + S + E. Median matrix completion is instead based on the absolute deviation loss. Under the absolute deviation loss and the Huber loss, the convergence rates of Elsener and van de Geer (2018) match those of Koltchinskii et al. (2011). Alquier et al. (2019) derive the minimax rates of convergence for any Lipschitz loss function (including the absolute deviation loss).

  6. Section 2: Estimations.

  7. Trace Regression Model. N independent pairs (X_k, Y_k):

      Y_k = tr(X_k^T A⋆) + ε_k,   k = 1, ..., N.   (1)

The elements of ε = (ε_1, ..., ε_N) are N i.i.d. random noise variables, independent of the design matrices. For matrix completion, the design matrices X_k take values in

      X = { e_j(n_1) e_k(n_2)^T : j = 1, ..., n_1; k = 1, ..., n_2 },

where e_j(n) denotes the j-th canonical basis vector of R^n.
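To make the model concrete: with X_k = e_j(n_1) e_k(n_2)^T, tr(X_k^T A) picks out the single entry A[j, k], so trace regression specializes exactly to noisy matrix completion. A minimal sketch (our illustrative code, not the authors'):

```python
import numpy as np

# Minimal sketch: with X_k = e_j(n1) e_k(n2)^T, tr(X_k^T A) = A[j, k],
# so the trace regression model reduces to observing noisy entries of A_star.
rng = np.random.default_rng(0)
n1, n2, N, r = 40, 30, 500, 3

U = rng.standard_normal((n1, r))
V = rng.standard_normal((n2, r))
A_star = U @ V.T                       # rank-r target matrix

rows = rng.integers(0, n1, size=N)     # j-index of e_j(n1) in X_k
cols = rng.integers(0, n2, size=N)     # k-index of e_k(n2) in X_k
eps = rng.standard_cauchy(N)           # heavy-tailed noise is allowed
Y = A_star[rows, cols] + eps           # Y_k = tr(X_k^T A_star) + eps_k

# Sanity check that tr(X^T A_star) is exactly the (j, k) entry.
j, k = rows[0], cols[0]
X = np.outer(np.eye(n1)[j], np.eye(n2)[k])
assert np.isclose(np.trace(X.T @ A_star), A_star[j, k])
```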

  8. Regularized Least Absolute Deviation Estimator. Let A⋆ = (A⋆,ij) ∈ R^{n_1 × n_2} and P(ε ≤ 0) = 0.5, so that A⋆,ij is the median of Y | X. Let B(a, n, m) = { A ∈ R^{n × m} : ||A||_∞ ≤ a } and assume A⋆ ∈ B(a, n_1, n_2). Under the absolute deviation loss,

      A⋆ = argmin_{A ∈ B(a, n_1, n_2)} E | Y − tr(X^T A) |.

To encourage a low-rank solution,

      Â_LADMC = argmin_{A ∈ B(a, n_1, n_2)} (1/N) Σ_{k=1}^N | Y_k − tr(X_k^T A) | + λ'_N ||A||_*.

Common computational strategies based on the proximal gradient method are inapplicable, since the objective is the sum of two non-differentiable terms. Alquier et al. (2019) use ADMM, which becomes slow and non-scalable in practice when the sample size and the matrix dimensions are large.
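For intuition, here is a small sketch of the penalized objective (hypothetical helper names, not the paper's code); evaluating it is easy, but both terms are non-smooth, which is what rules out a plain proximal gradient scheme:

```python
import numpy as np

def lad_objective(A, Y, rows, cols, lam):
    # (1/N) * sum_k |Y_k - tr(X_k^T A)| + lam * ||A||_*
    # For completion designs, tr(X_k^T A) is just A[rows[k], cols[k]].
    data_fit = np.mean(np.abs(Y - A[rows, cols]))
    nuclear = np.linalg.norm(A, ord="nuc")   # sum of singular values
    return data_fit + lam * nuclear
```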

  9. Distributed Initial Estimator. Figure: an example of dividing a matrix into sub-matrices. For each block l, solve

      Â_LADMC,l = argmin_{A_l ∈ B(a, m_1, m_2)} (1/N_l) Σ_{k ∈ Ω_l} | Y_{l,k} − tr(X_{l,k}^T A_l) | + λ_{N_l,l} ||A_l||_*.
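A sketch of the embarrassingly parallel scheme, assuming a hypothetical per-block solver fit_block_lad (a median-fill stand-in here so the sketch runs; in practice any LAD MC solver, such as the ADMM of Alquier et al. (2019), would be applied to each block):

```python
import numpy as np

def fit_block_lad(Y_blk, mask_blk, lam):
    # Stand-in for a real LAD matrix-completion solver on one block;
    # median-filling unobserved entries keeps the sketch runnable.
    return np.where(mask_blk, Y_blk, np.median(Y_blk[mask_blk]))

def distributed_initial_estimate(Y_obs, mask, l1, l2, lam):
    # Split the n1 x n2 problem into l1 * l2 independent sub-problems;
    # each block can be solved on a separate machine, then stitched back.
    n1, n2 = Y_obs.shape
    A0 = np.empty((n1, n2))
    for r_idx in np.array_split(np.arange(n1), l1):
        for c_idx in np.array_split(np.arange(n2), l2):
            blk = np.ix_(r_idx, c_idx)
            A0[blk] = fit_block_lad(Y_obs[blk], mask[blk], lam)
    return A0
```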

  10. The Idea of Refinement. Let L(A; {Y, X}) = | Y − tr(X^T A) |. The Newton-Raphson iteration is

      vec(A_1) = vec(Â_0) − H(Â_0)^{-1} E_{(Y,X)}[ l(Â_0; {Y, X}) ],

where Â_0 is an initial estimator, l(A; {Y, X}) is the sub-gradient, and H(A) is the Hessian matrix. When Â_0 is close to the minimizer A⋆,

      vec(A_1) ≈ vec(Â_0) − [2 f(0) diag(Π)]^{-1} E_{(Y,X)}[ l(Â_0; {Y, X}) ]
               = vec(Â_0) − [f(0)]^{-1} diag(Π)^{-1} E_{(Y,X)}[ ( I{ Y ≤ tr(X^T Â_0) } − 1/2 ) vec(X) ]
               = { E_{(Y,X)}[ vec(X) vec(X)^T ] }^{-1} E_{(Y,X)}[ Ỹ° vec(X) ],

where Π = (π_{1,1}, ..., π_{n_1,n_2})^T with π_{st} = Pr( X_k = e_s(n_1) e_t(n_2)^T ), and the theoretical pseudo data are

      Ỹ° = tr(X^T Â_0) − [f(0)]^{-1} ( I{ Y ≤ tr(X^T Â_0) } − 1/2 ).
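The last equality above uses E[vec(X) vec(X)^T] = diag(Π), which holds because vec(X) is a canonical basis vector under the completion design. A quick numeric check of this identity under uniform sampling (our illustrative code):

```python
import numpy as np

n1, n2 = 3, 4
rng = np.random.default_rng(1)
# Uniform sampling: pi_st = 1/(n1*n2) for every entry (s, t).
draws = rng.integers(0, n1 * n2, size=50_000)
acc = np.zeros((n1 * n2, n1 * n2))
for d in draws:
    s, t = divmod(d, n2)
    X = np.outer(np.eye(n1)[s], np.eye(n2)[t])
    x = X.reshape(-1, order="F")   # vec(X): a basis vector of R^{n1 n2}
    acc += np.outer(x, x)          # a single diagonal indicator matrix
acc /= len(draws)
# Empirical E[vec(X) vec(X)^T] should be close to diag(Pi) = I / (n1 * n2).
assert np.allclose(acc, np.eye(n1 * n2) / (n1 * n2), atol=5e-3)
```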

  11. The First Iteration: Refinement Details. The expansion above suggests

      vec(A_1) ≈ argmin_A E_{(Y,X)} { Ỹ° − tr(X^T A) }^2.

Choice of the initial estimator: Â_0 must satisfy a certain rate condition. Let K(·) be a kernel function and h > 0 a bandwidth, and estimate

      f̂(0) = (1/(N h)) Σ_{k=1}^N K( ( Y_k − tr(X_k^T Â_0) ) / h ).

Let Ŷ = (Ŷ_k) denote the empirical pseudo data,

      Ŷ_k = tr(X_k^T Â_0) − [f̂(0)]^{-1} ( I{ Y_k ≤ tr(X_k^T Â_0) } − 1/2 ).

Using Ŷ, one natural estimator is

      Â = argmin_{A ∈ B(a, n_1, n_2)} (1/N) Σ_{k=1}^N ( Ŷ_k − tr(X_k^T A) )^2 + λ_N ||A||_*.
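A sketch of the two empirical ingredients on this slide, the density estimate f̂(0) and the pseudo data Ŷ (the Gaussian kernel is our choice for illustration; nothing here is the authors' code):

```python
import numpy as np

def pseudo_data(Y, rows, cols, A0, h):
    # Residuals Y_k - tr(X_k^T A0) under the current fit.
    resid = Y - A0[rows, cols]
    # Kernel density estimate of the noise density at 0 (Gaussian kernel).
    f0 = np.mean(np.exp(-0.5 * (resid / h) ** 2) / np.sqrt(2 * np.pi)) / h
    # Y_hat_k = tr(X_k^T A0) - f0^{-1} * (I{Y_k <= tr(X_k^T A0)} - 1/2).
    indic = (resid <= 0).astype(float)
    return A0[rows, cols] - (indic - 0.5) / f0, f0
```

Because the refined criterion is a smooth squared loss plus a nuclear norm, standard machinery such as soft-impute / singular-value thresholding applies, unlike the original LAD objective.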

  12. The t-th Iteration: Refinement Details. Let h_t → 0 be the bandwidth for the t-th iteration, and set

      f̂^{(t)}(0) = (1/(N h_t)) Σ_{k=1}^N K( ( Y_k − tr(X_k^T Â^{(t−1)}) ) / h_t ).

Similarly, for each 1 ≤ k ≤ N, define

      Ŷ_k^{(t)} = tr(X_k^T Â^{(t−1)}) − [f̂^{(t)}(0)]^{-1} ( I{ Y_k ≤ tr(X_k^T Â^{(t−1)}) } − 1/2 ).

We propose the following estimator:

      Â^{(t)} = argmin_{A ∈ B(a, n_1, n_2)} (1/N) Σ_{k=1}^N ( Ŷ_k^{(t)} − tr(X_k^T A) )^2 + λ_{N,t} ||A||_*.
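Putting the pieces together, the repeated refinement might look like the following sketch, where solve_pls stands for any nuclear-norm-penalized least-squares completion solver (e.g. soft-impute) and pseudo_data is the helper sketched above; both names are ours:

```python
def refine(Y, rows, cols, A0, solve_pls, bandwidths, lambdas):
    # Iterated refinement: at step t, rebuild the pseudo data around the
    # current fit, then solve a smooth nuclear-norm-penalized least squares.
    A = A0
    for h_t, lam_t in zip(bandwidths, lambdas):
        Y_hat, _ = pseudo_data(Y, rows, cols, A, h_t)  # sketched above
        A = solve_pls(Y_hat, rows, cols, lam_t)
    return A
```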

  13. Section 3: Theoretical Guarantee.

  14. Notations. Let n_+ = n_1 + n_2, n_max = max{n_1, n_2} and n_min = min{n_1, n_2}, and denote r⋆ = rank(A⋆). In addition to some regularity conditions, the initial estimator Â_0 satisfies (n_1 n_2)^{-1/2} ||Â_0 − A⋆||_F = O_P( (n_1 n_2)^{-1/2} a_N ), where the initial rate obeys (n_1 n_2)^{-1/2} a_N = o(1). Write a_{N,0} = a_N and define

      a_{N,t} = sqrt( r⋆ (n_1 n_2) n_max log(n_+) / N ) + sqrt(r⋆) n_min ( a_{N,0} / ( sqrt(r⋆) n_min ) )^{2^t}.

  15. Convergence Results of the Repeated Refinement Estimator.

Theorem (Repeated refinement). Suppose certain regularity conditions hold and A⋆ ∈ B(a, n_1, n_2). Choosing h_t and λ_{N,t} of appropriate orders, we have

      (n_1 n_2)^{-1} || Â^{(t)} − A⋆ ||_F^2 = O_P( max{ a_{N,t−1}^4 log(n_+) / ( n_min^2 (n_1 n_2) N ), r⋆ n_max log(n_+) / N } ),

provided

      t ≥ log( [ log( r⋆^2 n_max^2 log(n_+) ) − log( n_min N ) ] / [ c_0 log( r⋆ a_{N,0}^2 ) − 2 c_0 log( n_min ) ] ) / log(2),   for some c_0 > 0.

The convergence rate of Â^{(t)} thus becomes r⋆ n_max N^{-1} log(n_+), which matches the near-optimal rate r⋆ n_max N^{-1} up to a logarithmic factor. Under certain conditions, t is of constant order.

  16. Section 4: Experiments.

  17. Synthetic Data Generation. A⋆ = U V^T, where the entries of U ∈ R^{n_1 × r} and V ∈ R^{n_2 × r} were all drawn independently from N(0, 1). We set r = 3 and n_1 = n_2 = 400, and repeated each setting 500 times. The missing rate was 0.2 under a uniform missing mechanism. Four noise distributions (sketched below):
S1 Normal: ε ∼ N(0, 1).
S2 Cauchy: ε ∼ Cauchy(0, 1).
S3 Exponential: ε ∼ exp(1).
S4 t-distribution with 1 degree of freedom: ε ∼ t_1.
The Cauchy distribution is very heavy-tailed; its first moment (expectation) does not exist.
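As referenced in the list above, a sketch of this data-generating process (our code, not the authors'; we read "missing rate 0.2" literally, i.e. each entry is missing independently with probability 0.2):

```python
import numpy as np

rng = np.random.default_rng(2020)
n1 = n2 = 400
r = 3
A_star = rng.standard_normal((n1, r)) @ rng.standard_normal((n2, r)).T

mask = rng.random((n1, n2)) >= 0.2         # uniform missingness, rate 0.2

noise = {
    "S1": rng.standard_normal((n1, n2)),   # N(0, 1)
    "S2": rng.standard_cauchy((n1, n2)),   # Cauchy(0, 1)
    "S3": rng.exponential(1.0, (n1, n2)),  # exp(1)
    "S4": rng.standard_t(1, (n1, n2)),     # t_1 (coincides with Cauchy(0, 1))
}
Y_obs = {s: np.where(mask, A_star + e, np.nan) for s, e in noise.items()}
```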

  18. Comparison Methods. (a) BLADMC: the blocked least absolute deviation matrix completion initial estimator Â_LADMC,0, with l_1 = 2 row subsets and l_2 = 2 column subsets. (b) ACL: least absolute deviation matrix completion with nuclear-norm penalty, based on the computationally expensive ADMM algorithm proposed by Alquier et al. (2019). (c) MHT: the squared-loss estimator with nuclear-norm penalty proposed by Mazumder et al. (2010). These are compared against the proposed refined estimator, DLADMC.

  19. Simulation Results for Noise Distributions S1 and S2.

Table: average RMSEs, MAEs, estimated ranks, and their standard errors (in parentheses) for DLADMC, BLADMC, ACL and MHT. The number in parentheses after each setting is T, the number of refinement iterations.

  Setting   Method   RMSE                  MAE                 rank
  S1 (T=4)  DLADMC   0.5920 (0.0091)       0.4273 (0.0063)     52.90 (2.51)
            BLADMC   0.7660 (0.0086)       0.5615 (0.0060)     400.00 (0.00)
            ACL      0.5518 (0.0081)       0.4031 (0.0056)     400.00 (0.00)
            MHT      0.4607 (0.0070)       0.3375 (0.0047)     36.89 (1.79)
  S2 (T=5)  DLADMC   0.9395 (0.0544)       0.6735 (0.0339)     36.49 (7.94)
            BLADMC   1.7421 (0.3767)       1.2061 (0.1570)     272.25 (111.84)
            ACL      1.8236 (1.1486)       1.2434 (0.5828)     277.08 (170.99)
            MHT      106.3660 (918.5790)   1.4666 (2.2963)     1.25 (0.50)
