Uncertainty quantification for nonconvex tensor completion

Yuxin Chen, Electrical Engineering, Princeton University
Changxiao Cai (Princeton EE)    H. Vincent Poor (Princeton EE)
Ubiquity of high-dimensional tensor data: computational genomics (fig. credit: Schreiber et al. '19), dynamic MRI (fig. credit: Liu et al. '17)
Challenges in tensor reconstruction: a tensor of interest, missing data, noise
Key to enabling reliable reconstruction from incomplete & noisy data: exploiting low (CP) rank structure
Noisy tensor completion
Mathematical model

• unknown rank-$r$ tensor $T^\star \in \mathbb{R}^{d \times d \times d}$:
  $T^\star = \sum_{i=1}^{r} u_i^\star \otimes u_i^\star \otimes u_i^\star$
• partial observations over a sampling set $\Omega$:
  $T^{\mathrm{obs}}_{i,j,k} = T^\star_{i,j,k} + \text{noise}, \quad (i,j,k) \in \Omega$
• goal: estimate $\{u_i^\star\}_{i=1}^{r}$ and $T^\star$
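To make the observation model concrete, here is a minimal NumPy sketch that generates a symmetric rank-r CP tensor, a Bernoulli(p) sampling set, and noisy partial observations. The dimensions, sampling rate, and noise level are arbitrary toy values chosen for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, p, sigma = 30, 2, 0.2, 0.1   # toy values for illustration only

# ground-truth symmetric rank-r CP factors (columns of U_star)
U_star = rng.standard_normal((d, r)) / np.sqrt(d)
# T_star[i,j,k] = sum_s U_star[i,s] * U_star[j,s] * U_star[k,s]
T_star = np.einsum('is,js,ks->ijk', U_star, U_star, U_star)

# Bernoulli(p) sampling set Omega and zero-mean noise on the observed entries
mask = rng.random((d, d, d)) < p
T_obs = np.where(mask, T_star + sigma * rng.standard_normal((d, d, d)), 0.0)
```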
Prior art: sum-of-squares hierarchy, convex relaxation, spectral methods, nonconvex optimization
Prior art: • Gandy, Recht, Yamada '11 • Liu, Musialski, Wonka, Ye '12 • Kressner, Steinlechner, Vandereycken '13 • Xu, Hao, Yin, Su '13 • Romera-Paredes, Pontil '13 • Jain, Oh '14 • Huang, Mu, Goldfarb, Wright '15 • Barak, Moitra '16 • Zhang, Aeron '16 • Yuan, Zhang '16 • Montanari, Sun '16 • Kasai, Mishra '16 • Potechin, Steurer '17 • Dong, Yuan, Zhang '17 • Xia, Yuan '19 • Zhang '19 • Cai, Li, Poor, Chen '19 • Cai, Li, Chi, Poor, Chen '19 • Liu, Moitra '20 • ...
A nonconvex approach: Cai et al. (NeurIPS '19)

$\underset{U = [u_1, \cdots, u_r] \in \mathbb{R}^{d \times r}}{\text{minimize}} \; f(U) := \sum_{(i,j,k) \in \Omega} \Big( \big[ \textstyle\sum_{s=1}^{r} u_s^{\otimes 3} \big]_{i,j,k} - T^{\mathrm{obs}}_{i,j,k} \Big)^2$   (squared loss)

Proper initialization $U^0$:
1. estimate the subspace spanned by the low-rank tensor factors — unfolding + spectral methods
2. successively retrieve the tensor factors from the subspace estimates — iteratively, each tensor factor via random projection + spectral methods

3. gradient descent (nonconvex) with constant learning rates: for $t = 0, 1, \cdots$
   $U^{t+1} = U^{t} - \eta_t \nabla f(U^{t})$
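A bare-bones sketch of the gradient step on the squared loss above, continuing from the simulation sketch earlier (it reuses U_star, T_obs, mask). It replaces the paper's spectral initialization with a small perturbation of the truth purely for illustration; the step size and iteration count are arbitrary choices, not the paper's.

```python
import numpy as np

def cp3(U):
    # symmetric rank-r CP tensor: sum_s u_s ⊗ u_s ⊗ u_s
    return np.einsum('is,js,ks->ijk', U, U, U)

def loss_and_grad(U, T_obs, mask):
    R = mask * (cp3(U) - T_obs)                 # residuals on observed entries only
    loss = np.sum(R ** 2)
    # gradient of sum_{(i,j,k) in Omega} R_{ijk}^2 w.r.t. U (three symmetric terms)
    G = 2 * (np.einsum('ijk,js,ks->is', R, U, U)
             + np.einsum('ijk,is,ks->js', R, U, U)
             + np.einsum('ijk,is,js->ks', R, U, U))
    return loss, G

# gradient descent with a constant learning rate; the initialization below is a
# stand-in for the spectral initialization used by the actual algorithm
U = U_star + 0.02 * np.random.default_rng(1).standard_normal(U_star.shape)
eta = 0.05
for t in range(200):
    loss, G = loss_and_grad(U, T_obs, mask)
    U = U - eta * G
```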
A nonconvex approach: Cai et al. (NeurIPS '19)

(figure: convergence of the estimation error over iterations)

Under mild conditions, this nonconvex algorithm achieves
• linear convergence
• minimax-optimal statistical accuracy (up to log factor)
One step further: reasoning about uncertainty?

How to assess uncertainty, or "confidence", of obtained estimates due to imperfect data acquisition?
• noise
• incomplete measurements
• ...
Challenges

$\underset{U = [u_1, \cdots, u_r] \in \mathbb{R}^{d \times r}}{\text{minimize}} \; f(U) := \sum_{(i,j,k) \in \Omega} \Big( \big[ \textstyle\sum_{s=1}^{r} u_s^{\otimes 3} \big]_{i,j,k} - T^{\mathrm{obs}}_{i,j,k} \Big)^2$   (squared loss)

• how to pin down distributions of nonconvex solutions?
• how to adapt to unknown noise distributions and heteroscedasticity (i.e., location-varying noise variance)?
• existing estimation guarantees are highly insufficient → overly wide confidence intervals
Assumptions

$T^\star = \sum_{i=1}^{r} u_i^\star \otimes u_i^\star \otimes u_i^\star \in \mathbb{R}^{d \times d \times d}$

• random sampling: each entry is observed independently with prob. $p \gtrsim \mathrm{polylog}(d) / d^{3/2}$
• random noise: independent zero-mean sub-Gaussian with variances of roughly the same order (but not identical)
• ground truth: low-rank ($r = O(1)$), incoherent (tensor factors are de-localized and nearly orthogonal to each other), and well-conditioned
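For the noise assumption, here is a small sketch of what "variances of roughly the same order but not identical" could look like in a simulation. The uniform scaling range is an arbitrary illustrative choice; the paper only requires independent zero-mean sub-Gaussian noise with comparable variances.

```python
import numpy as np

rng = np.random.default_rng(1)
d, base_sigma = 30, 0.1

# heteroscedastic zero-mean Gaussian noise: per-entry standard deviations of the
# same order as base_sigma, but location-varying (here by a factor in [0.5, 1.5])
scale = base_sigma * rng.uniform(0.5, 1.5, size=(d, d, d))
E = scale * rng.standard_normal((d, d, d))   # E[i,j,k] ~ N(0, scale[i,j,k]**2)
```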
Main results: distributional theory

Setting: random sampling; independent sub-Gaussian noise; ground truth low-rank, incoherent, well-conditioned.

(figures: empirical distributions of the estimates $U$ and $T$)

Theorem 1. With high prob., there exists a permutation matrix $\Pi \in \mathbb{R}^{r \times r}$ s.t.
$U\Pi - U^\star \sim \mathcal{N}(0,\ \text{Cramér–Rao}) + \text{negligible term}$ — asymptotically optimal

Theorem 2. Consider any $(i,j,k)$ s.t. the corresponding "SNR" is not exceedingly small. Then with high prob.,
$T_{i,j,k} - T^\star_{i,j,k} \sim \mathcal{N}(0,\ \text{Cramér–Rao}) + \text{negligible term}$ — asymptotically optimal
(figures: histograms of estimation errors for a tensor factor entry and a tensor entry)

• Gaussianity and optimality: the estimation error of the nonconvex approach is zero-mean Gaussian, whose (co)variance is "minimal"
• Confidence intervals: the error (co)variance can be accurately estimated, leading to valid CI construction (see the sketch after this list)
• Adaptivity: our procedure is data-driven, fully adaptive to unknown noise levels and heteroscedasticity
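A minimal sketch of how entrywise confidence intervals follow from the Gaussian approximation. Here `var_hat` stands in for the data-driven variance estimate described in the paper; its plug-in formula is not reproduced in this sketch.

```python
import numpy as np
from scipy.stats import norm

def entrywise_ci(T_hat, var_hat, alpha=0.05):
    """Two-sided (1 - alpha) confidence intervals for each tensor entry, based on
    T_hat[i,j,k] - T_star[i,j,k] being approximately N(0, var_hat[i,j,k])."""
    z = norm.ppf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    half = z * np.sqrt(var_hat)
    return T_hat - half, T_hat + half

# in simulation (where T_star is known), empirical coverage is simply
#   lo, hi = entrywise_ci(T_hat, var_hat)
#   coverage = np.mean((lo <= T_star) & (T_star <= hi))
```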
Empirical coverage rates (CR)

                 tensor factor              tensor entries
(r, sigma)       Mean(CR)   Std(CR)         Mean(CR)   Std(CR)
(2, 10^-2)       0.9481     0.0201          0.9494     0.0218
(2, 10^-1)       0.9477     0.0228          0.9513     0.0218
(2, 1)           0.9478     0.0215          0.9475     0.0222
(4, 10^-2)       0.9450     0.0218          0.9434     0.0225
(4, 10^-1)       0.9472     0.0231          0.9494     0.0220
(4, 1)           0.9462     0.0234          0.9494     0.0219

d = 100, p = 0.2, heteroscedastic noise
Back to estimation: ℓ₂ optimality

The distributional theory in turn allows us to track estimation accuracy.

Theorem 3. Suppose the noise is i.i.d. Gaussian. Then there exists some permutation $\pi(\cdot)$ s.t.
$\|u_{\pi(l)} - u_l^\star\|_2^2 = \dfrac{(2 + o(1))\, \sigma^2 d}{p\, \|u_l^\star\|_2^4}, \quad 1 \le l \le r$   (Cramér–Rao lower bound)
$\|T - T^\star\|_F^2 = \dfrac{(6 + o(1))\, \sigma^2 r d}{p}$   (Cramér–Rao lower bound)

• precise characterization of estimation accuracy
• achieves full statistical efficiency (including the pre-constant)
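To read Theorem 3 numerically, here is how the two limits evaluate once the o(1) terms are dropped. The rank, sampling rate, and dimension match the numerical experiments on the next slide; the noise level and the unit factor norm are assumptions made only for this illustration.

```python
sigma, d, p, r = 0.1, 100, 0.2, 4   # sigma chosen arbitrarily for illustration
u_norm_sq = 1.0                      # ||u_l^star||_2^2, assumed normalized here

factor_err = 2 * sigma**2 * d / (p * u_norm_sq**2)   # limit of ||u_{pi(l)} - u_l^star||_2^2
tensor_err = 6 * sigma**2 * r * d / p                # limit of ||T - T^star||_F^2
print(factor_err, tensor_err)        # roughly 10.0 and 120.0
```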
Numerical ℓ₂ errors vs. Cramér–Rao bounds

(figures: numerical ℓ₂ errors tracking the Cramér–Rao bounds, for tensor factor estimation and tensor estimation)

r = 4, p = 0.2, d = 100
Concluding remarks

• nonconvex optimization for tensor estimation: fast, adaptive to unknown noise levels, with near-optimal statistical guarantees
• asymptotically optimal uncertainty quantification

Future directions
• improve dependency on rank & condition number
• more general sampling patterns
• other tensor-type problems