Maximum Likelihood Matrix Completion Under Sparse Factor Models: Error Guarantees and Efficient Algorithms

Jarvis Haupt
Department of Electrical and Computer Engineering, University of Minnesota

Institute for Computational and Experimental Research in Mathematics (ICERM)
Workshop on Approximation, Integration, and Optimization
October 1, 2014
Outline: Background and Motivation; Problem Statement; Error Bounds; Algorithmic Approach; Experimental Results; Acknowledgments

Section 1: Background and Motivation
A Classical Example

Sampling Theorem (Whittaker / Kotelnikov / Nyquist / Shannon, 1930s-1950s)

[Figure: original signal (red) and its samples (black)]

Basic "formula" for inference: to draw inferences from limited data (or, here, to impute missing elements), one must leverage underlying structure in the signal being inferred.

Accurate recovery (and imputation) via ideal low-pass filtering when the original signal is bandlimited.
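To make the low-pass recovery idea concrete, here is a minimal numpy sketch of ideal (sinc) interpolation of a bandlimited signal from its uniform samples; the signal, rates, and window length are illustrative choices, not values from the talk.

```python
import numpy as np

# Illustrative bandlimited signal: highest frequency 3 Hz, sampled at fs = 8 Hz (> 2 * 3 Hz).
fs = 8.0
T = 1.0 / fs
t_samp = np.arange(0.0, 2.0, T)          # uniform sample times on a 2-second window

def signal(t):
    return np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.cos(2 * np.pi * 3.0 * t)

x_samp = signal(t_samp)

# Ideal low-pass (sinc) interpolation: x(t) = sum_n x[n] * sinc((t - n*T) / T).
t_dense = np.linspace(0.0, 2.0, 1000)
x_recon = np.array([np.sum(x_samp * np.sinc((t - t_samp) / T)) for t in t_dense])

# Truncating the sinc sum to a finite window leaves some residual error,
# which shrinks as the window grows; check it away from the edges.
interior = slice(200, 800)
print("max interior error:", np.max(np.abs(x_recon[interior] - signal(t_dense[interior]))))
```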
A Contemporary Example

Matrix Completion (Candes & Recht; Keshavan et al.; Candes & Tao; Candes & Plan; Negahban & Wainwright; Koltchinskii et al.; Davenport et al.; ...; 2009-)

The low-rank modeling assumption is commonly used in collaborative filtering applications (e.g., the Netflix prize) to describe settings where each observed value depends on only a few latent factors or features.

[Figure: the matrix and its observed samples]

Accurate recovery (and imputation) via convex optimization when the original matrix is low-rank.
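The slide only gestures at the convex-optimization approach; as a concrete stand-in (not the specific algorithms of the cited works), below is a minimal soft-impute / singular-value-thresholding sketch for low-rank completion, with an ad hoc threshold and iteration count.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, rank = 60, 50, 3

# Synthetic low-rank matrix and a Bernoulli observation mask.
X_true = rng.standard_normal((n1, rank)) @ rng.standard_normal((rank, n2))
mask = rng.random((n1, n2)) < 0.4                 # observe roughly 40% of the entries
Y = X_true * mask

def soft_impute(Y, mask, tau=5.0, n_iter=200):
    """Iteratively fill missing entries and soft-threshold the singular values."""
    X = np.zeros_like(Y)
    for _ in range(n_iter):
        Z = np.where(mask, Y, X)                  # keep observed entries, impute the rest
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt   # singular-value soft thresholding
    return X

X_hat = soft_impute(Y, mask)
print("relative error:", np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))
```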
Beyond Low-Rank Models?

Low-Rank Models: all columns of the matrix are well-approximated as vectors in a common linear subspace.

Union of Subspaces Model: all columns of the matrix are well-approximated as vectors in a union of linear subspaces.

Union of subspaces models are at the essence of sparse subspace clustering (Elhamifar & Vidal; Soltanolkotabi et al.; Eriksson et al.; Balzano et al.) and dictionary learning (Olshausen & Field; Aharon et al.; Mairal et al.; ...). Here, we examine the efficacy of such models in matrix completion tasks.
Section 2: Problem Statement
"Sparse Factor" Data Models

We assume the unknown $X^* \in \mathbb{R}^{n_1 \times n_2}$ that we seek to estimate admits a factorization of the form
$$X^* = D^* A^*, \qquad D^* \in \mathbb{R}^{n_1 \times r}, \quad A^* \in \mathbb{R}^{r \times n_2},$$
where
• $\|D^*\|_{\max} \triangleq \max_{i,j} |D^*_{i,j}| \le 1$ (essentially to fix scaling ambiguities),
• $\|A^*\|_{\max} \le A_{\max}$ for a constant $0 < A_{\max} \le (n_1 \vee n_2)$,
• $\|X^*\|_{\max} \le X_{\max}/2$ for a constant $X_{\max} \ge 1$.

Our focus: sparse factor models, characterized by (approximately or exactly) sparse $A^*$.
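A minimal numpy sketch of drawing data from this sparse factor model; the dimensions, sparsity level, and constants below are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 100, 80, 10            # illustrative dimensions
A_max, X_max = 1.0, 1.0            # illustrative constants in the stated ranges
sparsity = 0.2                     # fraction of nonzero entries in A*

# D*: entries bounded by 1 in magnitude (fixes the scaling ambiguity).
D_star = rng.uniform(-1.0, 1.0, size=(n1, r))

# A*: sparse, with nonzero entries bounded by A_max in magnitude.
A_star = rng.uniform(-A_max, A_max, size=(r, n2))
A_star *= rng.random((r, n2)) < sparsity

# Rescale A* so that ||X*||_max <= X_max / 2 while keeping X* = D* A* exact.
scale = (X_max / 2.0) / np.abs(D_star @ A_star).max()
A_star *= min(1.0, scale)
X_star = D_star @ A_star
print("||A*||_0 =", np.count_nonzero(A_star), "  ||X*||_max =", np.abs(X_star).max())
```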
Observation Model

We observe $X^*$ only at a subset $S \subset \{1, 2, \ldots, n_1\} \times \{1, 2, \ldots, n_2\}$ of its locations. For some $\gamma \in (0, 1]$, each $(i,j)$ is in $S$ independently with probability $\gamma$; we interpret $\gamma = m (n_1 n_2)^{-1}$, so that $m$ is the nominal number of observations.

The observations $\{Y_{i,j}\}_{(i,j) \in S} \triangleq Y_S$ are conditionally independent given $S$, modeled via the joint density
$$p_{X^*_S}(Y_S) = \prod_{(i,j) \in S} p_{X^*_{i,j}}(Y_{i,j}),$$
a product of scalar densities.
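Continuing the sketch above (reusing n1, n2, rng, and X_star), here is one way to simulate this observation model: a Bernoulli(gamma) mask for S together with Gaussian scalar densities, the AWGN case treated later in the talk; m and sigma are illustrative values.

```python
# Continues the sparse-factor sketch above (uses n1, n2, rng, X_star).
m = 3000                               # nominal number of observations (illustrative)
gamma = m / (n1 * n2)                  # per-entry observation probability

S_mask = rng.random((n1, n2)) < gamma  # (i, j) in S independently with probability gamma

sigma = 0.1                            # illustrative noise level for Gaussian scalar densities
Y = np.where(S_mask, X_star + sigma * rng.standard_normal((n1, n2)), np.nan)
print("observed fraction:", S_mask.mean(), "  nominal gamma:", round(gamma, 4))
```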
Estimation Approach

We estimate $X^*$ via a sparsity-penalized maximum likelihood approach: for $\lambda > 0$, we take
$$\widehat{X} = \arg\min_{X = DA \in \mathcal{X}} \Big\{ -\log p_{X_S}(Y_S) + \lambda \cdot \|A\|_0 \Big\}.$$

The set $\mathcal{X}$ of candidate reconstructions is any subset of $\mathcal{X}'$, where
$$\mathcal{X}' \triangleq \{ X = DA \,:\, D \in \mathcal{D},\ A \in \mathcal{A},\ \|X\|_{\max} \le X_{\max} \},$$
where
• $\mathcal{D}$: the set of all matrices $D \in \mathbb{R}^{n_1 \times r}$ whose elements are discretized to one of $L$ uniformly spaced values in the range $[-1, 1]$,
• $\mathcal{A}$: the set of all matrices $A \in \mathbb{R}^{r \times n_2}$ whose elements either take the value zero or are discretized to one of $L$ uniformly spaced values in the range $[-A_{\max}, A_{\max}]$.
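Continuing the same sketch, here is a hedged illustration of the objective being minimized, assuming the Gaussian scalar densities used above, together with the entrywise discretization defining the sets $\mathcal{D}$ and $\mathcal{A}$; the candidate pair, the number of levels $L$, and $\lambda$ are arbitrary illustrative choices, and no search over $\mathcal{X}$ is attempted here.

```python
# Continues the sketches above (uses D_star, A_star, A_max, sigma, Y, S_mask).
def quantize(M, L, lo, hi):
    """Snap each entry of M to the nearest of L uniformly spaced values in [lo, hi]."""
    levels = np.linspace(lo, hi, L)
    return levels[np.abs(M[..., None] - levels).argmin(axis=-1)]

def penalized_objective(D, A, Y, S_mask, sigma, lam):
    """-log p_{X_S}(Y_S) + lam * ||A||_0, for Gaussian scalar densities with variance sigma^2."""
    resid = (Y - D @ A)[S_mask]
    neg_log_lik = 0.5 * np.sum(resid ** 2) / sigma ** 2 \
        + 0.5 * resid.size * np.log(2.0 * np.pi * sigma ** 2)
    return neg_log_lik + lam * np.count_nonzero(A)

# Score one candidate in X': quantized copies of the true factors (zeros in A stay zero).
L_levels = 1024
D_cand = quantize(D_star, L_levels, -1.0, 1.0)
A_cand = quantize(A_star, L_levels, -A_max, A_max) * (A_star != 0)
print(penalized_objective(D_cand, A_cand, Y, S_mask, sigma, lam=1.0))
```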
Section 3: Error Bounds
A General "Sparse Factor" Matrix Completion Error Guarantee

Theorem (A. Soni, S. Jain, J.H., and S. Gonella, 2014)
Let $\beta > 0$ and set $L = (n_1 \vee n_2)^\beta$. If $C_D$ satisfies $C_D \ge \max_{X \in \mathcal{X}} \max_{i,j} D(p_{X^*_{i,j}} \| p_{X_{i,j}})$, then for any
$$\lambda \ge 2 \cdot (\beta + 2) \cdot \left( 1 + \frac{2 C_D}{3} \right) \cdot \log(n_1 \vee n_2),$$
the sparsity-penalized ML estimate
$$\widehat{X} = \arg\min_{X = DA \in \mathcal{X}} \Big\{ -\log p_{X_S}(Y_S) + \lambda \cdot \|A\|_0 \Big\}$$
satisfies the (normalized, per-element) error bound
$$\frac{\mathbb{E}_{S, Y_S}\!\left[ -2 \log A(p_{\widehat{X}}, p_{X^*}) \right]}{n_1 n_2}
\le \frac{8 C_D \log m}{m}
+ 3 \min_{X = DA \in \mathcal{X}} \left\{ \frac{D(p_{X^*} \| p_X)}{n_1 n_2}
+ \left( \frac{n_1 r + \|A\|_0}{m} \right) \left( \lambda + \frac{4 C_D (\beta + 2) \log(n_1 \vee n_2)}{3} \right) \right\}.$$

Here:
• $A(p_X, p_{X^*}) \triangleq \prod_{i,j} A(p_{X_{i,j}}, p_{X^*_{i,j}})$, where $A(p_{X_{i,j}}, p_{X^*_{i,j}}) \triangleq \mathbb{E}_{p_{X^*_{i,j}}}\!\big[ \sqrt{p_{X_{i,j}} / p_{X^*_{i,j}}} \big]$ is the Hellinger affinity,
• $D(p_{X^*} \| p_X) \triangleq \sum_{i,j} D(p_{X^*_{i,j}} \| p_{X_{i,j}})$, where $D(p_{X^*_{i,j}} \| p_{X_{i,j}}) \triangleq \mathbb{E}_{p_{X^*_{i,j}}}\!\big[ \log(p_{X^*_{i,j}} / p_{X_{i,j}}) \big]$ is the KL divergence.

Next, we instantiate this result for some specific cases (using a specific choice of $\beta$, $\lambda$).
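For orientation, the two information measures take closed forms when the scalar densities are Gaussian with common variance $\sigma^2$ (the AWGN case on the next slide); this is a standard computation, not part of the slide itself.

```latex
% Gaussian scalar densities: p_{X_{i,j}} = N(X_{i,j}, \sigma^2) and p_{X^*_{i,j}} = N(X^*_{i,j}, \sigma^2).
\begin{align*}
D\big(p_{X^*_{i,j}} \,\|\, p_{X_{i,j}}\big) &= \frac{(X^*_{i,j} - X_{i,j})^2}{2\sigma^2}, &
A\big(p_{X_{i,j}}, p_{X^*_{i,j}}\big) &= \exp\!\left(-\frac{(X^*_{i,j} - X_{i,j})^2}{8\sigma^2}\right),
\end{align*}
so that
\begin{equation*}
-2 \log A\big(p_{\widehat{X}}, p_{X^*}\big)
  = \sum_{i,j} \frac{(X^*_{i,j} - \widehat{X}_{i,j})^2}{4\sigma^2}
  = \frac{\|X^* - \widehat{X}\|_F^2}{4\sigma^2},
\end{equation*}
```

which is how the Hellinger-affinity bound above turns into a Frobenius-norm guarantee in the Gaussian case.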
Additive White Gaussian Noise Model

Suppose each observation is corrupted by zero-mean AWGN with known variance $\sigma^2$, so that
$$p_{X^*_S}(Y_S) = \frac{1}{(2\pi\sigma^2)^{|S|/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{(i,j) \in S} (Y_{i,j} - X^*_{i,j})^2 \right).$$
Let $\mathcal{X} = \mathcal{X}'$, essentially (a discretization of) a set of rank- and max-norm-constrained matrices.

Gaussian Noise (Exact Sparse Factor Model)
If $A^*$ is exactly sparse with $\|A^*\|_0$ nonzero elements, the sparsity-penalized ML estimate satisfies
$$\mathbb{E}_{S, Y_S}\!\left[ \frac{\|X^* - \widehat{X}\|_F^2}{n_1 n_2} \right]
= O\!\left( (\sigma^2 + X_{\max}^2) \left( \frac{n_1 r + \|A^*\|_0}{m} \right) \log(n_1 \vee n_2) \right).$$
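As a quick sanity check on how the right-hand side scales, the sketch below evaluates the order term $(\sigma^2 + X_{\max}^2)\,(n_1 r + \|A^*\|_0)\,\log(n_1 \vee n_2)/m$ for illustrative problem sizes; since the constant hidden in $O(\cdot)$ is unspecified, only the trend in $m$ is meaningful.

```python
import numpy as np

# Illustrative problem sizes (not from the talk).
n1, n2, r = 1000, 1000, 10
sigma, X_max = 0.1, 1.0
A0 = 50 * n2                      # suppose A* has about 50 nonzeros per column

for m in [10_000, 100_000, 1_000_000]:
    order_term = (sigma**2 + X_max**2) * (n1 * r + A0) / m * np.log(max(n1, n2))
    print(f"m = {m:>9,d}   order term ~ {order_term:.3g}")
```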