STRUCTURED LOW-RANK MATRIX FACTORIZATION: GLOBAL OPTIMALITY, ALGORITHMS, AND APPLICATIONS
Article by Benjamin D. Haeffele and René Vidal (2017)
CMAP Machine Learning Journal Club
Speaker: Imke Mayer, December 13th 2018
OUTLINE
Structured Matrix Factorization
I. Context and definition
   i. Special case 1: Sparse dictionary learning (SDL)
   ii. Special case 2: Subspace clustering (SC)
II. Global optimality for structured matrix factorization
   i. Main theorem
   ii. Polar problem
III. Application: SDL global optimality
IV. Extension to tensor factorization and deep learning
STRUCTURED MATRIX FACTORIZATION: CONTEXT
Large, high-dimensional datasets (images, videos, user ratings, etc.)
- difficult to process (computational issues, memory complexity)
- but the relevant information often lies in a low-dimensional structure
Goal: recover this underlying low-dimensional structure of the given (large-scale) data X.
[Figures: motion segmentation, face clustering]
[12] Vidal, R., Ma, Y., and Sastry, S. S. Generalized Principal Component Analysis, vol. 5. Springer, 2016.
Model assumption: linear subspace model. The data can be approximated by one or more low-dimensional subspace(s):

X \approx U V^T

where U is a basis of the linear low-dimensional structure and V is the low-dimensional data representation.
- Issue: without any assumptions there are infinitely many choices of U and V such that X \approx U V^T.
- Solution: constrain the factors to satisfy certain properties:

\min_{U,V} \ell(X, U V^T) + \lambda \Theta(U, V)    (1)

- Non-convex
- Structured factors → more modeling flexibility
- Explicit representation
The loss \ell measures the approximation quality; the regularization \Theta imposes restrictions on the factors.
STRUCTURED MATRIX FACTORIZATION: SPECIAL CASE 1, SPARSE DICTIONARY LEARNING
Given a set of signals, find a set of dictionary atoms and sparse codes to approximate the signals [9].
- denoising, inpainting
- classification
[Figure: a noisy image denoised via sparse linear combinations of dictionary atoms]

\min_{U,V} \|X - U V^T\|_F^2 + \lambda \|V\|_1 \quad \text{subject to} \quad \|U_i\|_2 \le 1    (3)

with X the signals, U the dictionary, and V the sparse codes.
[9] Olshausen, B. A., and Field, D. J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37, 23 (1997), 3311–3325.
Challenges:
- optimization strategies without global convergence guarantees
- which size for U and V? The number of columns r must be picked a priori.

Letting r vary and regularizing each column of V with an elastic-net penalty addresses the second issue:

\min_{U,V,r} \|X - U V^T\|_F^2 + \lambda \sum_{i=1}^{r} \big( \gamma \|V_i\|_1 + (1-\gamma) \|V_i\|_2 \big) \quad \text{subject to} \quad \|U_i\|_2 \le 1    (4)
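Problem (3) is typically attacked by alternating between the codes V (a lasso step via soft-thresholding) and the dictionary U (a projected gradient step onto the unit-norm column constraint). The following is a minimal numpy sketch of such an alternating proximal-gradient scheme; the function name `sparse_dictionary_learning` and the step-size choices are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def soft_threshold(Z, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def sparse_dictionary_learning(X, r, lam=0.1, n_iter=300, seed=0):
    """Alternating proximal-gradient sketch for
        min_{U,V} ||X - U V^T||_F^2 + lam ||V||_1  s.t.  ||U_i||_2 <= 1.
    Illustrative only; step sizes use the Lipschitz constant of each subproblem."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.standard_normal((d, r))
    U /= np.maximum(np.linalg.norm(U, axis=0), 1.0)  # project columns onto unit ball
    V = np.zeros((n, r))
    for _ in range(n_iter):
        # V-step: one ISTA step on the lasso subproblem in V
        Lv = max(np.linalg.norm(U, 2) ** 2, 1e-12)
        V = soft_threshold(V - (V @ U.T - X.T) @ U / Lv, lam / (2 * Lv))
        # U-step: gradient step on the fit term, then project columns
        Lu = max(np.linalg.norm(V, 2) ** 2, 1e-12)
        U -= (U @ V.T - X) @ V / Lu
        U /= np.maximum(np.linalg.norm(U, axis=0), 1.0)
    return U, V
```

Each alternation decreases the objective, but as the slide notes, such schemes come with no global convergence guarantee in general, which is exactly what the theory below addresses.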
STRUCTURED MATRIX FACTORIZATION: SPECIAL CASE 2, SUBSPACE CLUSTERING
Given data X coming from a union of subspaces, find these underlying subspaces and separate the data according to them.
- clustering
- recovery of low-dimensional structures
- Subspaces S_1, ..., S_n are characterized by bases → U (recover the number and dimensions of the subspaces).
- Segmentation is obtained by finding a subspace-preserving representation → V (recover the segmentation of the data).
Challenges:
- Model selection: how many subspaces? What is the dimension of each subspace?
- Potentially difficult subspace configurations
One way to do subspace clustering: Sparse Subspace Clustering (SSC) [4]
- Self-expressive dictionary: fix the dictionary as U ← X.
- Find a sparse representation over U that allows segmenting the data.
But the optimality of the dictionary is not addressed.
Idea: sparse dictionary learning on a union-of-subspaces model is suited to recovering a more compact factorization with subspace-sparse codes [1].
[1] Adler, A., Elad, M., and Hel-Or, Y. Linear-time subspace clustering via bipartite graph modeling. IEEE Transactions on Neural Networks and Learning Systems 26, 10 (2015), 2234–2246.
[4] Elhamifar, E., and Vidal, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 11 (2013), 2765–2781.
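Concretely, SSC fixes U ← X and solves a sparse self-expression problem, min_C 0.5||X − XC||_F^2 + λ||C||_1 with diag(C) = 0; the symmetrized affinity |C| + |C|^T is then fed to spectral clustering. A hedged numpy sketch using plain ISTA as the solver (the solver choice and the name `ssc_coefficients` are illustrative assumptions, not the implementation of [4]):

```python
import numpy as np

def soft_threshold(Z, t):
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def ssc_coefficients(X, lam=0.1, n_iter=500):
    """Sparse self-expressive coding:
        min_C 0.5 ||X - X C||_F^2 + lam ||C||_1   s.t.  diag(C) = 0,
    solved by ISTA (proximal gradient). A sketch, not a reference solver."""
    d, n = X.shape
    C = np.zeros((n, n))
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ C - X)
        C = soft_threshold(C - grad / L, lam / L)
        np.fill_diagonal(C, 0.0)  # forbid the trivial self-representation
    return C
```

If the representation is subspace-preserving, points are (mostly) expressed using other points from the same subspace, so the affinity |C| + |C|^T is close to block diagonal under the true segmentation.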
STRUCTURED MATRIX FACTORIZATION: THEORY FOR GLOBAL OPTIMALITY

Matrix factorization:
\min_{U,V} \ell(X, U V^T) + \lambda \Theta(U, V)    (1)
- Non-convex
- Small problem size
- Structured factors → more modeling flexibility
- Explicit representation

Matrix approximation:
\min_{Y} \ell(X, Y) + \lambda \Omega_\Theta(Y)    (2)
- Convex
- Large problem size
- Unstructured

Prototypical pair:
\min_{U,V} \ell(X, U V^T) \text{ subject to } U, V \text{ having } r \text{ columns}   (low-rank matrix factorization)
\min_{Y} \ell(X, Y) + \lambda \|Y\|_*   (low-rank matrix approximation)
Ideas:
- Find a convex relaxation \Omega_\Theta for a general regularization function \Theta in order to couple the two problems (1) and (2).
- Allow the number of columns of U and V to change in (1).
Results:
- Problem (2) gives a global lower bound to problem (1).
- This convex lower bound allows one to analyze global optimality for problem (1).
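The coupling between (1) and (2) is the same mechanism as in the classical variational form of the nuclear norm, ||Y||_* = min over factorizations UV^T = Y of (||U||_F^2 + ||V||_F^2)/2, where the minimum is attained at the balanced SVD factorization. A quick numpy check of this identity (the variable names here are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((5, 4))

# nuclear norm computed directly: sum of singular values
nuc = np.linalg.svd(Y, compute_uv=False).sum()

# variational form: ||Y||_* = min_{U V^T = Y} 0.5 (||U||_F^2 + ||V||_F^2),
# attained at the balanced factorization U = U_s sqrt(S), V = V_s sqrt(S)
U_s, S, Vt = np.linalg.svd(Y, full_matrices=False)
U = U_s * np.sqrt(S)
V = Vt.T * np.sqrt(S)
var = 0.5 * (np.linalg.norm(U, 'fro') ** 2 + np.linalg.norm(V, 'fro') ** 2)
```

This is the special case \theta(u, v) = 0.5(||u||_2^2 + ||v||_2^2) (equivalently ||u||_2 ||v||_2 after balancing) inducing \Omega_\Theta = ||.||_*; the paper generalizes this coupling to other positively homogeneous \theta.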
GLOBAL OPTIMALITY OF STRUCTURED MATRIX FACTORIZATION AT A LOCAL MINIMUM

\min_{U,V} f(U, V) = \ell(X, U V^T) + \lambda \Theta(U, V)

Assumptions:
- The factorization size r is allowed to change.
- The loss \ell(X, Y) is convex and once differentiable w.r.t. Y.
- \Theta is a sum of positively homogeneous functions of degree 2:

\Theta(U, V) = \sum_{i=1}^{r} \theta(U_i, V_i), \qquad \theta(\alpha u, \alpha v) = \alpha^2 \theta(u, v) \text{ for all } \alpha \ge 0

THEOREM [6]: A local minimum (\tilde{U}, \tilde{V}) of f(U, V) is globally optimal if (\tilde{U}_i, \tilde{V}_i) = (0, 0) for some i \in [r].
In other words, all local minima of f(U, V) of sufficient size are global minima.

[6] Haeffele, B. D., and Vidal, R. Structured low-rank matrix factorization: Global optimality, algorithms, and applications. arXiv preprint arXiv:1708.07850 (2017).
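For instance, \theta(u, v) = ||u||_2 ||v||_1, the product of two degree-1 norms (which appears in sparse dictionary learning), is positively homogeneous of degree 2. The property is easy to sanity-check numerically; the specific \theta below is just one illustrative example of the assumption, not the only admissible choice:

```python
import numpy as np

# theta(u, v) = ||u||_2 * ||v||_1: each factor is homogeneous of degree 1,
# so the product is positively homogeneous of degree 2 in (u, v)
def theta(u, v):
    return np.linalg.norm(u) * np.linalg.norm(v, 1)

rng = np.random.default_rng(2)
u, v = rng.standard_normal(6), rng.standard_normal(8)
for alpha in [0.0, 0.5, 2.0, 7.3]:
    # theta(alpha u, alpha v) == alpha^2 * theta(u, v) for all alpha >= 0
    assert np.isclose(theta(alpha * u, alpha * v), alpha ** 2 * theta(u, v))
```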
GLOBAL OPTIMALITY OF STRUCTURED MATRIX FACTORIZATION AT ANY POINT

Under the same assumptions (variable factorization size r, convex differentiable loss, \Theta a sum of positively homogeneous functions of degree 2):

COROLLARY [6]: A point (\tilde{U}, \tilde{V}) is a global optimum of f(U, V) if it satisfies the following conditions:

1) \tilde{U}_i^T \big( -\tfrac{1}{\lambda} \nabla_Y \ell(X, \tilde{U} \tilde{V}^T) \big) \tilde{V}_i = \theta(\tilde{U}_i, \tilde{V}_i) for all i \in [r]
   (for many choices of \theta, condition 1 is satisfied by first-order optimal points)

2) u^T \big( -\tfrac{1}{\lambda} \nabla_Y \ell(X, \tilde{U} \tilde{V}^T) \big) v \le \theta(u, v) for all (u, v)
In particular, given a point (\tilde{U}, \tilde{V}), we can certify that it is a global optimum (a local minimum of sufficient size) by testing the polar condition

u^T \big( -\tfrac{1}{\lambda} \nabla_Y \ell(X, \tilde{U} \tilde{V}^T) \big) v \le \theta(u, v) \text{ for all } (u, v)    (5)
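For \theta(u, v) = ||u||_2 ||v||_2, the induced \Omega_\Theta is the nuclear norm and condition (5) reduces to \sigma_max(-(1/\lambda) \nabla_Y \ell) \le 1, so the polar test is a single SVD. A numpy sketch for the loss \ell(X, Y) = 0.5 ||X - Y||_F^2, where the candidate factorization is built from singular value thresholding so that the optimum is known in closed form (these modeling choices are assumptions made for the demonstration, not part of the general theory):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 5))
lam = 1.0

# solve the convex surrogate min_Y 0.5||X - Y||_F^2 + lam ||Y||_*
# by singular value thresholding, then form a balanced factorization Y = U V^T
Us, S, Vt = np.linalg.svd(X, full_matrices=False)
S_thr = np.maximum(S - lam, 0.0)
U = Us * np.sqrt(S_thr)
V = Vt.T * np.sqrt(S_thr)

# polar test: for theta(u, v) = ||u||_2 ||v||_2, condition (5) is
# sigma_max( -(1/lam) * grad_Y l ) <= 1, with grad_Y l = U V^T - X
Z = -(U @ V.T - X) / lam
sigma_max = np.linalg.svd(Z, compute_uv=False)[0]
```

Here sigma_max comes out at most 1, so condition (5) holds; condition 1 of the corollary can be checked column by column in the same way.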