Fitting Convex Sets to Data via Matrix Factorization
Yong Sheng Soh
California Institute of Technology
Joint work with Venkat Chandrasekaran
LCCC Focus Period – May/June 2017
Variational Approach to Inference
Given data, fit a model (θ) by solving
  arg min_θ  Loss(θ; data) + λ · Regularizer(θ)
◮ Loss: ensures fidelity to observed data
◮ Based on a model of the noise that has corrupted the observations
◮ Regularizer: useful to induce desired structure in the solution
◮ Based on prior knowledge, domain expertise
Example Denoise an image corrupted by noise ◮ Loss: Euclidean-norm ◮ Regularizer: L1-norm of wavelet coefficients ◮ Natural images are typically sparse in wavelet basis Photo: [Rudin, Osher, Fatemi]
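A minimal sketch of this wavelet-sparsity example, assuming the PyWavelets package is available; the placeholder image, wavelet choice, and threshold are illustrative and not the settings used in the talk. Soft-thresholding the detail coefficients is the proximal step associated with an L1 penalty on wavelet coefficients.

```python
# Sketch: denoising via soft-thresholding of wavelet coefficients.
# For an (approximately) orthonormal wavelet transform W, this approximates
#   argmin_x 0.5*||x - y||^2 + lam*||W x||_1.
# Assumes PyWavelets; image, wavelet, and lam are illustrative placeholders.
import numpy as np
import pywt

def wavelet_denoise(noisy, wavelet="db4", level=3, lam=0.1):
    coeffs = pywt.wavedec2(noisy, wavelet, level=level)
    # Keep the coarse approximation, soft-threshold the detail coefficients.
    thresholded = [coeffs[0]]
    for detail in coeffs[1:]:
        thresholded.append(tuple(pywt.threshold(d, lam, mode="soft") for d in detail))
    return pywt.waverec2(thresholded, wavelet)

noisy = np.random.randn(64, 64)  # placeholder for a noisy image patch
clean_estimate = wavelet_denoise(noisy)
```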
Example
Complete a partially filled survey

            Life is     Goldfinger   Big        Shawshank    Godfather
            Beautiful                Lebowski   Redemption
  Alice     5           4            ?          ?            ?
  Bob       ?           4            1          4            ?
  Charlie   ?           4            4          ?            5
  Donna     4           ?            ?          5            ?

◮ Loss: Euclidean / Logistic
◮ Regularizer: Nuclear-norm of the user-preference matrix
◮ User-preference matrices are often well-approximated as low-rank
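A small sketch of this survey-completion example, assuming the cvxpy modeling package; the rating values match the table above, while the squared loss on observed entries and the weight λ are illustrative choices, not those of the talk.

```python
# Sketch: fill in missing ratings by trading off fidelity to observed entries
# against the nuclear norm of the user-preference matrix.
# Assumes cvxpy; the observed entries and lam are illustrative.
import numpy as np
import cvxpy as cp

ratings = np.array([[5, 4, 0, 0, 0],
                    [0, 4, 1, 4, 0],
                    [0, 4, 4, 0, 5],
                    [4, 0, 0, 5, 0]], dtype=float)
observed = (ratings > 0).astype(float)  # mask of known entries

X = cp.Variable(ratings.shape)
lam = 1.0
loss = cp.sum_squares(cp.multiply(observed, X - ratings))
problem = cp.Problem(cp.Minimize(loss + lam * cp.normNuc(X)))
problem.solve()
print(np.round(X.value, 1))  # completed preference matrix
```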
This Talk ◮ Question: What if we do not have the domain expertise to design or select an appropriate regularizer for our task? ◮ E.g. domains with high-dimensional data comprising different data types ◮ Approach: Learn a suitable regularizer from example data ◮ E.g. Learn a suitable regularizer for denoising images using examples of clean images ◮ Geometric picture: Fit a convex set (with suitable facial structure ) to a set of points
This Talk – Pipeline
◮ Learn: Have access to (relatively) clean example data. Use these examples to learn a suitable regularizer.
◮ Apply: Faced with a subsequent task that involves noisy or incomplete data. Apply the learned regularizer.
Outline A paradigm for designing regularizers LP-representable regularizers SDP-representable regularizers Summary and future work
Designing Regularizers
◮ Conceptual question: Given a dataset, how do we identify a regularizer that is effective at enforcing structure that is present in the data?
◮ First step: What properties of a regularizer make it effective?
Facial Geometry
Key: Facial geometry of the level sets of the regularizer.
◮ Optimal solutions corresponding to generic data often lie on low-dimensional faces
◮ In many applications the low-dimensional faces are the structured models we wish to recover, e.g. images are sparse in the wavelet domain
Approach: Design a regularizer s.t. the data lies on low-dimensional faces of the level sets. We do so by using concise representations.
From Concise Representations to Regularizer
Concise representations: We say that a datapoint (a vector) y ∈ R^d is concisely represented by a set {a_i}_{i∈I} ⊂ R^d (called atoms) if
  y = Σ_{i∈S} c_i a_i,  c_i ≥ 0,  S ⊂ I,
for |S| small.
Regularizer:
  ‖x‖ = inf { t : x ∈ t · conv({a_i}), t > 0 }.
Smallest “blow-up” of conv({a_i}) that includes x
[Maurey, Pisier, Jones, ...]
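For a finite atomic set, this infimum can be evaluated directly as a linear program over nonnegative weights c_i. The sketch below (assuming cvxpy; the atoms and test point are illustrative) recovers the L1-norm when the atoms are the signed standard basis vectors.

```python
# Sketch: evaluating the regularizer induced by a finite atomic set {a_i}.
# ||x|| = inf{ sum_i c_i : x = sum_i c_i a_i, c_i >= 0 } is the gauge of conv({a_i}),
# which for finitely many atoms is a linear program.  Atoms and x are illustrative.
import numpy as np
import cvxpy as cp

def atomic_norm(x, atoms):
    """atoms: d x q matrix whose columns are the atoms a_i."""
    c = cp.Variable(atoms.shape[1], nonneg=True)
    problem = cp.Problem(cp.Minimize(cp.sum(c)), [atoms @ c == x])
    problem.solve()
    return problem.value

atoms = np.hstack([np.eye(3), -np.eye(3)])  # atoms {±e_i}: induces the L1 norm
print(atomic_norm(np.array([1.0, -2.0, 0.5]), atoms))  # ≈ 3.5
```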
Sparse Representations
◮ Concisely represented data: Sparse vectors
◮ Linear sum of a few standard basis vectors
◮ Regularizer: L1-norm
◮ Norm-ball is the convex hull of the standard basis vectors
[Donoho, Johnstone, Tibshirani, Chen, Saunders, Candès, Romberg, Tao, Tanner, Meinshausen, Bühlmann]
Low-Rank Representations
◮ Concisely represented data: Low-rank matrices
◮ Linear sum of a few rank-one unit-norm matrices
◮ Regularizer: Nuclear-norm (sum of singular values)
◮ Norm-ball is the convex hull of rank-one unit-norm matrices
[Fazel, Boyd, Recht, Parrilo, Candès, Gross, ...]
From Concise Representations to Regularizer ◮ From the view-point of optimization, this is the “correct” convex regularizer to employ ◮ Low-dimensional faces of conv ( { ❛ i } ) are concisely represented with { ❛ i } [Chandrasekaran, Recht, Parrilo, Willsky]
Designing Regularizers
◮ Conceptual question: Given a dataset, how do we identify a regularizer that is effective at enforcing structure present in the data?
◮ Prior work: If data can be concisely represented w.r.t. a set {a_i} ⊂ R^d then an effective regularizer is available
◮ It is the norm induced by conv({a_i}).
◮ Approach: Given a dataset, identify a set {a_i} ⊂ R^d s.t. the data permits concise representations.
Polyhedral Regularizers
Approach: Given a dataset, how do we identify a set {±a_i} ⊂ R^d such that the data permits concise representations?
Assume: |{a_i}| is finite.
Precise mathematical formulation:
Given data {y^(j)}_{j=1}^n ⊂ R^d, find {a_i}_{i=1}^q ⊂ R^d so that
  y^(j) ≈ Σ_i x_i^(j) a_i,  where the x_i^(j) are mostly zero
        = A x^(j),  where A = [a_1 | ... | a_q] and x^(j) is sparse, for each j.
Polyhedral Regularizers
Given data {y^(j)}_{j=1}^n ⊂ R^d, find A : R^q → R^d so that
  y^(j) ≈ A x^(j),  where x^(j) is sparse, ∀ j.
Regularizer: The natural choice of regularizer is the norm induced by conv({±a_i}), or equivalently A(L1-norm ball), where A = [a_1 | ... | a_q].
The regularizer can be expressed as a linear program (LP).
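As a rough illustration of applying such a regularizer, here is a sketch of proximal denoising with a polyhedral regularizer, assuming cvxpy; substituting y = A x turns the proximal problem into a lasso in the coefficients x. The dictionary, data, and λ below are random placeholders.

```python
# Sketch of proximal denoising with a learned polyhedral regularizer ||.||_A,
# where ||y||_A = min{ ||x||_1 : y = A x }.  Substituting y = A x turns
#   argmin_y 0.5*||y - data||^2 + lam*||y||_A
# into a lasso problem in x.  A, data, and lam are illustrative.
import numpy as np
import cvxpy as cp

d, q = 20, 50
A = np.random.randn(d, q)          # stand-in for a learned dictionary
data = np.random.randn(d)          # noisy observation
lam = 0.5

x = cp.Variable(q)
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - data) + lam * cp.norm1(x)))
problem.solve()
denoised = A @ x.value
```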
Polyhedral Regularizers – Dictionary Learning
Given data {y^(j)}_{j=1}^n ⊂ R^d, find A : R^q → R^d so that
  y^(j) ≈ A x^(j),  where x^(j) is sparse, ∀ j.
Studied elsewhere as:
◮ ‘Dictionary Learning’ or ‘Sparse Coding’
◮ Olshausen, Field (’96); Aharon, Elad, Bruckstein (’06); Spielman, Wang, Wright (’12); Arora, Ge, Moitra (’13); Agarwal, Anandkumar, Netrapalli, Jain (’13); Barak, Kelner, Steurer (’14); ...
◮ Developed as a procedure for automatically discovering sparse representations with finite dictionaries
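For concreteness, a generic alternating-minimization sketch of dictionary learning (not any one of the cited algorithms): sparse-code the data with A fixed, then update A by least squares. It assumes cvxpy and uses random placeholder data.

```python
# Illustrative alternating-minimization heuristic for dictionary learning.
import numpy as np
import cvxpy as cp

def learn_dictionary(Y, q, lam=0.1, iters=10):
    d, n = Y.shape
    A = np.random.randn(d, q)
    A /= np.linalg.norm(A, axis=0)             # unit-norm atoms
    for _ in range(iters):
        # Sparse coding: one lasso problem per data point (column of Y).
        X = np.zeros((q, n))
        for j in range(n):
            x = cp.Variable(q)
            cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - Y[:, j])
                                   + lam * cp.norm1(x))).solve()
            X[:, j] = x.value
        # Dictionary update: least squares, then renormalize the atoms.
        A = Y @ np.linalg.pinv(X)
        A /= np.linalg.norm(A, axis=0) + 1e-12
    return A

Y = np.random.randn(20, 100)                   # placeholder for example data
A_learned = learn_dictionary(Y, q=40)
```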
Learning an Infinite Set of Atoms?
So far:
◮ Learning a regularizer corresponds to computing a matrix factorization
◮ Finite set of atoms = dictionary learning
Question: Can we learn an infinite set of atoms?
◮ Richer family of concise representations
◮ Require
◮ Compact description of the atoms
◮ Computationally tractable description of the convex hull
Remainder of the talk:
◮ Specify the infinite atomic set as an algebraic variety whose convex hull is computable via semidefinite programming
From dictionary learning to our work

Dictionary learning
◮ Atoms: {±A e^(i) | e^(i) ∈ R^p is a standard basis vector}, with A : R^p → R^d
◮ Compute regularizer by: finding A s.t. y^(j) ≈ A x^(j) for sparse x^(j)
◮ Level set: A(L1-norm ball)
◮ Regularizer expressed via: Linear Programming (LP)

Our work
◮ Atoms: {A(U) | U ∈ R^{q×q}, U unit-norm rank-one}, with A : R^{q×q} → R^d
◮ Compute regularizer by: finding A s.t. y^(j) ≈ A(X^(j)) for low-rank X^(j)
◮ Level set: A(nuclear-norm ball)
◮ Regularizer expressed via: Semidefinite Programming (SDP)
Empirical results – Set-up ◮ Learn: Learn a collection of regularizers of varying complexities from 6500 example image patches. ◮ Apply: Denoise 720 new data points corrupted by additive Gaussian noise.
Empirical results – Comparison
Denoise 720 new data points corrupted by additive Gaussian noise. Apply proximal denoising (squared-loss + regularizer).
[Figure: Normalized MSE vs. computational cost of the proximal operator, comparing the polyhedral regularizer (i.e. dictionary learning) with the semidefinite-representable regularizer.]
Cost is derived by computing the proximal operator via an interior point scheme.
Semidefinite-Representable Regularizers
Goal: Solve the following matrix factorization problem:
Given data {y^(j)}_{j=1}^n ⊂ R^d and a target dimension q, find A : R^{q×q} → R^d so that
  y^(j) ≈ A(X^(j))  for low-rank X^(j) ∈ R^{q×q}, for each j.
Obstruction: This is a matrix factorization problem. The factors A and {X^(j)}_{j=1}^n are both unknown, and hence the factorization is not unique.
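One possible alternating heuristic for this factorization, shown only as an illustrative sketch and not as the talk's actual algorithm: with A fixed, fit each X^(j) under a nuclear-norm penalty; with the X^(j) fixed, update A by least squares. It assumes cvxpy, and the data, target dimension q, and λ are placeholders.

```python
# Illustrative alternating heuristic for the SDP-representable factorization.
# A is stored via its component matrices A_1, ..., A_d, so A(X)_k = <A_k, X>.
import numpy as np
import cvxpy as cp

def learn_sdp_regularizer(Y, q, lam=0.1, iters=5):
    d, n = Y.shape
    comps = np.random.randn(d, q, q)                  # component matrices A_k
    for _ in range(iters):
        # Low-rank coding step: one nuclear-norm problem per data point.
        V = np.zeros((q * q, n))
        for j in range(n):
            X = cp.Variable((q, q))
            AX = cp.hstack([cp.sum(cp.multiply(comps[k], X)) for k in range(d)])
            obj = 0.5 * cp.sum_squares(AX - Y[:, j]) + lam * cp.normNuc(X)
            cp.Problem(cp.Minimize(obj)).solve()
            V[:, j] = X.value.flatten()
        # Map update: least squares in the flattened components.
        comps = (Y @ np.linalg.pinv(V)).reshape(d, q, q)
    return comps

Y = np.random.randn(15, 50)                           # placeholder for example data
A_learned = learn_sdp_regularizer(Y, q=4)
```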
Identifiability Issues
◮ Given a factorization of {y^(j)}_{j=1}^n ⊂ R^d as y^(j) = A(X^(j)) for low-rank X^(j), there are many equivalent factorizations
◮ Let M : R^{q×q} → R^{q×q} be an invertible linear operator that preserves the rank of matrices
◮ Transpose operator M(X) = X′
◮ Conjugation by invertible matrices M(X) = P X Q′
Then
  y^(j) = (A ∘ M^{-1}) ( M(X^(j)) )
          [linear map]   [low-rank matrix]
specifies an equally valid factorization!
◮ {A ∘ M^{-1}} specifies a family of regularizers – we require a canonical choice of factorization to uniquely specify a regularizer
Identifiability Issues
Theorem (Marcus and Moyls (’59)): An invertible linear operator M : R^{q×q} → R^{q×q} preserves the rank of matrices ⇔ M is a composition of
◮ the transpose operator M(X) = X′
◮ conjugation by invertible matrices M(X) = P X Q′
In our context, the regularizer is induced by (A ∘ M^{-1})(nuclear-norm ball)
◮ M is the transpose operator: leaves the nuclear norm invariant
◮ M is conjugation by invertible matrices: apply the polar decomposition to split each factor into orthogonal × positive definite
◮ Orthogonal matrices also leave the nuclear norm invariant
◮ Ambiguity comes down to conjugation by positive definite matrices
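A quick numerical check of the invariances used in this argument: transposition and orthogonal conjugation leave the nuclear norm unchanged, while conjugation by a general invertible matrix does not, which is why the residual ambiguity is conjugation by positive definite matrices. The matrices below are random and purely illustrative.

```python
# Numerical check of the invariances behind the identifiability discussion.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
nuc = lambda M: np.linalg.norm(M, "nuc")

# Transpose: invariant.
print(np.isclose(nuc(X), nuc(X.T)))                     # True

# Orthogonal conjugation: invariant.
P, _ = np.linalg.qr(rng.standard_normal((5, 5)))
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
print(np.isclose(nuc(X), nuc(P @ X @ Q.T)))             # True

# General invertible conjugation: not invariant in general.
Pinv = rng.standard_normal((5, 5)) + 5 * np.eye(5)      # well-conditioned invertible
print(np.isclose(nuc(X), nuc(Pinv @ X @ Pinv.T)))       # typically False
```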
Identifiability Issues
Definition: A linear map A : R^{q×q} → R^d is normalized if
  Σ_{k=1}^d A_k A_k′ = Σ_{k=1}^d A_k′ A_k = I,
where A_k ∈ R^{q×q} is the k-th component linear functional of A.
One should think of A as
  A(X) = ( ⟨A_1, X⟩, ..., ⟨A_d, X⟩ )′.
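A small sketch of checking this normalization condition numerically, with randomly generated component matrices standing in for a learned map.

```python
# Sketch: check whether a linear map A, given by component matrices A_1, ..., A_d,
# satisfies sum_k A_k A_k' = sum_k A_k' A_k = I.  Random components are illustrative.
import numpy as np

def is_normalized(components, tol=1e-8):
    q = components[0].shape[0]
    left = sum(Ak @ Ak.T for Ak in components)
    right = sum(Ak.T @ Ak for Ak in components)
    eye = np.eye(q)
    return np.allclose(left, eye, atol=tol) and np.allclose(right, eye, atol=tol)

q, d = 4, 3
components = [np.random.randn(q, q) for _ in range(d)]
print(is_normalized(components))   # generically False for random components
```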