Learning sparsely used overcomplete dictionaries

Alekh Agarwal (Microsoft Research)
Joint work with Anima Anandkumar, Prateek Jain, Praneeth Netrapalli, and Rashish Tandon
Motivation I: Feature learning

[Figure: documents mapped to a numeric feature matrix via feature engineering]

- Feature engineering takes considerable time and skill
- Typically critical to good performance
- Can we learn good features from data?
Motivation II: Signal compression

- Expensive to store high-dimensional signals
- Sparse signals have compact representations
- Can we learn a representation in which signals of interest are sparse?
Dictionary learning in practice

- Image compression (Bruckstein et al., 2009)
- Similar successes in image denoising, inpainting, super-resolution, ...
- Non-convex optimization; limited theoretical understanding
Dictionary learning setup

Goal: Find a dictionary with r elements such that each data point is a combination of only s dictionary elements.

    Y = A* X*,  where Y ∈ R^{d×n} (examples), A* ∈ R^{d×r} (dictionary), X* ∈ R^{r×n} (coefficients)

- Encode faces using dictionary elements rather than pixel values
- Sparsity for compression, signal processing, ...
- Applications: topic models, overlapping clustering, image representation
- Overcomplete setting, r ≫ d, is the relevant regime in practice
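The generative model Y = A* X* above can be sketched with synthetic data. The sizes d, r, n, s below are illustrative choices, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n, s = 50, 100, 2000, 3   # illustrative sizes; r >> d is the overcomplete regime

# Dictionary A* with unit-norm columns
A_star = rng.standard_normal((d, r))
A_star /= np.linalg.norm(A_star, axis=0)

# Coefficients X*: each column is s-sparse (each sample uses only s elements)
X_star = np.zeros((r, n))
for j in range(n):
    support = rng.choice(r, size=s, replace=False)
    X_star[support, j] = rng.standard_normal(s)

Y = A_star @ X_star   # observed examples, one per column
```

Each column of Y is then a combination of exactly s of the r dictionary elements, matching the goal above.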
Alternating minimization

Objective:

    min_{A,X} ||X||_1  subject to  Y = AX,   where ||X||_1 = Σ_{i,j} |X_{ij}|

Dominant approach in practice:
- Start with an initial dictionary A^(0)
- Sparse regression for coefficients given the dictionary:
      X_i^(t+1) = argmin_{x ∈ R^r} ||x||_1  s.t.  ||Y_i − A^(t) x||_2 ≤ ε_t
- Least squares for the dictionary given the coefficients:
      A^(t+1) = Y (X^(t+1))^+,  i.e. Y ≈ A^(t+1) X^(t+1)
- Similar to EM for this problem
- Does not converge to the global optimum from an arbitrary A^(0)
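A minimal numpy sketch of the two alternating steps, assuming the sparsity level s is known. Greedy orthogonal matching pursuit stands in here for the l1-constrained sparse regression step, and the dictionary update is the pseudoinverse step A^(t+1) = Y (X^(t+1))^+ from the slide:

```python
import numpy as np

def omp(A, y, s):
    """Greedy sparse regression (orthogonal matching pursuit), a stand-in
    for the l1-constrained sparse coding step described above."""
    resid, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(s):
        support.append(int(np.argmax(np.abs(A.T @ resid))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        resid = y - A[:, support] @ coef
    x[support] = coef
    return x

def alternating_minimization(Y, A0, s, iters=10):
    A = A0.copy()
    for _ in range(iters):
        # Sparse regression step: code each sample against the current dictionary
        X = np.column_stack([omp(A, Y[:, j], s) for j in range(Y.shape[1])])
        # Least-squares dictionary update: A <- Y X^+ (Moore-Penrose pseudoinverse)
        A = Y @ np.linalg.pinv(X)
        # Renormalize dictionary columns, rescaling X so that Y ~ A X is preserved
        norms = np.maximum(np.linalg.norm(A, axis=0), 1e-12)
        A /= norms
        X *= norms[:, None]
    return A, X
```

Started from a good initial dictionary, a few iterations typically drive the reconstruction error ||Y − AX|| close to zero on noiseless synthetic data; from an arbitrary A0, as the slide notes, there is no such guarantee.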
Alternating minimization goal

    (Â, X̂) = argmin_{A,X} ||X||_1  subject to  Y = AX

- Y = AX is a non-convex constraint
- The average of two solutions is not a solution:
      Y = AX and Y = (−A)(−X), but Y ≠ ((A + (−A))/2) · ((X + (−X))/2) = 0
- Non-convex optimization, NP-hard in general
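The sign-flip argument above can be checked numerically on a toy instance (the matrices here are illustrative):

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0]])
X = np.array([[2.0], [-3.0]])
Y = A @ X

# Both (A, X) and (-A, -X) satisfy the constraint Y = AX ...
assert np.allclose(Y, (-A) @ (-X))

# ... but their average does not: the feasible set is non-convex
A_avg, X_avg = (A + (-A)) / 2, (X + (-X)) / 2
print(np.allclose(Y, A_avg @ X_avg))  # False: the average is the zero matrix
```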
Previous theory work

- Exact recovery in the undercomplete setting by Spielman et al. via linear programming
- This work: alternating minimization combined with a novel initialization
- Reaches the global optimum despite non-convexity, in the overcomplete setting
Initialization: Key ideas

[Figure: 2D scatter of samples clustered around a shared direction]

- Find several samples with a common dictionary element
- The top singular vector of these samples is an estimate of that element
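The SVD step can be sketched as follows: generate samples that all contain a common element a plus small contributions from other elements, and check that the top left singular vector recovers a up to sign. All sizes and noise levels below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 30
a = rng.standard_normal(d)
a /= np.linalg.norm(a)                  # the shared dictionary element
others = rng.standard_normal((d, 4))    # other dictionary elements

# Each sample contains `a` with a large coefficient plus small cross-terms
samples = np.column_stack(
    [2.0 * a + 0.05 * others @ rng.standard_normal(4) for _ in range(200)]
)

# Top left singular vector of the stacked samples estimates `a` (up to sign)
u, _, _ = np.linalg.svd(samples, full_matrices=False)
estimate = u[:, 0]
error = min(np.linalg.norm(estimate - a), np.linalg.norm(estimate + a))
```

The sign ambiguity is inherent: a and −a span the same direction, which is all the initialization needs.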
Correlation graph

Definition (Correlation graph):
- One node for each example
- Edge {Y_i, Y_j} if |⟨Y_i, Y_j⟩| ≥ ρ

[Figure: correlation graph with a good clique S_1 and a bad clique S_2]

- Large correlation ⇒ common dictionary element
- Samples in a clique contain a common dictionary element
- Cliques are easy to construct
Initialization algorithm

1. Construct the correlation graph G_ρ given a threshold ρ
2. For each edge (Y_i, Y_j) in G_ρ
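Step 1 above can be sketched directly. Below, the samples are assumed to be the columns of Y, and the threshold rho is a free parameter:

```python
import numpy as np

def correlation_graph(Y, rho):
    """Correlation graph: one node per sample (column of Y),
    an edge {i, j} whenever |<Y_i, Y_j>| >= rho."""
    G = np.abs(Y.T @ Y) >= rho       # n x n boolean adjacency matrix
    np.fill_diagonal(G, False)       # no self-loops
    n = G.shape[0]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if G[i, j]]
    return G, edges
```

For example, with three unit-norm samples where only the first two share a dictionary element, the graph has the single edge {Y_1, Y_2}.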