Scalable Semidefinite Relaxation for Maximum A Posteriori Estimation
Qixing Huang, Yuxin Chen, and Leonidas Guibas
Stanford University
ICML 2014, Beijing
Maximum A Posteriori (MAP) Inference
• Markov Random Field (MRF)
  – w_i: potential function for vertices
  – W_ij: potential function for edges
• Maximum A Posteriori (MAP) inference
  – Find the mode, i.e., the configuration with the lowest energy / highest potential
A Large Number of Applications ...
• Computer Vision Applications
  – Image Segmentation
  – Geometric Surface Labeling
  – Photo Montage
  – Scene Decomposition
  – Object Detection
  – Color Segmentation
  – ...
• Protein Folding
• Metric Labeling
• Error-Correcting Codes
• ...
(examples drawn from the OpenGM benchmark)
Problem Setup
• Model
  – n vertices: x_1, ..., x_n
  – m different states: x_i ∈ {1, ..., m}
• Goal:
$$\begin{aligned} \text{maximize} \quad & f(x_1,\dots,x_n) := \underbrace{\sum_{i=1}^{n} w_i(x_i) + \sum_{(i,j)\in\mathcal{G}} W_{ij}(x_i, x_j)}_{\text{negative energy function}} \\ \text{s.t.} \quad & x_i \in \{1,\dots,m\} \end{aligned}$$
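For concreteness, here is a toy evaluation of this objective in Python. The potentials and graph are hypothetical, and the names `n`, `m`, `w`, `W`, `edges` are illustrative, not from the paper:

```python
# A toy evaluation of the negative energy f (hypothetical data).
import numpy as np

n, m = 3, 2                                   # 3 vertices, 2 states each
rng = np.random.default_rng(0)
w = rng.standard_normal((n, m))               # w[i, s] = w_i(s)
edges = [(0, 1), (1, 2)]                      # the graph G
W = {e: rng.standard_normal((m, m)) for e in edges}   # W[(i,j)][s,t] = W_ij(s,t)

def f(x):
    """Negative energy of a labeling x, with x[i] in {0, ..., m-1}."""
    return sum(w[i, x[i]] for i in range(n)) + \
           sum(W[(i, j)][x[i], x[j]] for (i, j) in edges)

# MAP inference = argmax of f over all m**n labelings (intractable in general)
print(f([0, 1, 0]))
```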
Matrix Representation
• Representation of each x_i
  – m possible states: x_i ∈ {e_1, e_2, ..., e_m} (one-hot indicator vectors)
• Representation of potentials
  – potential on vertices: w_i ∈ ℝ^m
  – potential on edges: W_ij ∈ ℝ^{m×m}
• Equivalent integer program (see the sketch below):
$$\begin{aligned} \text{maximize} \quad & f(x_1,\dots,x_n) := \sum_{i=1}^{n} \langle w_i, x_i\rangle + \sum_{(i,j)\in\mathcal{G}} \langle W_{ij},\, x_i x_j^\top\rangle \\ \text{s.t.} \quad & x_i \in \{e_1,\dots,e_m\} \end{aligned}$$
  – Non-convex!
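A quick check that the inner-product form agrees with the table form, under the same hypothetical data (this reuses `n`, `m`, `w`, `W`, `edges`, and `f` from the sketch above):

```python
# The same objective in one-hot / inner-product form (illustrative).
import numpy as np

def one_hot(s, m):
    e = np.zeros(m)
    e[s] = 1.0
    return e

def f_matrix(x):
    X = [one_hot(s, m) for s in x]            # x_i in {e_1, ..., e_m}
    return sum(w[i] @ X[i] for i in range(n)) + \
           sum(np.sum(W[(i, j)] * np.outer(X[i], X[j])) for (i, j) in edges)

# <W_ij, e_s e_t^T> picks out the single entry W_ij[s, t], so both forms agree:
assert np.isclose(f([0, 1, 0]), f_matrix([0, 1, 0]))
```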
Matrix Representation
• Introduce auxiliary variables
$$X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{12}^\top & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{1n}^\top & X_{2n}^\top & \cdots & X_{nn} \end{bmatrix} \quad \text{and} \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
• Lifted program:
$$\begin{aligned} \text{maximize} \quad & f := \sum_{i=1}^{n} \langle w_i, x_i\rangle + \sum_{(i,j)\in\mathcal{G}} \langle W_{ij}, X_{ij}\rangle \\ \text{s.t.} \quad & X = xx^\top, \quad X_{ii} = \mathrm{diag}(x_i), \quad x_i \in \{e_1,\dots,e_m\} \end{aligned}$$
  – Note X_ii = x_i x_i^⊤ = diag(x_i), since each x_i is a 0/1 indicator vector.
Convex Relaxation
$$\text{maximize} \quad \sum_{i=1}^{n} \langle w_i, x_i\rangle + \sum_{(i,j)\in\mathcal{G}} \langle W_{ij}, X_{ij}\rangle$$
• Semidefinite relaxation: replace the non-convex constraint X = xx^⊤ with the convex constraint
$$\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0, \qquad X_{ii} = \mathrm{diag}(x_i)$$
• Relax the constraints x_i ∈ {e_1, ..., e_m}:
$$\mathbf{1}^\top x_i = 1, \qquad x_i \ge 0, \qquad X_{ij} \ge 0, \;\; \forall (i,j)\in\mathcal{G}$$
Our Semidefinite Formulation
• Final semidefinite program (SDR):
$$\begin{aligned} \text{maximize} \quad & \sum_{i=1}^{n} \langle w_i, x_i\rangle + \sum_{(i,j)\in\mathcal{G}} \langle W_{ij}, X_{ij}\rangle \\ \text{s.t.} \quad & \begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0, \quad X_{ii} = \mathrm{diag}(x_i), \\ & \mathbf{1}^\top x_i = 1, \quad x_i \ge 0, \quad X_{ij} \ge 0, \;\; \forall (i,j)\in\mathcal{G} \end{aligned}$$
• The intended solution is low-rank (X = xx^⊤, rank one) and sparse!
• Only O(nm²) linear equality constraints (see the sketch below)
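As a sanity check on the formulation, here is a minimal sketch of SDR in cvxpy on a toy instance. This is not the paper's solver (the paper uses ADMM precisely because generic interior-point solvers do not scale), and all sizes and names are illustrative:

```python
# A minimal cvxpy sketch of SDR on a toy instance (requires the SCS solver).
import numpy as np
import cvxpy as cp

n, m = 3, 2                                          # toy sizes
rng = np.random.default_rng(0)
w = [rng.standard_normal(m) for _ in range(n)]       # vertex potentials w_i
G = [(0, 1), (1, 2)]                                 # edge set
W = {e: rng.standard_normal((m, m)) for e in G}      # edge potentials W_ij

# M plays the role of [[1, x^T], [x, X]]
M = cp.Variable((n * m + 1, n * m + 1), symmetric=True)
blk = lambda i: slice(1 + i * m, 1 + (i + 1) * m)    # indices of block i

cons = [M >> 0, M[0, 0] == 1]
obj = 0
for i in range(n):
    xi = M[0, blk(i)]                                # x_i sits in row 0
    cons += [cp.sum(xi) == 1, xi >= 0]               # 1^T x_i = 1, x_i >= 0
    cons += [M[blk(i), blk(i)] == cp.diag(xi)]       # X_ii = diag(x_i)
    obj = obj + w[i] @ xi
for (i, j) in G:
    Xij = M[blk(i), blk(j)]
    cons += [Xij >= 0]                               # X_ij >= 0 on edges
    obj = obj + cp.sum(cp.multiply(W[(i, j)], Xij))

cp.Problem(cp.Maximize(obj), cons).solve(solver=cp.SCS)
x_relaxed = [M[0, blk(i)].value for i in range(n)]   # relaxed per-vertex marginals
```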
Superiority to Linear Programming Relaxation
• LP relaxation:
  – X_ij 1 = x_i (1 ≤ i, j ≤ n)
  – X_ii = diag(x_i)
  – 1^⊤ x_i = 1, x_i ≥ 0
  – X_ij ≥ 0, ∀(i, j) ∈ G
• Semidefinite relaxation (SDR):
  – $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$
  – X_ii = diag(x_i)
  – 1^⊤ x_i = 1, x_i ≥ 0
  – X_ij ≥ 0, ∀(i, j) ∈ G
• Shall we enforce the marginalization constraints X_ij 1 = x_i, 1 ≤ i, j ≤ n? (Θ(n²m) constraints)
• Answer: No!
  Proposition. Any feasible solution to SDR necessarily satisfies X_ij 1 = x_i.
• O(nm²) vs. O(n²m + nm²) linear equality constraints!
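A short argument for why the marginalization constraints are implied (a sketch along standard lines; it may differ from the proof in the paper). Let $M = \begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$ and let $u$ have a $1$ in the bordering coordinate and $-\mathbf{1}$ on block $i$ (zeros elsewhere). Then

$$u^\top M u = 1 - 2\,\mathbf{1}^\top x_i + \mathbf{1}^\top X_{ii}\,\mathbf{1} = 1 - 2 + 1 = 0,$$

using $X_{ii} = \mathrm{diag}(x_i)$ and $\mathbf{1}^\top x_i = 1$. A PSD matrix with $u^\top M u = 0$ must satisfy $M u = 0$; reading off block $j$ of $M u$ gives $x_j - X_{ij}^\top \mathbf{1} = 0$, and swapping the roles of $i$ and $j$ yields $X_{ij}\mathbf{1} = x_i$.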
ADMM
• Alternating Direction Method of Multipliers
  – Fast convergence in the first several tens of iterations
• Cast SDR in a generic conic form:
$$\begin{aligned} \max \quad & \langle C, X\rangle \\ \text{s.t.} \quad & \mathcal{A}(X) = b, \\ & \mathcal{B}(X) \ge 0, \\ & X \succeq 0 \end{aligned}$$
• A, B, C are all highly sparse!
Scalability?
• Dual variables for the generic formulation:
$$\mathcal{A}(X) = b \;\leftrightarrow\; y, \qquad \mathcal{B}(X) \ge 0 \;\leftrightarrow\; z \ge 0, \qquad X \succeq 0 \;\leftrightarrow\; S \succeq 0$$
• Since A, B, C are all sparse, all ADMM operations are fast except the X-update:
$$X^{(t)} = \left( X^{(t-1)} - \frac{C + \mathcal{A}^*\!\big(y^{(t)}\big) - \mathcal{B}^*\!\big(z^{(t)}\big)}{\mu} \right)_{\succeq 0} \qquad \text{(projection onto the PSD cone)}$$
• Eigendecomposition of dense matrices is expensive!
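Written naively, the projection requires a full eigendecomposition. A minimal numpy sketch of that step (`mu`, `A_adj`, `B_adj` are placeholders for μ, 𝒜*, ℬ*, not the authors' code):

```python
# The bottleneck written naively: projecting onto the PSD cone via a full
# eigendecomposition costs O(N^3) for an N x N matrix.
import numpy as np

def psd_project_dense(M):
    """Return the nearest PSD matrix: clip negative eigenvalues to zero."""
    vals, vecs = np.linalg.eigh(M)            # dense eigendecomposition, O(N^3)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T

# X_new = psd_project_dense(X_old - (C + A_adj(y) - B_adj(z)) / mu)
```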
Accelerated ADMM (SDPAD-LR)
• Recall: the ground truth obeys rank(X) = 1
  – Enforce / exploit low-rank structure!
• Our strategy: keep only a rank-r approximation X^{(t)} ≈ Y^{(t)} Y^{(t)⊤}, computed from the top eigenpairs of
$$\underbrace{Y^{(t-1)} Y^{(t-1)\top}}_{\text{low rank}} \;-\; \frac{1}{\mu}\Big(\underbrace{C + \mathcal{A}^*\!\big(y^{(t)}\big) - \mathcal{B}^*\!\big(z^{(t)}\big)}_{\text{sparse}}\Big)$$
• Numerically fast, e.g., via the Lanczos process: O(nmr² + m²|G|)
• Empirically, r ≈ 8
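A sketch of how this low-rank update might look with scipy's Lanczos-based `eigsh`, assuming the iterate is stored as a factor Y with X ≈ YY^⊤ and the dual term is a sparse matrix (illustrative names, not the authors' implementation):

```python
# Low-rank X-update: top-r PSD approximation of Y Y^T - D without ever
# forming the dense N x N matrix. D stands in for (C + A*(y) - B*(z)) / mu.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

def lowrank_psd_update(Y, D, r):
    N = Y.shape[0]
    # Each matrix-vector product costs O(N r + nnz(D)) instead of O(N^2).
    op = LinearOperator((N, N), matvec=lambda v: Y @ (Y.T @ v) - D @ v)
    vals, vecs = eigsh(op, k=r, which='LA')   # Lanczos: r largest eigenpairs
    vals = np.maximum(vals, 0.0)              # project onto the PSD cone
    return vecs * np.sqrt(vals)               # new factor Y, with X ≈ Y Y^T

# toy usage
N, r = 50, 8
Y0 = np.random.randn(N, r)
D = sp.random(N, N, density=0.05, format='csr')
D = (D + D.T) * 0.5                           # symmetrize the sparse term
Y1 = lowrank_psd_update(Y0, D, r)
```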
Benchmark Data Sets
• Benchmarks: OPENGM2, PIC, ORIENT

categories     graphs    n         m       # instances   avg time
PIC-Object     full      60        11-21   37            5m32s
PIC-Folding    mixed     2K        2-503   21            21m42s
PIC-Align      dense     30-400    20-93   19            37m63s
GM-Label       sparse    1K        7       324           6m32s
GM-Char        sparse    5K-18K    2       100           1h13m
GM-Montage     grid      100K      5,7     3             9h32m
GM-Matching    dense     19        19      4             2m21s
ORIENT         sparse    1K        16      10            10m21s

All problems can be solved within reasonable time!
Empirical Convergence: Example
• Benchmark: Geometric Surface Labeling (gm275)
  – matrix size: 5201; # of constraints: 218791
  – stopping criterion: duality gap < 10⁻³