CS672: Approximation Algorithms, Spring 2020
Intro to Semidefinite Programming
Instructor: Shaddin Dughmi
Outline
1. Basics of PSD Matrices
2. Semidefinite Programming
3. Max Cut
Symmetric Matrices

A matrix A ∈ R^{n×n} is symmetric if and only if it is square and A_ij = A_ji for all i, j. We denote the cone of n × n symmetric matrices by S^n.

Fact
A matrix A ∈ R^{n×n} is symmetric if and only if it is orthogonally diagonalizable, i.e. A = QDQ^⊺ where Q is an orthogonal matrix and D = diag(λ_1, . . . , λ_n).
- The columns of Q are the (normalized) eigenvectors of A, with corresponding eigenvalues λ_1, . . . , λ_n.
- Equivalently: as a linear operator, A scales the space along an orthonormal basis Q.
- The scaling factor λ_i along direction q_i may be negative, positive, or 0.
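As a quick numerical illustration of this fact (a sketch added here, not from the slides), numpy's eigh recovers exactly this decomposition for a random symmetric matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4))
    A = (M + M.T) / 2                      # symmetrize to get A in S^n

    lam, Q = np.linalg.eigh(A)             # eigh is for symmetric matrices
    print(np.allclose(Q @ Q.T, np.eye(4)))         # Q is orthogonal
    print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # A = Q D Q^T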
Positive Semi-Definite Matrices

A matrix A ∈ R^{n×n} is positive semi-definite if it is symmetric and moreover all its eigenvalues are nonnegative. We denote the cone of n × n positive semi-definite matrices by S^n_+. We use A ⪰ 0 as shorthand for A ∈ S^n_+.
- A = QDQ^⊺ where Q is an orthogonal matrix and D = diag(λ_1, . . . , λ_n), where λ_i ≥ 0.
- As a linear operator, A performs nonnegative scaling along an orthonormal basis Q.

Note
Positive definite, negative semi-definite, and negative definite are defined similarly.
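A minimal PSD test based directly on this definition (an added sketch; the tolerance is an arbitrary choice to absorb floating-point error):

    import numpy as np

    def is_psd(A: np.ndarray, tol: float = 1e-10) -> bool:
        """Check that A is symmetric with nonnegative eigenvalues."""
        if not np.allclose(A, A.T):
            return False
        return np.linalg.eigvalsh(A).min() >= -tol

    B = np.random.default_rng(1).standard_normal((4, 4))
    print(is_psd(B @ B.T))     # True: Gram matrices are always PSD
    print(is_psd(-(B @ B.T)))  # False: all eigenvalues flip sign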
Geometric Intuition for PSD Matrices

For A ⪰ 0, let q_1, . . . , q_n be the orthonormal eigenbasis for A, and let λ_1, . . . , λ_n ≥ 0 be the corresponding eigenvalues.
- The linear operator x → Ax scales the q_i component of x by λ_i.
- When applied to every x in the unit ball, the image of A is an ellipsoid centered at the origin with principal directions q_1, . . . , q_n and corresponding diameters 2λ_1, . . . , 2λ_n.
- When A is positive definite (i.e. λ_i > 0), and therefore invertible, the ellipsoid is the set {y : y^⊺ (AA^⊺)^{-1} y ≤ 1}.
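The ellipsoid picture can be checked numerically; the sketch below (an added illustration with arbitrary random data) maps points on the unit sphere through a positive definite A and verifies that they land on the boundary of {y : y^⊺ (AA^⊺)^{-1} y ≤ 1}:

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((3, 3))
    A = M @ M.T + np.eye(3)                 # positive definite, hence invertible

    X = rng.standard_normal((3, 1000))
    X /= np.linalg.norm(X, axis=0)          # 1000 points on the unit sphere
    Y = A @ X                               # their images under A

    # quad[j] = y_j^T (A A^T)^{-1} y_j for each image y_j
    quad = np.einsum("ij,ik,kj->j", Y, np.linalg.inv(A @ A.T), Y)
    print(np.allclose(quad, 1.0))           # all images lie on the ellipsoid boundary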
Useful Properties of PSD Matrices

If A ⪰ 0, then:
- x^⊺ A x ≥ 0 for all x.
- A has a positive semi-definite square root A^{1/2} = Q diag(√λ_1, . . . , √λ_n) Q^⊺.
- A = B^⊺ B for some matrix B.
  Interpretation: PSD matrices encode the "pairwise similarity" relationships of a family of vectors. A_ij is the dot product of the ith and jth columns of B.
  Interpretation: the quadratic form x^⊺ A x is the squared length of a linear transformation of x, namely ||Bx||_2^2.
- The quadratic function x^⊺ A x is convex.
- A can be expressed as a sum of vector outer products, e.g. A = Σ_{i=1}^n v_i v_i^⊺ for v_i = √λ_i q_i.

As it turns out, each of the above is also sufficient for A ⪰ 0 (assuming A is symmetric).
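A short sketch (an added illustration with arbitrary data) verifying the square-root and Gram-factor properties numerically:

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((4, 4))
    A = M.T @ M                              # a PSD matrix

    lam, Q = np.linalg.eigh(A)
    s = np.sqrt(np.clip(lam, 0, None))       # clip tiny negative round-off
    sqrtA = Q @ np.diag(s) @ Q.T
    print(np.allclose(sqrtA @ sqrtA, A))     # (A^{1/2})^2 = A

    B = np.diag(s) @ Q.T                     # one valid choice of B
    print(np.allclose(B.T @ B, A))           # A = B^T B

    x = rng.standard_normal(4)
    print(np.isclose(x @ A @ x, np.linalg.norm(B @ x) ** 2))  # x^T A x = ||Bx||^2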
Properties of PSD Matrices Relevant for Computation

The set of PSD matrices is convex.
- Follows from the characterization: x^⊺ A x ≥ 0 for all x.
The set of PSD matrices admits an efficient separation oracle.
- Given A ∉ S^n_+, find an eigenvector v with a negative eigenvalue: v^⊺ A v < 0.
A PSD matrix A ∈ R^{n×n} implicitly encodes the "pairwise similarities" of a family of vectors b_1, . . . , b_n ∈ R^n.
- Follows from the characterization A = B^⊺ B for some B: A_ij = ⟨b_i, b_j⟩.
Can convert between A and B efficiently.
- B to A: matrix multiplication.
- A to B: B can be expressed in terms of the eigenvectors/eigenvalues of A, which can be computed to arbitrary precision via power methods. Alternatively: Cholesky decomposition, SVD, . . .
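The separation oracle is easy to sketch in code (an added illustration; assumes the input is symmetric): if A is not PSD, the eigenvector for its most negative eigenvalue certifies v^⊺ A v < 0, while v^⊺ X v ≥ 0 for every PSD X, so the linear functional X → v^⊺ X v separates A from the PSD cone.

    import numpy as np

    def psd_separation_oracle(A: np.ndarray):
        """Return None if A is PSD, else a vector v with v^T A v < 0."""
        lam, Q = np.linalg.eigh(A)       # eigh sorts eigenvalues in ascending order
        if lam[0] >= 0:
            return None
        return Q[:, 0]                   # eigenvector of the most negative eigenvalue

    A = np.diag([1.0, -2.0, 3.0])
    v = psd_separation_oracle(A)
    print(v, v @ A @ v)                  # v^T A v = -2 < 0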
Convex Optimization

min (or max)  f(x)
subject to    x ∈ X

(Figure: a convex set.)

Convex Optimization Problem
A generalization of LP where:
- The feasible set X is convex: αx + (1 − α)y ∈ X for all x, y ∈ X and α ∈ [0, 1].
- The objective function f is convex in case of minimization: f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) for all x, y ∈ X and α ∈ [0, 1].
- The objective function f is concave in case of maximization.

Convex optimization problems are solvable efficiently (i.e. in polynomial time) to arbitrary precision under mild conditions:
- A separation oracle for X.
- A first-order oracle for evaluating f(x) and ∇f(x).

For more detail, take CSCI 675!
Semidefinite Programs

These are optimization problems where the feasible set is the cone of PSD matrices, possibly intersected with linear constraints. A generalization of LP, and a special case of convex optimization.

maximize   c^⊺ x
subject to Ax ≤ b
           x_1 F_1 + x_2 F_2 + · · · + x_n F_n + G ⪰ 0

F_1, . . . , F_n, G, and A are given matrices, and c, b are given vectors.

Examples
- Fitting a distribution, say a Gaussian, to observed data. The variable is a positive semi-definite covariance matrix.
- As a relaxation of combinatorial problems that encode pairwise relationships: e.g. finding the maximum cut of a graph.

Fact
SDPs can be solved in polytime to arbitrary precision, since PSD constraints admit a polytime separation oracle.
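One way to express this standard form concretely is with the cvxpy modeling library; the sketch below uses arbitrary placeholder data (the box constraints chosen for A, b simply keep the problem bounded), so treat it as an illustration rather than a canonical formulation:

    import numpy as np
    import cvxpy as cp

    n, k = 3, 4                                   # 3 variables, 4x4 matrices
    rng = np.random.default_rng(4)
    F = []
    for _ in range(n):
        M = rng.standard_normal((k, k))
        F.append(M + M.T)                         # symmetric coefficient matrices
    G = 5.0 * np.eye(k)                           # keeps x = 0 strictly feasible
    A = np.vstack([np.eye(n), -np.eye(n)])        # Ax <= b encodes -1 <= x_i <= 1
    b = np.ones(2 * n)
    c = rng.standard_normal(n)

    x = cp.Variable(n)
    lmi = sum(x[i] * F[i] for i in range(n)) + G  # the linear matrix inequality
    prob = cp.Problem(cp.Maximize(c @ x), [A @ x <= b, lmi >> 0])
    prob.solve()
    print(prob.status, prob.value)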
The Max Cut Problem

Given an undirected graph G = (V, E), find a partition of V into (S, V \ S) maximizing the number of edges with exactly one end in S.

maximize   Σ_{(i,j)∈E} (1 − x_i x_j)/2
subject to x_i ∈ {−1, 1}, for i ∈ V.

Instead of requiring x_i to be on the 1-dimensional sphere, we relax and permit it to be in the n-dimensional sphere, where n = |V|.

Vector Program Relaxation
maximize   Σ_{(i,j)∈E} (1 − v_i · v_j)/2
subject to ||v_i||_2 = 1, for i ∈ V.
           v_i ∈ R^n, for i ∈ V.
SDP Relaxation

Recall: a symmetric n × n matrix Y is PSD iff Y = V^⊺ V for an n × n matrix V.
- Equivalently: PSD matrices encode the pairwise dot products of the columns of V.
- When the diagonal entries of Y are 1, V has unit-length columns.
- Recall: Y and V can be recovered from each other efficiently.

Vector Program Relaxation
maximize   Σ_{(i,j)∈E} (1 − v_i · v_j)/2
subject to ||v_i||_2 = 1, for i ∈ V.
           v_i ∈ R^n, for i ∈ V.

SDP Relaxation
maximize   Σ_{(i,j)∈E} (1 − Y_ij)/2
subject to Y_ii = 1, for i ∈ V.
           Y ∈ S^n_+
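As a concrete illustration, the SDP relaxation can be written almost verbatim in cvxpy; the 5-cycle below is a placeholder graph, not an example from the lecture:

    import cvxpy as cp

    n = 5
    edges = [(i, (i + 1) % n) for i in range(n)]  # the 5-cycle C_5

    Y = cp.Variable((n, n), symmetric=True)
    constraints = [Y >> 0, cp.diag(Y) == 1]
    objective = cp.Maximize(sum((1 - Y[i, j]) / 2 for (i, j) in edges))
    prob = cp.Problem(objective, constraints)
    prob.solve()
    print(prob.value)   # ≈ 4.52 for C_5, versus a true max cut value of 4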
Goemans-Williamson Algorithm for Max Cut

1. Solve the SDP to get Y ⪰ 0.
2. Decompose Y as V^⊺ V, with columns v_1, . . . , v_n.
3. Draw a random vector r on the unit sphere.
4. Place nodes i with v_i · r ≥ 0 on one side of the cut, the rest on the other side.

SDP Relaxation
maximize   Σ_{(i,j)∈E} (1 − Y_ij)/2
subject to Y_ii = 1, ∀i
           Y ∈ S^n_+
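Steps 2 through 4 can be sketched as follows (an added illustration: the feasible Y here is a stand-in built from random unit vectors, since any Y = V^⊺ V with unit diagonal suffices to demonstrate the rounding):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 5
    edges = [(i, (i + 1) % n) for i in range(n)]

    # Stand-in for the SDP optimum: a feasible Y = V^T V with unit columns.
    V0 = rng.standard_normal((n, n))
    V0 /= np.linalg.norm(V0, axis=0)
    Yval = V0.T @ V0

    # Step 2: decompose Y as V^T V (eigendecomposition; clip tiny negatives).
    lam, Q = np.linalg.eigh(Yval)
    V = np.diag(np.sqrt(np.clip(lam, 0, None))) @ Q.T

    # Steps 3-4: random hyperplane rounding.
    r = rng.standard_normal(n)
    r /= np.linalg.norm(r)                    # uniform direction on the sphere
    side = V.T @ r >= 0                       # v_i . r >= 0 defines side S
    cut = sum(side[i] != side[j] for (i, j) in edges)
    print(side, cut)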
We will prove the following lemma.

Lemma
The random hyperplane cuts each edge (i, j) with probability at least 0.878 · (1 − Y_ij)/2.

Therefore, by linearity of expectation, and the fact that OPT_SDP ≥ OPT (i.e. it is a relaxation):

Theorem
The Goemans-Williamson algorithm outputs a random cut of expected size at least 0.878 · OPT.
We use the following fact.

Fact
For all angles θ ∈ [0, π]:  θ/π ≥ 0.878 · (1 − cos θ)/2.
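The fact is easy to confirm numerically (an added sketch; the grid resolution is arbitrary): the minimum of (θ/π) / ((1 − cos θ)/2) over (0, π] is the Goemans-Williamson constant ≈ 0.87856, which is where the 0.878 comes from.

    import numpy as np

    theta = np.linspace(1e-6, np.pi, 1_000_000)   # avoid θ = 0, where both sides vanish
    ratio = (theta / np.pi) / ((1 - np.cos(theta)) / 2)
    print(ratio.min(), theta[ratio.argmin()])     # ≈ 0.87856, attained near θ ≈ 2.3311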
Lemma
The random hyperplane cuts each edge (i, j) with probability at least 0.878 · (1 − Y_ij)/2.

Proof:
- (i, j) is cut iff sign(r · v_i) ≠ sign(r · v_j).
- We can zoom in on the 2-dimensional plane which includes v_i and v_j.
- Discard the component of r perpendicular to that plane, leaving r′. The direction of r′ is uniform in the plane.
- Let θ_ij be the angle between v_i and v_j. Note Y_ij = v_i · v_j = cos(θ_ij).
- r′ cuts (i, j) with probability 2θ_ij/(2π) = θ_ij/π ≥ 0.878 · (1 − cos θ_ij)/2 = 0.878 · (1 − Y_ij)/2.
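A quick Monte Carlo check of the key step (an added illustration): for two unit vectors at angle θ, a uniformly random hyperplane separates them with probability θ/π.

    import numpy as np

    rng = np.random.default_rng(6)
    theta = 2.0
    vi = np.array([1.0, 0.0])
    vj = np.array([np.cos(theta), np.sin(theta)])

    # Random Gaussian directions; only the sign of the dot products matters,
    # so normalizing r is unnecessary here.
    R = rng.standard_normal((100_000, 2))
    cuts = np.sign(R @ vi) != np.sign(R @ vj)
    print(cuts.mean(), theta / np.pi)           # both ≈ 0.6366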