SLIDE 1

CS672: Approximation Algorithms Spring 2020 Intro to Semidefinite Programming

Instructor: Shaddin Dughmi

SLIDE 2

Outline

1. Basics of PSD Matrices
2. Semidefinite Programming
3. Max Cut

SLIDE 5

Symmetric Matrices

A matrix A ∈ Rn×n is symmetric if and only if it is square and Aij = Aji for all i, j. We denote the cone of n × n symmetric matrices by Sn.

Fact

A matrix A ∈ Rn×n is symmetric if and only if it is orthogonally diagonalizable, i.e. A = QDQ⊺ where Q is an orthogonal matrix and D = diag(λ1, . . . , λn).

  • The columns of Q are the (normalized) eigenvectors of A, with corresponding eigenvalues λ1, . . . , λn.
  • Equivalently: as a linear operator, A scales the space along an orthonormal basis Q.
  • The scaling factor λi along direction qi may be negative, positive, or 0.
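To make the decomposition concrete, here is a minimal NumPy sketch (the matrix A below is an arbitrary symmetric example):

    import numpy as np

    # An arbitrary symmetric example matrix.
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

    # eigh is specialized for symmetric matrices: it returns real
    # eigenvalues (in ascending order) and orthonormal eigenvectors.
    eigvals, Q = np.linalg.eigh(A)
    D = np.diag(eigvals)

    # Verify the orthogonal diagonalization A = Q D Q^T,
    # and that Q is orthogonal (Q^T Q = I).
    assert np.allclose(A, Q @ D @ Q.T)
    assert np.allclose(Q.T @ Q, np.eye(2))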



SLIDE 8

Positive Semi-Definite Matrices

A matrix A ∈ Rn×n is positive semi-definite if it is symmetric and moreover all its eigenvalues are nonnegative. We denote the cone of n × n positive semi-definite matrices by Sn+. We use A ⪰ 0 as shorthand for A ∈ Sn+.

A = QDQ⊺ where Q is an orthogonal matrix and D = diag(λ1, . . . , λn), where λi ≥ 0. As a linear operator, A performs nonnegative scaling along an orthonormal basis Q.

Note

Positive definite, negative semi-definite, and negative definite defined similarly.
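As a quick sanity check of the definition, here is a minimal eigenvalue-based PSD test in NumPy (the helper name is_psd and the tolerance tol are our own illustrative choices):

    import numpy as np

    def is_psd(A, tol=1e-9):
        """Symmetric with all eigenvalues nonnegative (up to tolerance)."""
        if not np.allclose(A, A.T):
            return False
        return np.linalg.eigvalsh(A).min() >= -tol

    # A Gram matrix B^T B is always PSD; -I is negative definite.
    B = np.random.randn(3, 3)
    print(is_psd(B.T @ B))     # True
    print(is_psd(-np.eye(3)))  # False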


SLIDE 9

Geometric Intuition for PSD Matrices

For A ⪰ 0, let q1, . . . , qn be the orthonormal eigenbasis for A, and let λ1, . . . , λn ≥ 0 be the corresponding eigenvalues. The linear operator x → Ax scales the qi component of x by λi. When applied to every x in the unit ball, the image of A is an ellipsoid centered at the origin with principal directions q1, . . . , qn and corresponding diameters 2λ1, . . . , 2λn.

When A is positive definite (i.e. λi > 0 for all i), and therefore invertible, the ellipsoid is the set {y : y⊺(AA⊺)−1y ≤ 1}.
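This image-of-the-unit-ball description is easy to verify numerically; a minimal sketch (A is an arbitrary positive definite example):

    import numpy as np

    rng = np.random.default_rng(0)

    # An arbitrary positive definite example matrix.
    A = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

    # Map points x on the unit sphere through A.
    x = rng.standard_normal((2, 1000))
    x /= np.linalg.norm(x, axis=0)
    y = A @ x

    # Every image point lies on the boundary of the ellipsoid
    # {y : y^T (A A^T)^(-1) y <= 1}.
    M = np.linalg.inv(A @ A.T)
    quad = np.einsum('ij,ik,kj->j', y, M, y)
    print(np.allclose(quad, 1.0))  # True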


SLIDE 11

Useful Properties of PSD Matrices

If A ⪰ 0, then:
  • x⊺Ax ≥ 0 for all x.
  • A has a positive semi-definite square root A^(1/2) = Q diag(√λ1, . . . , √λn) Q⊺.
  • A = B⊺B for some matrix B.
    Interpretation: PSD matrices encode the “pairwise similarity” relationships of a family of vectors; Aij is the dot product of the ith and jth columns of B.
    Interpretation: the quadratic form x⊺Ax is the squared length of a linear transformation of x, namely ||Bx||₂².
  • The quadratic function x⊺Ax is convex.
  • A can be expressed as a sum of vector outer products, e.g. A = ∑_{i=1}^{n} vi vi⊺ for vi = √λi qi.

As it turns out, each of the above is also sufficient for A ⪰ 0 (assuming A is symmetric).
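Each of these properties can be checked numerically; a minimal NumPy sketch (the matrices here are arbitrary examples):

    import numpy as np

    # Build a PSD example matrix as a Gram matrix B^T B.
    B = np.array([[1.0, 2.0],
                  [0.0, 1.0]])
    A = B.T @ B

    eigvals, Q = np.linalg.eigh(A)

    # PSD square root: A^(1/2) = Q diag(sqrt(lambda_i)) Q^T.
    A_half = Q @ np.diag(np.sqrt(eigvals)) @ Q.T
    assert np.allclose(A_half @ A_half, A)

    # Sum of outer products: A = sum_i v_i v_i^T with v_i = sqrt(lambda_i) q_i.
    A_sum = sum(eigvals[i] * np.outer(Q[:, i], Q[:, i]) for i in range(2))
    assert np.allclose(A_sum, A)

    # The quadratic form equals the squared length ||Bx||_2^2.
    x = np.array([3.0, -1.0])
    assert np.isclose(x @ A @ x, np.linalg.norm(B @ x) ** 2)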


SLIDE 12

Properties of PSD Matrices Relevant for Computation

The set of PSD matrices is convex.
  • Follows from the characterization: x⊺Ax ≥ 0 for all x.

The set of PSD matrices admits an efficient separation oracle.
  • Given A ⋡ 0, find an eigenvector v with a negative eigenvalue: v⊺Av < 0.

A PSD matrix A ∈ Rn×n implicitly encodes the “pairwise similarities” of a family of vectors b1, . . . , bn ∈ Rn.
  • Follows from the characterization A = B⊺B for some B: Aij = ⟨bi, bj⟩, where bi is the ith column of B.

Can convert between A and B efficiently.
  • B to A: matrix multiplication.
  • A to B: B can be expressed in terms of the eigenvectors/eigenvalues of A, which can be computed to arbitrary precision via power methods. Alternatively: Cholesky decomposition, SVD, . . .
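Here is a minimal sketch of such a separation oracle in NumPy (the function name and tolerance are our own illustrative choices):

    import numpy as np

    def psd_separation_oracle(A, tol=1e-9):
        """Return None if A is (numerically) PSD; otherwise return a
        vector v with v^T A v < 0, certifying A is outside the cone."""
        eigvals, Q = np.linalg.eigh(A)  # eigenvalues in ascending order
        if eigvals[0] >= -tol:
            return None
        # Eigenvector of the most negative eigenvalue. The linear
        # inequality <vv^T, X> >= 0 holds for every PSD X but fails for A.
        return Q[:, 0]

    A = np.array([[1.0, 2.0],
                  [2.0, 1.0]])          # eigenvalues 3 and -1: not PSD
    v = psd_separation_oracle(A)
    print(v @ A @ v)                    # negative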


SLIDE 13

Outline

1. Basics of PSD Matrices
2. Semidefinite Programming
3. Max Cut

SLIDE 14

Convex Optimization

min (or max) f(x) subject to x ∈ X

[Figure: a convex set]

Convex Optimization Problem
A generalization of LP where:
  • The feasible set X is convex: αx + (1 − α)y ∈ X, for all x, y ∈ X and α ∈ [0, 1].
  • The objective function f is convex in case of minimization: f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) for all x, y ∈ X and α ∈ [0, 1].
  • The objective function f is concave in case of maximization.


SLIDE 15

Convex Optimization

Convex optimization problems are solvable efficiently (i.e. in polynomial time) to arbitrary precision under mild conditions:
  • A separation oracle for X.
  • A first-order oracle for evaluating f(x) and ∇f(x).

For more detail

Take CSCI 675!



SLIDE 17

Semidefinite Programs

Optimization problems where the feasible set is the PSD cone, possibly intersected with linear constraints. A generalization of LP, and a special case of convex optimization.

maximize c⊺x
subject to Ax ≤ b
        x1F1 + x2F2 + · · · + xnFn + G is PSD

F1, . . . , Fn, G, and A are given matrices, and c, b are given vectors.

Examples

  • Fitting a distribution, say a Gaussian, to observed data; the variable is a positive semi-definite covariance matrix.
  • As a relaxation of combinatorial problems that encode pairwise relationships, e.g. finding the maximum cut of a graph.


SLIDE 18

Semidefinite Programs


Fact

SDP can be solved in polytime to arbitrary precision, since PSD constraints admit a polytime separation oracle.
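As an illustration, here is a minimal sketch of an SDP in the form above, using the off-the-shelf modeling package CVXPY (the data F1, F2, G, c are arbitrary examples; assumes CVXPY and a bundled SDP solver are installed):

    import cvxpy as cp
    import numpy as np

    # maximize c^T x subject to x1*F1 + x2*F2 + G being PSD.
    F1 = np.array([[1.0, 0.0], [0.0, 0.0]])
    F2 = np.array([[0.0, 0.0], [0.0, 1.0]])
    G = np.array([[-1.0, 0.5], [0.5, -1.0]])
    c = np.array([-1.0, -1.0])

    x = cp.Variable(2)
    constraints = [x[0] * F1 + x[1] * F2 + G >> 0]  # ">>" is CVXPY's PSD constraint
    prob = cp.Problem(cp.Maximize(c @ x), constraints)
    prob.solve()
    print(prob.value, x.value)  # optimum at x = (1.5, 1.5)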


SLIDE 19

Outline

1. Basics of PSD Matrices
2. Semidefinite Programming
3. Max Cut


SLIDE 21

The Max Cut Problem

Given an undirected graph G = (V, E), find a partition of V into (S, V \ S) maximizing the number of edges with exactly one end in S.

maximize ∑_{(i,j)∈E} (1 − xixj)/2
subject to xi ∈ {−1, 1}, for i ∈ V.

Instead of requiring xi to lie on the 1-dimensional sphere {−1, 1}, we relax and permit it to lie on the n-dimensional unit sphere, where n = |V|.

Vector Program relaxation

maximize ∑_{(i,j)∈E} (1 − vi · vj)/2
subject to ||vi||₂ = 1, for i ∈ V.
        vi ∈ Rn, for i ∈ V.
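The {−1, 1} formulation above is easy to sanity-check by brute force on tiny graphs; a minimal sketch (max_cut_brute_force is our own illustrative helper):

    import itertools

    def max_cut_brute_force(n, edges):
        """Try all +/-1 labelings; exponential in n, tiny instances only."""
        best = 0.0
        for x in itertools.product([-1, 1], repeat=n):
            best = max(best, sum((1 - x[i] * x[j]) / 2 for i, j in edges))
        return best

    # A 5-cycle: the optimal cut has 4 edges.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    print(max_cut_brute_force(5, edges))  # 4.0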



SLIDE 23

SDP Relaxation

Recall: a symmetric n × n matrix Y is PSD iff Y = V⊺V for some n × n matrix V.
  • Equivalently: PSD matrices encode the pairwise dot products of the columns of V.
  • When the diagonal entries of Y are 1, V has unit-length columns.
Recall: Y and V can be recovered from each other efficiently.

Vector Program relaxation

maximize ∑_{(i,j)∈E} (1 − vi · vj)/2
subject to ||vi||₂ = 1, for i ∈ V.
        vi ∈ Rn, for i ∈ V.

SDP Relaxation

maximize ∑_{(i,j)∈E} (1 − Yij)/2
subject to Yii = 1, for i ∈ V.
        Y ∈ Sn+
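Continuing the CVXPY sketch from the SDP section, the relaxation above takes only a few lines (solve_maxcut_sdp is our own illustrative helper):

    import cvxpy as cp

    def solve_maxcut_sdp(n, edges):
        """Solve the Max Cut SDP relaxation; return the optimal Y."""
        Y = cp.Variable((n, n), symmetric=True)
        objective = cp.Maximize(sum((1 - Y[i, j]) / 2 for i, j in edges))
        constraints = [Y >> 0, cp.diag(Y) == 1]
        cp.Problem(objective, constraints).solve()
        return Y.value

    # On the 5-cycle, OPT = 4 while the SDP value is about 4.52.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    Y = solve_maxcut_sdp(5, edges)
    print(sum((1 - Y[i, j]) / 2 for i, j in edges))  # ~4.52 >= OPT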


SLIDE 24

Goemans-Williamson Algorithm for Max Cut

1. Solve the SDP to get Y ⪰ 0.
2. Decompose Y as V⊺V; the columns of V are unit vectors vi.
3. Draw a random vector r on the unit sphere.
4. Place nodes i with vi · r ≥ 0 on one side of the cut, the rest on the other side.
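A minimal NumPy sketch of the rounding steps (gw_round is our own illustrative helper; the eigendecomposition stands in for any factorization Y = V⊺V):

    import numpy as np

    def gw_round(Y, rng):
        """One round of random-hyperplane rounding on an SDP solution Y."""
        n = Y.shape[0]
        # Factor Y = V^T V via the eigendecomposition, clipping tiny
        # negative eigenvalues introduced by numerical error.
        eigvals, Q = np.linalg.eigh(Y)
        V = np.diag(np.sqrt(np.clip(eigvals, 0.0, None))) @ Q.T
        r = rng.standard_normal(n)   # direction uniform on the unit sphere
        return V.T @ r >= 0          # side of the cut for each node i

    # Usage with the SDP solution from the previous sketch:
    #   side = gw_round(solve_maxcut_sdp(5, edges), np.random.default_rng(0))
    #   cut_value = sum(side[i] != side[j] for i, j in edges)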



SLIDE 26

We will prove the following Lemma

Lemma

The random hyperplane cuts each edge (i, j) with probability at least 0.878 · (1 − Yij)/2.

Therefore, by linearity of expectation and the fact that OPTSDP ≥ OPT (the SDP is a relaxation), we obtain:

Theorem

The Goemans-Williamson algorithm outputs a random cut of expected size at least 0.878 · OPT.


SLIDE 27

We use the following fact

Fact

For all angles θ ∈ [0, π]: θ/π ≥ 0.878 · (1 − cos(θ))/2.
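A quick numerical check of this fact (a minimal NumPy sketch; the grid is our arbitrary choice):

    import numpy as np

    # Minimize (theta/pi) / ((1 - cos(theta))/2) over (0, pi].
    theta = np.linspace(1e-6, np.pi, 1_000_000)
    ratio = (theta / np.pi) / ((1 - np.cos(theta)) / 2)
    print(ratio.min())            # ~0.8786, so the fact holds
    print(theta[ratio.argmin()])  # worst angle, ~2.33 radians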



SLIDE 32

Lemma

The random hyperplane cuts each edge (i, j) with probability at least 0.878 · (1 − Yij)/2.

  • (i, j) is cut iff sign(r · vi) ≠ sign(r · vj).
  • Can zoom in on the 2-d plane which contains vi and vj.
  • Discard the component of r perpendicular to that plane, leaving r′. The direction of r′ is uniform in the plane.
  • Let θij be the angle between vi and vj. Note Yij = vi · vj = cos(θij).
  • r′ cuts (i, j) w.p. 2θij/2π = θij/π ≥ 0.878 · (1 − cos θij)/2 = 0.878 · (1 − Yij)/2.
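The θij/π probability is easy to verify by simulation; a minimal 2-d sketch (the angle 2.0 is an arbitrary example):

    import numpy as np

    rng = np.random.default_rng(0)

    # Two unit vectors at angle theta, and many random directions r.
    theta = 2.0
    vi = np.array([1.0, 0.0])
    vj = np.array([np.cos(theta), np.sin(theta)])
    r = rng.standard_normal((1_000_000, 2))

    # Fraction of hyperplanes separating vi from vj vs. theta / pi.
    cut = (r @ vi >= 0) != (r @ vj >= 0)
    print(cut.mean(), theta / np.pi)  # both ~0.6366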
