More Subgradient Calculus: Function Convexity first - PowerPoint PPT Presentation


SLIDE 1

More Subgradient Calculus: Function Convexity first

The following functions are again convex but, again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?

▶ Nonnegative weighted sum: f = ∑_{i=1}^{n} α_i f_i is convex if each f_i, 1 ≤ i ≤ n, is convex and α_i ≥ 0, 1 ≤ i ≤ n.

▶ Composition with affine function: f(Ax + b) is convex if f is convex. For example:

▶ The log barrier for linear inequalities, f(x) = −∑_{i=1}^{m} log(b_i − a_i^T x), is convex since −log(x) is convex.

▶ Any norm of an affine function, f(x) = ||Ax + b||, is convex.

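To make the log barrier example above concrete, here is a minimal NumPy sketch (the matrix A, the vector b, and the test point are illustrative choices, not from the slides) that evaluates f(x) = −∑_{i=1}^{m} log(b_i − a_i^T x) and its gradient A^T (1/(b − Ax)) inside the domain {x : Ax < b}:

```python
import numpy as np

# Log barrier for linear inequalities: f(x) = -sum_i log(b_i - a_i^T x),
# defined on the open polyhedron {x : Ax < b}. Its gradient is
# grad f(x) = sum_i a_i / (b_i - a_i^T x) = A^T (1 / (b - Ax)).
def log_barrier(A, b, x):
    slack = b - A @ x
    if np.any(slack <= 0):
        return np.inf, None          # outside the domain
    value = -np.sum(np.log(slack))
    grad = A.T @ (1.0 / slack)
    return value, grad

# Illustrative data: a random polyhedron that contains the origin.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = np.ones(5)                        # x = 0 satisfies A @ x < b
val, grad = log_barrier(A, b, np.zeros(3))
print(val, grad)
```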

SLIDE 2

More of Basic Subgradient Calculus

▶ Scaling: ∂(af) = a · ∂f, provided a > 0. The condition a > 0 makes the scaled function af remain convex.

▶ Addition: ∂(f_1 + f_2) = ∂(f_1) + ∂(f_2)

▶ Affine composition: if g(x) = f(Ax + b), then ∂g(x) = A^T ∂f(Ax + b)

▶ Norms: important special case, f(x) = ||x||_p

The derivations done in class can be used to show that if any other subgradient existed for g outside the stated set above, it could be used to construct a subgradient for f outside its stated set as well!

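As a sketch of the affine composition rule ∂g(x) = A^T ∂f(Ax + b), the following NumPy snippet (the data values are illustrative) builds one subgradient of g(x) = ||Ax + b||_1 by pushing a subgradient of the ℓ1 norm through A^T:

```python
import numpy as np

# Affine composition: for g(x) = f(Ax + b) with f = ||.||_1, a subgradient
# of g at x is A^T s, where s is a subgradient of ||.||_1 at z = Ax + b
# (s_i = sign(z_i); any s_i in [-1, 1] is valid where z_i = 0 -- here we
# pick 0, which is what np.sign returns at 0).
def subgrad_l1_affine(A, b, x):
    z = A @ x + b
    s = np.sign(z)
    return A.T @ s

A = np.array([[1.0, 2.0], [0.0, -1.0]])
b = np.array([0.5, 0.0])
x = np.array([1.0, -0.25])
print(subgrad_l1_affine(A, b, x))   # [1. 1.]
```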

SLIDE 3

More of Basic Subgradient Calculus

▶ Scaling: ∂(af) = a · ∂f, provided a > 0. The condition a > 0 makes the scaled function af remain convex.

▶ Addition: ∂(f_1 + f_2) = ∂(f_1) + ∂(f_2)

▶ Affine composition: if g(x) = f(Ax + b), then ∂g(x) = A^T ∂f(Ax + b)

▶ Norms: important special case, f(x) = ||x||_p = max_{||z||_q ≤ 1} z^T x, where q is such that 1/p + 1/q = 1. (On the board we have used y instead of z.)


SLIDE 4

More of Basic Subgradient Calculus

▶ Scaling: ∂(af) = a · ∂f, provided a > 0. The condition a > 0 makes the scaled function af remain convex.

▶ Addition: ∂(f_1 + f_2) = ∂(f_1) + ∂(f_2)

▶ Affine composition: if g(x) = f(Ax + b), then ∂g(x) = A^T ∂f(Ax + b)

▶ Norms: important special case, f(x) = ||x||_p = max_{||z||_q ≤ 1} z^T x, where q is such that 1/p + 1/q = 1. Then

∂f(x) = { y : ||y||_q ≤ 1 and y^T x = max_{||z||_q ≤ 1} z^T x }

Here y corresponds to a z at which the max is attained. This part is largely connected to the previous discussion on the max of convex functions.


SLIDE 5

More of Basic Subgradient Calculus

▶ Scaling: ∂(af) = a · ∂f, provided a > 0. The condition a > 0 makes the scaled function af remain convex.

▶ Addition: ∂(f_1 + f_2) = ∂(f_1) + ∂(f_2)

▶ Affine composition: if g(x) = f(Ax + b), then ∂g(x) = A^T ∂f(Ax + b)

▶ Norms: important special case, f(x) = ||x||_p = max_{||z||_q ≤ 1} z^T x, where q is such that 1/p + 1/q = 1. (This is derived in class.) Then

∂f(x) = { y : ||y||_q ≤ 1 and y^T x = max_{||z||_q ≤ 1} z^T x } = { y : ||y||_q ≤ 1 and y^T x = ||x||_p }

The constraint ||y||_q ≤ 1 is a consequence of Hölder's inequality.

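A minimal numerical check of this characterization, assuming NumPy (the test vector is illustrative): for p = 1 (q = ∞) the candidate y_i = sign(x_i), and for p = 2 (q = 2) the candidate y = x/||x||_2, both satisfy ||y||_q ≤ 1 and y^T x = ||x||_p:

```python
import numpy as np

# Subgradient of f(x) = ||x||_p via the dual-norm characterization:
# y in df(x) iff ||y||_q <= 1 and y^T x = ||x||_p, with 1/p + 1/q = 1.
x = np.array([3.0, 0.0, -4.0])

y1 = np.sign(x)                       # candidate for p = 1 (q = inf)
assert np.max(np.abs(y1)) <= 1.0                 # ||y||_inf <= 1
assert np.isclose(y1 @ x, np.sum(np.abs(x)))     # y^T x = ||x||_1

y2 = x / np.linalg.norm(x)            # candidate for p = 2 (q = 2), x != 0
assert np.isclose(np.linalg.norm(y2), 1.0)       # ||y||_2 <= 1 (equality)
assert np.isclose(y2 @ x, np.linalg.norm(x))     # y^T x = ||x||_2
print("both candidates satisfy the subgradient characterization")
```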

SLIDE 6

Subgradients for the ‘Lasso’ Problem in Machine Learning

We use the Lasso (min_x f(x)) as an example to illustrate subgradients of an affine composition:

f(x) = (1/2) ||y − x||_2^2 + λ ||x||_1

The subgradients of f(x) are x − y + λs, where s ∈ [−1, 1]^n is such that ||x||_1 = s^T x.


SLIDE 7

Subgradients for the ‘Lasso’ Problem in Machine Learning

We use the Lasso (min_x f(x)) as an example to illustrate subgradients of an affine composition:

f(x) = (1/2) ||y − x||_2^2 + λ ||x||_1

The subgradients of f(x) are h = x − y + λs, where s_i = sign(x_i) if x_i ≠ 0 and s_i ∈ [−1, 1] if x_i = 0.

The second case is a result of the convex hull of the two one-sided slopes {−1, +1} of |x_i| at x_i = 0.

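The subgradient formula can be used to certify optimality for this separable Lasso. Here is a minimal NumPy sketch (the data values are illustrative; the soft-thresholding closed form is standard for this form of the objective, not derived on these slides):

```python
import numpy as np

# Subgradients of the Lasso objective f(x) = (1/2)||y - x||_2^2 + lam*||x||_1
# are h = x - y + lam*s, with s_i = sign(x_i) if x_i != 0 and s_i in [-1, 1]
# otherwise. For this separable form the minimizer is soft-thresholding:
# x*_i = sign(y_i) * max(|y_i| - lam, 0).
y = np.array([2.0, 0.3, -1.5])
lam = 0.5
x_star = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Certify optimality: 0 must be a subgradient at x*. Where x*_i = 0 we are
# free to pick s_i = y_i / lam, which lies in [-1, 1] exactly when
# soft-thresholding zeroed that coordinate.
s = np.where(x_star != 0, np.sign(x_star), y / lam)
h = x_star - y + lam * s
print(h)   # ~[0. 0. 0.]
```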

SLIDE 8

More Subgradient Calculus: Composition

The following functions, though convex, may not be differentiable everywhere. How does one compute their subgradients? (What holds for the subgradient also holds for the gradient.)

Composition with functions: Let p: ℜ^k → ℜ with p(x) = ∞ for all x ∉ dom p, and q: ℜ^n → ℜ^k. Define f(x) = p(q(x)). f is convex if

▶ q_i is convex, p is convex and nondecreasing in each argument,

▶ or q_i is concave, p is convex and nonincreasing in each argument.

We will consider only the first case.


SLIDE 9

More Subgradient Calculus: Composition

The following functions, though convex, may not be differentiable everywhere. How does one compute their subgradients? (What holds for the subgradient also holds for the gradient.)

Composition with functions: Let p: ℜ^k → ℜ with p(x) = ∞ for all x ∉ dom p, and q: ℜ^n → ℜ^k. Define f(x) = p(q(x)). f is convex if

▶ q_i is convex, p is convex and nondecreasing in each argument,

▶ or q_i is concave, p is convex and nonincreasing in each argument.

(In both conditions, the composition will instead be concave if p is concave.)

Some examples illustrating this property are:

▶ exp q(x) is convex if q is convex (exp is a monotonic and convex p)

▶ ∑_{i=1}^{m} log q_i(x) is concave if the q_i are concave and positive (here p is concave, and hence the composition is concave)

▶ log ∑_{i=1}^{m} exp q_i(x) is convex if the q_i are convex

▶ 1/q(x) is convex if q is concave and positive

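A quick numerical spot-check of the first composition rule, assuming NumPy (the affine q_i and the sampled segment are illustrative), using f(x) = log ∑_i exp q_i(x):

```python
import numpy as np

# Spot-check the composition rule on f(x) = log(sum_i exp(q_i(x))) with
# convex (here affine) q_i: log-sum-exp is convex and nondecreasing in each
# argument, so f should satisfy f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))      # q(x) = Ax, each q_i affine hence convex

def f(x):
    return np.log(np.sum(np.exp(A @ x)))

a, b = rng.standard_normal(3), rng.standard_normal(3)
for t in np.linspace(0.0, 1.0, 11):
    lhs = f(t * a + (1 - t) * b)
    rhs = t * f(a) + (1 - t) * f(b)
    assert lhs <= rhs + 1e-9
print("convexity inequality holds along the sampled segment")
```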

SLIDE 10

More Subgradient Calculus: Composition (contd)

Composition with functions: Let p: ℜ^k → ℜ with p(x) = ∞ for all x ∉ dom p, and q: ℜ^n → ℜ^k. Define f(x) = p(q(x)). f is convex if

▶ q_i is convex, p is convex and nondecreasing in each argument,

▶ or q_i is concave, p is convex and nonincreasing in each argument.

Subgradients for the first case (second one is homework):


SLIDE 11

More Subgradient Calculus: Composition (contd)

Composition with functions: Let p: ℜ^k → ℜ with p(x) = ∞ for all x ∉ dom p, and q: ℜ^n → ℜ^k. Define f(x) = p(q(x)). f is convex if

▶ q_i is convex, p is convex and nondecreasing in each argument,

▶ or q_i is concave, p is convex and nonincreasing in each argument.

Subgradients for the first case (the second one is homework):

f(y) = p(q_1(y), …, q_k(y)) ≥ p(q_1(x) + h_{q_1}^T (y − x), …, q_k(x) + h_{q_k}^T (y − x))

where h_{q_i} ∈ ∂q_i(x) for i = 1..k, and since p(·) is nondecreasing in each argument,

≥ p(q_1(x), …, q_k(x)) + h_p^T (h_{q_1}^T (y − x), …, h_{q_k}^T (y − x))

where h_p ∈ ∂p(q_1(x), …, q_k(x)).

All we need to do next is club together h_p and the h_{q_i} and leave only (y − x) in the second term.


SLIDE 12

More Subgradient Calculus: Composition (contd)

Composition with functions: Let p: ℜ^k → ℜ with p(x) = ∞ for all x ∉ dom p, and q: ℜ^n → ℜ^k. Define f(x) = p(q(x)). f is convex if

▶ q_i is convex, p is convex and nondecreasing in each argument,

▶ or q_i is concave, p is convex and nonincreasing in each argument.

Subgradients for the first case (the second one is homework):

f(y) = p(q_1(y), …, q_k(y)) ≥ p(q_1(x) + h_{q_1}^T (y − x), …, q_k(x) + h_{q_k}^T (y − x))

where h_{q_i} ∈ ∂q_i(x) for i = 1..k, and since p(·) is nondecreasing in each argument,

p(q_1(x) + h_{q_1}^T (y − x), …, q_k(x) + h_{q_k}^T (y − x)) ≥ p(q_1(x), …, q_k(x)) + h_p^T (h_{q_1}^T (y − x), …, h_{q_k}^T (y − x))

where h_p ∈ ∂p(q_1(x), …, q_k(x)). Finally,

p(q_1(x), …, q_k(x)) + h_p^T (h_{q_1}^T (y − x), …, h_{q_k}^T (y − x)) = f(x) + ∑_{i=1}^{k} (h_p)_i h_{q_i}^T (y − x)

That is, ∑_{i=1}^{k} (h_p)_i h_{q_i} is a subgradient of the composite function at x.

H/W: Derive the subdifferentials of the example functions on the previous slide.

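Here is a minimal NumPy sketch of the resulting formula ∑_{i=1}^{k} (h_p)_i h_{q_i} (the particular p, q_i, and data are illustrative; both functions are smooth here, so the subgradient is the gradient and can be checked against finite differences):

```python
import numpy as np

# Composition subgradient: for f(x) = p(q_1(x), ..., q_k(x)) with each q_i
# convex and p convex and nondecreasing per argument, a subgradient at x is
# sum_i (h_p)_i * h_{q_i}, where h_p in dp(q(x)) and h_{q_i} in dq_i(x).
# Sketch with p = log-sum-exp (h_p = softmax gradient) and
# q_i(x) = 0.5 x^T x + c_i^T x (smooth convex, h_{q_i} = x + c_i).
rng = np.random.default_rng(2)
C = rng.standard_normal((4, 3))                 # rows are the c_i

def q(x):  return 0.5 * (x @ x) + C @ x         # vector of q_i(x)
def f(x):  return np.log(np.sum(np.exp(q(x))))

x = rng.standard_normal(3)
h_p = np.exp(q(x)) / np.sum(np.exp(q(x)))       # softmax = grad of log-sum-exp
H_q = x + C                                     # row i is h_{q_i} = x + c_i
g = H_q.T @ h_p                                 # sum_i (h_p)_i h_{q_i}

# Finite-difference check (valid since everything is differentiable here).
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
print(np.max(np.abs(g - fd)))                   # ~1e-9
```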

SLIDE 13

More Subgradient Calculus: Proximal Operator

The following functions are again convex but, again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?

Infimum: If c(x, y) is convex in (x, y) and C is a convex set, then d(x) = inf_{y ∈ C} c(x, y) is convex. For example:

Let d(x, C) be the function that returns the distance of a point x to a convex set C. That is, d(x, C) = inf_{y ∈ C} ||x − y|| = ||x − P_C(x)||, where P_C(x) = argmin_{y ∈ C} ||x − y||. Then d(x, C) is a convex function and

∇d(x, C) = (x − P_C(x)) / ||x − P_C(x)||

H/W: Prove that d is convex if c is a convex function and C is a convex set.

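A minimal sketch of d(x, C) and its gradient, assuming NumPy and taking C to be the Euclidean unit ball (an illustrative choice, since its projection has a simple closed form):

```python
import numpy as np

# Distance of x to a convex set C, d(x, C) = ||x - P_C(x)||, and its gradient
# grad d(x, C) = (x - P_C(x)) / ||x - P_C(x)|| for x outside C. Illustrated
# for C = Euclidean unit ball, where P_C(x) = x / max(1, ||x||_2).
def project_unit_ball(x):
    return x / max(1.0, np.linalg.norm(x))

def dist_and_grad(x):
    r = x - project_unit_ball(x)
    d = np.linalg.norm(r)
    return (d, r / d) if d > 0 else (0.0, None)  # formula needs x outside C

x = np.array([3.0, 4.0])         # ||x||_2 = 5, so d(x, C) = 4
print(dist_and_grad(x))          # (4.0, array([0.6, 0.8]))
```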

SLIDE 14

More Subgradient Calculus: Proximal Operator

The following functions are again convex but, again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?

Infimum: If c(x, y) is convex in (x, y) and C is a convex set, then d(x) = inf_{y ∈ C} c(x, y) is convex. For example:

Let d(x, C) be the function that returns the distance of a point x to a convex set C. That is, d(x, C) = inf_{y ∈ C} ||x − y|| = ||x − P_C(x)||, where P_C(x) = argmin_{y ∈ C} ||x − y||. Then d(x, C) is a convex function and

∇d(x, C) = (x − P_C(x)) / ||x − P_C(x)||

....The point of intersection of convex sets C_1, C_2, …, C_m by minimizing... (Subgradients and Alternating Projections)

P_C(x) = argmin_{y ∈ C} ||x − y|| is a special case of the proximity operator prox_c(x) = argmin_y PROX_c(x, y) of a convex function c(x), where PROX_c(x, y) = c(y) + (1/2) ||x − y||^2. The special case is when c(y) is the indicator function over C.


SLIDE 15

More Subgradient Calculus: Proximal Operator

⋆ We will invoke this when we discuss the proximal gradient descent algorithm.

The following functions are again convex but, again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?

Infimum: If c(x, y) is convex in (x, y) and C is a convex set, then d(x) = inf_{y ∈ C} c(x, y) is convex. For example:

Let d(x, C) be the function that returns the distance of a point x to a convex set C. That is, d(x, C) = inf_{y ∈ C} ||x − y|| = ||x − P_C(x)||, where P_C(x) = argmin_{y ∈ C} ||x − y||. Then d(x, C) is a convex function and

∇d(x, C) = (x − P_C(x)) / ||x − P_C(x)||

....The point of intersection of convex sets C_1, C_2, …, C_m by minimizing... (Subgradients and Alternating Projections)

P_C(x) = argmin_{y ∈ C} ||x − y|| is a special case of the proximity operator prox_c(x) = argmin_y PROX_c(x, y) of a convex function c(x), where PROX_c(x, y) = c(y) + (1/2) ||x − y||^2. The special case is when c(y) is the indicator function I_C(y) introduced earlier to eliminate the constraints of an optimization problem. (Proximal methods will be done in detail later.)

Recall that ∂I_C(y) = N_C(y) = { h ∈ ℜ^n : h^T y ≥ h^T z for any z ∈ C }

The subdifferential ∂_y PROX_c(x, y) = ∂c(y) + y − x, which can now be obtained for the special case c(y) = I_C(y).
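To make the proximity operator concrete, here is a minimal NumPy sketch of two standard closed-form instances (both are classical results, not derived on these slides): the prox of the indicator of a box is the projection onto the box, and the prox of λ||·||_1 is soft-thresholding:

```python
import numpy as np

# Proximity operator: prox_c(x) = argmin_y c(y) + (1/2) ||x - y||^2.
# Two standard closed forms:
#   c = indicator of a convex set C  ->  prox_c is the projection P_C
#   c = lam * ||.||_1                ->  prox_c is soft-thresholding
def prox_indicator_box(x, lo, hi):
    return np.clip(x, lo, hi)        # projection onto the box [lo, hi]^n

def prox_l1(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([2.0, -0.3, 0.7])
print(prox_indicator_box(x, -1.0, 1.0))   # [ 1.  -0.3  0.7]
print(prox_l1(x, 0.5))                    # [ 1.5 -0.   0.2]
```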