Convex Optimization 3. Convex Functions Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2018 SJTU Ying Cui 1 / 42
Outline
◮ Basic properties and examples
◮ Operations that preserve convexity
◮ The conjugate function
◮ Quasiconvex functions
◮ Log-concave and log-convex functions
◮ Convexity with respect to generalized inequalities
Definition
◮ convex: $f : \mathbf{R}^n \to \mathbf{R}$ is convex if $\operatorname{dom} f$ is a convex set and
  $$f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$$
  for all $x, y \in \operatorname{dom} f$ and all $\theta$ with $0 \le \theta \le 1$
◮ geometric interpretation: the line segment between $(x, f(x))$ and $(y, f(y))$ (i.e., the chord from $x$ to $y$) lies above the graph of $f$
[Figure 3.1: Graph of a convex function. The chord (i.e., line segment) between any two points on the graph lies above the graph.]
◮ concave: $f$ is concave if $-f$ is convex
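The defining inequality can be spot-checked numerically. The following is a minimal sketch, assuming a scalar function on an interval; the helper name `violates_convexity`, the sampling scheme, and the tolerance are illustrative choices, not part of the slides. Passing the test is only evidence of convexity, not a proof, whereas a returned counterexample does disprove it.

```python
import math
import random

def violates_convexity(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Search for (x, y, theta) with f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y).

    Returns the first counterexample found, or None if no sample violates the inequality.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:  # tolerance absorbs floating-point rounding
            return (x, y, theta)
    return None

# exp is convex on R, so no counterexample should appear.
convex_ok = violates_convexity(math.exp, -3.0, 3.0) is None

# sin is strictly concave on [0, pi], so the convexity inequality fails there.
sine_counterexample = violates_convexity(math.sin, 0.0, math.pi)
```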
Definition
◮ strictly convex: $f : \mathbf{R}^n \to \mathbf{R}$ is strictly convex if $\operatorname{dom} f$ is a convex set and
  $$f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y)$$
  for all $x, y \in \operatorname{dom} f$ with $x \ne y$, and all $\theta$ with $0 < \theta < 1$
◮ strictly concave: $f$ is strictly concave if $-f$ is strictly convex
◮ affine functions are both convex and concave
◮ conversely, any function that is both convex and concave is affine
Examples on $\mathbf{R}$
convex:
◮ affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
◮ exponential: $e^{ax}$ on $\mathbf{R}$, for any $a \in \mathbf{R}$
◮ powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $\alpha \ge 1$ or $\alpha \le 0$
◮ powers of absolute value: $|x|^p$ on $\mathbf{R}$, for $p \ge 1$
◮ negative entropy: $x \log x$ on $\mathbf{R}_{++}$
concave:
◮ affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
◮ powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $0 \le \alpha \le 1$
◮ logarithm: $\log x$ on $\mathbf{R}_{++}$
Examples on $\mathbf{R}^n$ and $\mathbf{R}^{m \times n}$
Examples on $\mathbf{R}^n$:
◮ affine function $f(x) = a^T x + b$ is both convex and concave
◮ every norm is convex
  ◮ due to the triangle inequality and homogeneity
  ◮ $\ell_p$-norms: $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$ for $p \ge 1$ ($\|x\|_1 = \sum_{i=1}^n |x_i|$, $\|x\|_\infty = \max_k |x_k|$)
◮ max function $f(x) = \max\{x_1, \ldots, x_n\}$ is convex
◮ log-sum-exp $f(x) = \log(e^{x_1} + \cdots + e^{x_n})$ is convex
  ◮ a differentiable approximation of the max function:
    $$\log(e^{x_1} + \cdots + e^{x_n}) - \log n \le \max\{x_1, \ldots, x_n\} \le \log(e^{x_1} + \cdots + e^{x_n})$$
Examples on $\mathbf{R}^{m \times n}$:
◮ affine function $f(X) = \operatorname{tr}(A^T X) + b = \sum_{i=1}^m \sum_{j=1}^n A_{ij} X_{ij} + b$ is both convex and concave
◮ spectral (maximum singular value) norm $f(X) = \|X\|_2 = \sigma_{\max}(X) = (\lambda_{\max}(X^T X))^{1/2}$ is convex
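The two-sided bound relating log-sum-exp to the max function can be checked on a concrete vector. This is a small sketch; the function name `log_sum_exp`, the max-shift used for numerical stability, and the sample vector are assumptions made for illustration.

```python
import math

def log_sum_exp(x):
    """log(e^{x_1} + ... + e^{x_n}), computed after shifting by max(x) to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

x = [1.0, -2.0, 3.5, 3.4]
lse = log_sum_exp(x)
lower = lse - math.log(len(x))  # log-sum-exp minus log n
upper = lse                     # log-sum-exp itself

# The slide's claim: lower <= max(x) <= upper.
bound_holds = lower <= max(x) <= upper
```

The gap between the two bounds is always $\log n$, so the approximation is tight up to a constant that depends only on the dimension.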
Restriction of a convex function to a line
◮ a function $f : \mathbf{R}^n \to \mathbf{R}$ is convex iff it is convex when restricted to any line that intersects its domain, i.e.,
  ◮ $g(t) = f(x + tv)$ is convex on $\{t \mid x + tv \in \operatorname{dom} f\}$ for all $x \in \operatorname{dom} f$ and all $v \in \mathbf{R}^n$
◮ check convexity of a function of multiple variables by restricting it to a line and checking convexity of a function of one variable
◮ example: $f : \mathbf{S}^n \to \mathbf{R}$ with $f(X) = \log\det X$, $\operatorname{dom} f = \mathbf{S}^n_{++}$
  Consider an arbitrary line $X = Z + tV$ with $Z, V \in \mathbf{S}^n$, restricted to the values of $t$ for which $Z + tV \in \mathbf{S}^n_{++}$. W.l.o.g., assume $t = 0$ is in this interval, i.e., $Z \in \mathbf{S}^n_{++}$. Then
  $$\begin{aligned}
  g(t) &= \log\det(Z + tV) \\
       &= \log\det\left(Z^{1/2}\left(I + t Z^{-1/2} V Z^{-1/2}\right) Z^{1/2}\right) \\
       &= \log\det Z + \log\det\left(I + t Z^{-1/2} V Z^{-1/2}\right) \\
       &= \log\det Z + \sum_{i=1}^n \log(1 + t\lambda_i)
  \end{aligned}$$
  where $\lambda_i$ are the eigenvalues of $Z^{-1/2} V Z^{-1/2}$. Each term $\log(1 + t\lambda_i)$ is concave in $t$, so $g$ is concave in $t$. Thus, $f$ is concave.
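The log-determinant example can be illustrated for $n = 2$ without any linear-algebra library. In this sketch the matrices `Z` and `V`, the interval of $t$ values, and the midpoint test are all illustrative assumptions; midpoint concavity on a grid is a sanity check, not a proof.

```python
import math

Z = [[2.0, 0.5], [0.5, 1.0]]    # symmetric positive definite base point (assumed)
V = [[1.0, -1.0], [-1.0, 0.5]]  # symmetric direction (assumed)

def g(t):
    """g(t) = log det(Z + tV) for the 2x2 matrices above."""
    a = Z[0][0] + t * V[0][0]
    b = Z[0][1] + t * V[0][1]
    d = Z[1][1] + t * V[1][1]
    det = a * d - b * b
    # A symmetric 2x2 matrix is positive definite iff a > 0 and det > 0.
    if a <= 0 or det <= 0:
        raise ValueError("Z + tV is not positive definite at this t")
    return math.log(det)

# Grid of t values for which Z + tV stays positive definite.
ts = [-0.3 + 0.05 * k for k in range(13)]

# Concavity implies g at the midpoint is at least the average of the endpoint values.
midpoint_concave = all(
    g(0.5 * (s + t)) >= 0.5 * (g(s) + g(t)) - 1e-12
    for s in ts
    for t in ts
)
```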
Extended-value extension
◮ the extended-value extension $\tilde f$ of a convex function $f$ is
  $$\tilde f(x) = \begin{cases} f(x), & x \in \operatorname{dom} f \\ \infty, & x \notin \operatorname{dom} f \end{cases}$$
◮ $\tilde f$ is defined on all of $\mathbf{R}^n$ and takes values in $\mathbf{R} \cup \{\infty\}$
◮ recover the domain of $f$ from $\tilde f$ as $\operatorname{dom} f = \{x \mid \tilde f(x) < \infty\}$
◮ the extension can simplify notation, as there is no need to explicitly describe the domain or add the qualifier 'for all $x \in \operatorname{dom} f$'
◮ the basic defining inequality for convexity can be expressed as: for $0 < \theta < 1$,
  $$\tilde f(\theta x + (1-\theta) y) \le \theta \tilde f(x) + (1-\theta) \tilde f(y)$$
  for any $x$ and $y$
  ◮ the inequality always holds for $\theta = 0, 1$
  ◮ the two conditions need not be stated separately: convexity of $\operatorname{dom} f$ follows from the inequality (shown by contradiction), and $x, y \in \operatorname{dom} f$ is replaced by $x, y \in \mathbf{R}^n$, which can be left implicit
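As a small illustration of the extension, the sketch below uses $f(x) = -\log x$ with $\operatorname{dom} f = \mathbf{R}_{++}$; the choice of function, the name `f_ext`, and the sample points are assumptions. Floating-point `math.inf` plays the role of $\infty$, and $\theta$ is kept strictly between 0 and 1 because `0 * math.inf` is `nan` in floating-point arithmetic.

```python
import math

def f_ext(x):
    """Extended-value extension of f(x) = -log(x), whose domain is x > 0."""
    return -math.log(x) if x > 0 else math.inf

x, y, theta = -1.0, 2.0, 0.5  # x lies outside dom f, y lies inside
lhs = f_ext(theta * x + (1 - theta) * y)          # evaluated at 0.5, which is in dom f
rhs = theta * f_ext(x) + (1 - theta) * f_ext(y)   # equals inf because f_ext(x) = inf
holds = lhs <= rhs

# Membership in dom f is recovered by testing for a finite value.
y_in_domain = f_ext(y) < math.inf
x_in_domain = f_ext(x) < math.inf
```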
First-order conditions
Suppose $f$ is differentiable, i.e., $\operatorname{dom} f$ is open and the gradient
$$\nabla f(x) = \left(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n}\right)$$
exists at any $x \in \operatorname{dom} f$
◮ $f$ is convex iff $\operatorname{dom} f$ is convex and
  $$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \text{for all } x, y \in \operatorname{dom} f$$
  ◮ the first-order Taylor approximation of a convex function is a global underestimator of it; conversely, if the first-order Taylor approximation of a function is always a global underestimator of it, then the function is convex
  ◮ local information about a convex function (value and derivative at a point) implies global information (a global underestimator)
  ◮ if $f$ is convex and $\nabla f(x) = 0$, then $x$ is a global minimizer of $f$
◮ $f$ is strictly convex iff $\operatorname{dom} f$ is convex and
  $$f(y) > f(x) + \nabla f(x)^T (y - x) \quad \text{for all } x, y \in \operatorname{dom} f \text{ with } x \ne y$$
[Figure 3.2: If $f$ is convex and differentiable, then $f(x) + \nabla f(x)^T (y - x) \le f(y)$ for all $x, y \in \operatorname{dom} f$.]
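The global-underestimator property can be checked on a grid for one of the earlier scalar examples, negative entropy $f(x) = x \log x$ on $\mathbf{R}_{++}$, whose derivative is $\log x + 1$. The grid and the tolerance in this sketch are illustrative assumptions.

```python
import math

def f(x):
    """Negative entropy, convex on x > 0."""
    return x * math.log(x)

def df(x):
    """Derivative of x*log(x)."""
    return math.log(x) + 1.0

grid = [0.1 * k for k in range(1, 51)]  # points in (0, 5]

# Gap between f(y) and the tangent line at x, evaluated at y.
# Convexity requires this gap to be nonnegative for every pair (x, y).
min_gap = min(f(y) - (f(x) + df(x) * (y - x)) for x in grid for y in grid)
underestimator_ok = min_gap >= -1e-12
```

For this particular $f$ the gap simplifies to $y \log(y/x) - y + x$, which is zero exactly when $x = y$.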
Second-order conditions
Suppose $f$ is twice differentiable, i.e., $\operatorname{dom} f$ is open and the Hessian $\nabla^2 f(x) \in \mathbf{S}^n$ exists at any $x \in \operatorname{dom} f$, where
$$\nabla^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i, j = 1, \ldots, n$$
◮ $f$ is convex iff $\operatorname{dom} f$ is convex and $\nabla^2 f(x) \succeq 0$ for all $x \in \operatorname{dom} f$
  ◮ for a function on $\mathbf{R}$, this reduces to: $\operatorname{dom} f$ is an interval and $f''(x) \ge 0$ for all $x$ in the interval
  ◮ $\nabla^2 f(x) \succeq 0$ means the graph of $f$ has nonnegative (upward) curvature at $x$
◮ if $\operatorname{dom} f$ is convex and $\nabla^2 f(x) \succ 0$ for all $x \in \operatorname{dom} f$, then $f$ is strictly convex
  ◮ the converse is not true, e.g., $f(x) = x^4$ is strictly convex but $f''(0) = 0$
Second-order conditions
Examples
◮ quadratic function: $f(x) = (1/2) x^T P x + q^T x + r$ (with $P \in \mathbf{S}^n$)
  $$\nabla f(x) = Px + q, \qquad \nabla^2 f(x) = P$$
  convex iff $P \in \mathbf{S}^n_+$
◮ least-squares objective: $f(x) = \|Ax - b\|_2^2 = x^T A^T A x - 2 x^T A^T b + b^T b$
  $$\nabla f(x) = 2 A^T (Ax - b), \qquad \nabla^2 f(x) = 2 A^T A$$
  convex for all $A \in \mathbf{R}^{m \times n}$ (as $A^T A \succeq 0$ for all $A \in \mathbf{R}^{m \times n}$)
◮ quadratic-over-linear function: $f(x, y) = x^2 / y$
  $$\nabla^2 f(x, y) = \frac{2}{y^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y \\ -x \end{bmatrix}^T \succeq 0$$
  convex for all $x \in \mathbf{R}$ and $y \in \mathbf{R}_{++}$ (as $zz^T \succeq 0$ for all $z \in \mathbf{R}^n$, and $2/y^3 > 0$ for $y > 0$)
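The closed-form Hessian of the quadratic-over-linear function can be compared against central finite differences. This is a sketch under stated assumptions: the helper names, the evaluation point $(1.5, 2.0)$, and the step size `h` are illustrative, and finite differences only approximate the second derivatives.

```python
def f(x, y):
    """Quadratic-over-linear function x^2 / y, defined for y > 0."""
    return x * x / y

def hessian_closed_form(x, y):
    """(2 / y^3) * z z^T with z = (y, -x), as stated on the slide."""
    c = 2.0 / y**3
    z = (y, -x)
    return [[c * z[i] * z[j] for j in range(2)] for i in range(2)]

def hessian_finite_diff(x, y, h=1e-4):
    """Central-difference approximation of the 2x2 Hessian of f."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]

H = hessian_closed_form(1.5, 2.0)
H_fd = hessian_finite_diff(1.5, 2.0)
max_err = max(abs(H[i][j] - H_fd[i][j]) for i in range(2) for j in range(2))

# A rank-one matrix c * z z^T with c > 0 has zero determinant and nonnegative trace,
# which together imply positive semidefiniteness in the 2x2 case.
det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]
trace_H = H[0][0] + H[1][1]
```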
Second-order conditions
Examples
◮ log-sum-exp: $f(x) = \log \sum_{k=1}^n \exp x_k$ is convex
  ◮ proof: with $z_k = \exp x_k$,
    $$\nabla^2 f(x) = \frac{1}{\mathbf{1}^T z} \operatorname{diag}(z) - \frac{1}{(\mathbf{1}^T z)^2} z z^T$$
    to show $\nabla^2 f(x) \succeq 0$, we must verify that $v^T \nabla^2 f(x) v \ge 0$ for all $v$:
    $$v^T \nabla^2 f(x) v = \frac{\left(\sum_k z_k v_k^2\right)\left(\sum_k z_k\right) - \left(\sum_k v_k z_k\right)^2}{\left(\sum_k z_k\right)^2} \ge 0$$
    since $\left(\sum_k v_k z_k\right)^2 \le \left(\sum_k z_k v_k^2\right)\left(\sum_k z_k\right)$ (from the Cauchy-Schwarz inequality $(a^T a)(b^T b) \ge (a^T b)^2$ with $a_k = v_k \sqrt{z_k}$ and $b_k = \sqrt{z_k}$)
◮ geometric mean: $f(x) = \left(\prod_{k=1}^n x_k\right)^{1/n}$ on $\mathbf{R}^n_{++}$ is concave (similar proof as for log-sum-exp)
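The quadratic form in the log-sum-exp proof can be evaluated directly from the sums that appear in it. The sketch below samples random $x$ and $v$; the function name, the sampling ranges, and the tolerance are illustrative assumptions, and random sampling only supports the inequality rather than proving it.

```python
import math
import random

def lse_quadratic_form(x, v):
    """v^T (Hessian of log-sum-exp at x) v, using z_k = exp(x_k) as on the slide."""
    z = [math.exp(xk) for xk in x]
    sum_z = sum(z)
    sum_zv2 = sum(zk * vk * vk for zk, vk in zip(z, v))
    sum_zv = sum(zk * vk for zk, vk in zip(z, v))
    return (sum_zv2 * sum_z - sum_zv * sum_zv) / (sum_z * sum_z)

rng = random.Random(0)
n = 4
min_value = min(
    lse_quadratic_form(
        [rng.uniform(-5.0, 5.0) for _ in range(n)],
        [rng.uniform(-1.0, 1.0) for _ in range(n)],
    )
    for _ in range(1000)
)

# Cauchy-Schwarz holds with equality when v is a constant vector,
# so the quadratic form vanishes along the all-ones direction.
along_ones = lse_quadratic_form([0.3, -1.2, 2.0, 0.7], [1.0, 1.0, 1.0, 1.0])
```

The zero along the all-ones direction shows the Hessian is positive semidefinite but never positive definite.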
Sublevel set and superlevel set
Sublevel set
◮ $\alpha$-sublevel set of $f : \mathbf{R}^n \to \mathbf{R}$: $\{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$
◮ sublevel sets of a convex function are convex
  ◮ the converse is false, e.g., $f(x) = -\exp x$ is not convex (indeed, it is strictly concave), but all its sublevel sets are convex
Superlevel set
◮ $\alpha$-superlevel set of $f : \mathbf{R}^n \to \mathbf{R}$: $\{x \in \operatorname{dom} f \mid f(x) \ge \alpha\}$
◮ superlevel sets of a concave function are convex
To establish convexity of a set, express it as a sublevel set of a convex function, or as a superlevel set of a concave function.
Epigraph and hypograph
◮ graph of $f : \mathbf{R}^n \to \mathbf{R}$: $\{(x, f(x)) \mid x \in \operatorname{dom} f\} \subseteq \mathbf{R}^{n+1}$
◮ epigraph of $f : \mathbf{R}^n \to \mathbf{R}$: $\operatorname{epi} f = \{(x, t) \in \mathbf{R}^{n+1} \mid x \in \operatorname{dom} f,\ f(x) \le t\} \subseteq \mathbf{R}^{n+1}$
  ◮ $f$ is convex iff $\operatorname{epi} f$ is a convex set
◮ hypograph of $f : \mathbf{R}^n \to \mathbf{R}$: $\operatorname{hypo} f = \{(x, t) \in \mathbf{R}^{n+1} \mid x \in \operatorname{dom} f,\ f(x) \ge t\} \subseteq \mathbf{R}^{n+1}$
  ◮ $f$ is concave iff $\operatorname{hypo} f$ is a convex set
[Figure 3.5: Epigraph of a function $f$, shown shaded. The lower boundary, shown darker, is the graph of $f$.]
Jensen's inequality and extensions
◮ basic inequality: if $f$ is convex, $x, y \in \operatorname{dom} f$ and $0 \le \theta \le 1$, then
  $$f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$$
◮ extension to convex combinations of more than two points: if $f$ is convex, $x_1, \ldots, x_k \in \operatorname{dom} f$, and $\theta_1, \ldots, \theta_k \ge 0$ with $\theta_1 + \cdots + \theta_k = 1$, then
  $$f(\theta_1 x_1 + \cdots + \theta_k x_k) \le \theta_1 f(x_1) + \cdots + \theta_k f(x_k)$$
◮ extensions to infinite sums and integrals: if $p(x) \ge 0$ on $S \subseteq \operatorname{dom} f$ and $\int_S p(x)\,dx = 1$, then
  $$f\left(\int_S p(x)\,x\,dx\right) \le \int_S f(x)\,p(x)\,dx,$$
  provided the integrals exist
◮ extension to expected values: if $f$ is convex and $X$ is a random variable such that $X \in \operatorname{dom} f$ w.p. 1, then $f(\mathbf{E} X) \le \mathbf{E} f(X)$, provided the expectations exist
◮ many famous inequalities (e.g., the arithmetic-geometric mean inequality and Hölder's inequality) can be derived by applying Jensen's inequality to some convex function
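Both the finite-sum form of Jensen's inequality and the arithmetic-geometric mean inequality it implies can be checked on a small example. In this sketch the points `xs`, the weights `theta`, and the choice $f = \exp$ are assumptions for illustration; the arithmetic-geometric mean inequality follows from applying Jensen's inequality to the convex function $-\log$ with uniform weights.

```python
import math

xs = [0.5, 2.0, 4.0, 9.0]
theta = [0.1, 0.2, 0.3, 0.4]  # nonnegative weights summing to 1

# Jensen for f = exp: f(sum theta_i x_i) <= sum theta_i f(x_i).
weighted_mean = sum(t * x for t, x in zip(theta, xs))
jensen_lhs = math.exp(weighted_mean)
jensen_rhs = sum(t * math.exp(x) for t, x in zip(theta, xs))

# Arithmetic-geometric mean inequality via Jensen applied to -log:
# -log(mean of x) <= mean of -log(x), i.e. geometric mean <= arithmetic mean.
arithmetic_mean = sum(xs) / len(xs)
geometric_mean = math.exp(sum(math.log(x) for x in xs) / len(xs))
```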