
Lecture 3: Convex Function. CK Cheng, Dept. of Computer Science and Engineering, University of California, San Diego



  1. CSE203B Convex Optimization: Lecture 3: Convex Function. CK Cheng, Dept. of Computer Science and Engineering, University of California, San Diego

  2. Outline
     1. Definitions: Convexity, Examples & Views
     2. Conditions of Optimality
        1. First Order Condition
        2. Second Order Condition
     3. Operations that Preserve the Convexity
        1. Pointwise Maximum
        2. Partial Minimization
     4. Conjugate Function
     5. Log-Concave, Log-Convex Functions

  3. Outline
     1. Definitions
        1. Convex Function vs Convex Set
        2. Examples
           1. Norm
           2. Entropy
           3. Affine
           4. Determinant
           5. Maximum
        3. Views of Functions and Related Hyperplanes

  4. 1. Definitions: Convex Function vs Convex Set
     Theorem: Given $S = \{x \mid f(x) \le b\}$. If the function $f(x)$ is convex, then $S$ is a convex set.
     Proof: We prove it from the definition of a convex set. For every $u, v \in S$, i.e. $f(u) \le b$, $f(v) \le b$, we want to show that $\alpha u + \beta v \in S$ for all $\alpha + \beta = 1$, $\alpha, \beta \ge 0$. We have
     $f(\alpha u + \beta v) \le \alpha f(u) + \beta f(v)$  ($f$ is convex)
     $\le \alpha b + \beta b$  ($\alpha, \beta \ge 0$)
     $= (\alpha + \beta) \cdot b = b$  ($\alpha + \beta = 1$).
     Thus $\alpha u + \beta v \in S$.
     Remark: Convex function => convex set. $f(x) \le b$ => convex set. $f(x) \ge b$ => ?
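The theorem and the open question in the remark can be probed numerically. The sketch below assumes the illustrative choices $f(x) = x^2$ and $b = 1$ (neither appears in the slides): the sublevel set contains the midpoint of two of its points, while the superlevel set $\{x \mid f(x) \ge b\}$ does not.

```python
# Sketch: sublevel sets of a convex function are convex; superlevel sets need not be.
# f(x) = x**2 and the level b = 1 are illustrative assumptions, not from the slides.

def f(x):
    return x * x

b = 1.0

def in_sublevel(x):
    return f(x) <= b

def in_superlevel(x):
    return f(x) >= b

u, v = -1.0, 1.0          # f(u) = f(v) = 1, so both points lie in both sets
alpha, beta = 0.5, 0.5    # alpha + beta = 1, alpha, beta >= 0
mid = alpha * u + beta * v

sub_ok = in_sublevel(u) and in_sublevel(v) and in_sublevel(mid)
super_ok = in_superlevel(u) and in_superlevel(v) and in_superlevel(mid)
print(sub_ok, super_ok)   # True False
```

One counterexample is enough to show that $f(x) \ge b$ does not define a convex set in general.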

  5. 1. Convex Function Definitions: Examples
     $f: \mathbf{R}^n \to \mathbf{R}$ is convex if $\operatorname{dom} f$ is a convex set and
     $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y), \quad \forall x, y \in \operatorname{dom} f,\ 0 \le \theta \le 1$.
     Example on $\mathbf{R}$:
     Convex functions
       Affine: $ax + b$ on $\mathbf{R}$ for any $a, b \in \mathbf{R}$
       Exponential: $e^{ax}$ for any $a \in \mathbf{R}$
       Power: $x^\alpha$ on $\mathbf{R}_{++}$ for $\alpha \ge 1$ or $\alpha \le 0$; $|x|^p$ on $\mathbf{R}$ for $p \ge 1$
     Concave functions
       Affine: $ax + b$ on $\mathbf{R}$ for any $a, b \in \mathbf{R}$
       Power: $x^\alpha$ on $\mathbf{R}_{++}$ for $0 \le \alpha \le 1$
       Logarithm: $\log x$ on $\mathbf{R}_{++}$
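The defining inequality can be spot-checked by random sampling. The following is a minimal sketch; the helper name `violates_convexity`, the sampling intervals, and the parameters $a = 0.5$, $p = 1.5$ are assumptions made for illustration. A sampled test can only refute convexity, never prove it.

```python
import math
import random

def violates_convexity(f, lo, hi, trials=2000, seed=0, tol=1e-9):
    """Return True if some sampled (x, y, theta) breaks
    f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:
            return True
    return False

# Convex examples from the list: e^{ax} and |x|^p with p >= 1.
exp_violates = violates_convexity(lambda x: math.exp(0.5 * x), -3.0, 3.0)
abs_pow_violates = violates_convexity(lambda x: abs(x) ** 1.5, -3.0, 3.0)
# log x is concave on R_++, so the convexity inequality fails for it.
log_violates = violates_convexity(math.log, 0.1, 5.0)

print(exp_violates, abs_pow_violates, log_violates)
```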

  6. 1. Convex Function Definitions: Examples
     Example on $\mathbf{R}^n$:
       Affine: $f(x) = a^T x + b$
       Norms: $\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$ for $p \ge 1$; $\|x\|_\infty = \max_k |x_k|$
     Example on $\mathbf{R}^{m \times n}$:
       Affine: $f(X) = \operatorname{tr}(A^T X) = \sum_{i=1}^m \sum_{j=1}^n A_{ij} X_{ij}$
       Spectral (max singular value): $f(X) = \|X\|_2 = \sigma_{\max}(X) = \left( \lambda_{\max}(X^T X) \right)^{1/2}$

  7. 1. Convex Function Definitions: Examples
     Concave functions:
     Log determinant: $f(X) = \log \det X$, $\operatorname{dom} f = \mathbf{S}^n_{++}$
     Proof: Let $g(t) = f(X + tV)$, $V \in \mathbf{S}^n$.
     $g(t) = \log \det (X + tV)$
     $= \log \det X + \log \det (I + t X^{-1/2} V X^{-1/2})$
     $= \log \det X + \sum_{i=1}^n \log (1 + t \lambda_i)$,
     where $\lambda_i$ are the eigenvalues of $X^{-1/2} V X^{-1/2}$.
     $g$ is concave in $t$ ⇒ $f$ is concave.
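The restriction-to-a-line argument can be illustrated for $2 \times 2$ matrices, where the determinant has a closed form. This is only a sketch: the matrices `X` and `V` and the range of `t` are assumed values, chosen so that $X + tV$ stays positive definite.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def g(t, X, V):
    """g(t) = log det(X + t V); requires X + t V to be positive definite."""
    M = [[X[i][j] + t * V[i][j] for j in range(2)] for i in range(2)]
    return math.log(det2(M))

X = [[2.0, 1.0], [1.0, 2.0]]     # symmetric positive definite (illustrative)
V = [[1.0, -1.0], [-1.0, 0.5]]   # symmetric direction (illustrative)

ts = [-0.4 + 0.1 * k for k in range(9)]   # t in [-0.4, 0.4]

# Midpoint concavity: g((s + t)/2) >= (g(s) + g(t))/2 for all sampled pairs.
concave = all(
    g(0.5 * (s + t), X, V) >= 0.5 * (g(s, X, V) + g(t, X, V)) - 1e-12
    for s in ts for t in ts
)
print(concave)
```

For these matrices $\det(X + tV) = 3 + 5t - 0.5t^2$, which is positive on the sampled range, so the logarithm is well defined.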

  8. Convex function examples: norm, max, expectation
     Norm: If $f: \mathbf{R}^n \to \mathbf{R}$ is a norm and $0 \le \theta \le 1$, then
     $f(\theta x + (1 - \theta) y) \le f(\theta x) + f((1 - \theta) y)$  (triangle inequality)
     $= \theta f(x) + (1 - \theta) f(y)$  (scalability, i.e. homogeneity)
     Max function: $f(x) = \max_i x_i$, $x = (x_1, x_2, \ldots, x_n)^T$
     $f(\theta x + (1 - \theta) y) = \max_i \left( \theta x_i + (1 - \theta) y_i \right)$
     $\le \theta \max_i x_i + (1 - \theta) \max_i y_i$
     $= \theta f(x) + (1 - \theta) f(y)$ for $0 \le \theta \le 1$
     Probability (expectation): If $f(x)$ is convex and $p(x)$ is a probability density at $x$, i.e. $p(x) \ge 0$ for all $x$ and $\int p(x)\,dx = 1$, then
     $f(\mathbf{E}x) \le \mathbf{E}f(x)$, where $\mathbf{E}x = \int x\,p(x)\,dx$ and $\mathbf{E}f(x) = \int f(x)\,p(x)\,dx$.
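A small numerical sketch of the max-function bound and the expectation inequality. It assumes a discrete distribution in place of the density $p(x)$ on the slide, and all numbers are made up for illustration.

```python
import math

def expectation(values, probs):
    """Expectation of a discrete random variable."""
    return sum(p * v for v, p in zip(values, probs))

# Expectation inequality f(E x) <= E f(x) with f = exp (convex).
xs = [-1.0, 0.5, 2.0, 4.0]
ps = [0.1, 0.4, 0.3, 0.2]          # nonnegative, sums to 1
f = math.exp
f_of_mean = f(expectation(xs, ps))
mean_of_f = expectation([f(x) for x in xs], ps)
jensen_holds = f_of_mean <= mean_of_f

# Max function: max_i(theta*x_i + (1-theta)*y_i) <= theta*max(x) + (1-theta)*max(y).
x = [1.0, -2.0, 3.0]
y = [0.0, 5.0, -1.0]
theta = 0.3
max_lhs = max(theta * a + (1 - theta) * b for a, b in zip(x, y))
max_rhs = theta * max(x) + (1 - theta) * max(y)
max_holds = max_lhs <= max_rhs

print(jensen_holds, max_holds)   # True True
```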

  9. 1.3 Views of Functions and Related Hyperplanes
     Given $f(x)$, $x \in \mathbf{R}^n$, we plot the function in $\mathbf{R}^n$ and $\mathbf{R}^{n+1}$ spaces.
     1. Draw the function in $\mathbf{R}^n$ space.
        Equipotential surface: tangent plane $\nabla f(\tilde{x})^T (x - \tilde{x}) = 0$ at $\tilde{x}$.
     2. Draw the function in $\mathbf{R}^{n+1}$ space.
        2.1 Graph of function: $\{(x, h) \mid x \in \operatorname{dom} f,\ h = f(x)\}$
            Hyperplane ($h = \nabla f(\tilde{x})^T (x - \tilde{x}) + f(\tilde{x})$):
            $\begin{bmatrix} \nabla f(\tilde{x}) \\ -1 \end{bmatrix}^T \left( \begin{bmatrix} x \\ h \end{bmatrix} - \begin{bmatrix} \tilde{x} \\ f(\tilde{x}) \end{bmatrix} \right) = 0$
            Example: $f(x) = x^2$. We show the hyperplane with $\nabla f(x)$.
        2.2 Epigraph: $\operatorname{epi} f = \{(x, t) \mid x \in \operatorname{dom} f,\ f(x) \le t\}$
            A function is convex iff its epigraph is a convex set.
            Example: $f(x) = \max \{ f_i(x) \mid i = 1, \ldots, r \}$, where the $f_i(x)$ are convex. Since $\operatorname{epi} f$ is the intersection of the sets $\operatorname{epi} f_i$, $\operatorname{epi} f$ is convex. Thus, the function $f$ is convex.
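The epigraph argument for the pointwise maximum can be checked on a grid: a point $(x, t)$ lies in $\operatorname{epi} f$ exactly when it lies in every $\operatorname{epi} f_i$. The three functions $f_i$ and the grid below are assumptions chosen for illustration.

```python
def in_epi(f, x, t):
    """True if (x, t) lies in the epigraph of f, i.e. f(x) <= t."""
    return f(x) <= t

# Three convex functions f_i (illustrative choices).
fs = [lambda x: x * x, lambda x: 2.0 * x + 1.0, lambda x: -x]

def f_max(x):
    """Pointwise maximum f(x) = max_i f_i(x)."""
    return max(fi(x) for fi in fs)

# Grid of (x, t) points covering x in [-2, 2] and t in [-1, 5].
grid = [(-2.0 + 0.25 * i, -1.0 + 0.5 * j) for i in range(17) for j in range(13)]

# Membership in epi f agrees with membership in the intersection of the epi f_i.
agree = all(
    in_epi(f_max, x, t) == all(in_epi(fi, x, t) for fi in fs)
    for x, t in grid
)
print(agree)   # True
```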

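  10. 2. Conditions of Optimality: First Order Condition
      Definition: $f$ is differentiable if $\operatorname{dom} f$ is open and the gradient
      $\nabla f(x) \equiv \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)$
      exists at each $x \in \operatorname{dom} f$.
      Theorem: A differentiable $f$ with convex domain is convex iff
      $f(y) \ge f(x) + \nabla f(x)^T (y - x), \quad \forall x, y \in \operatorname{dom} f$.
      Proof:
      (=>) If $f$ is convex, then
      $(1 - t) f(x) + t f(y) \ge f((1 - t) x + t y), \quad \forall\, 0 \le t \le 1$,
      so
      $t \left( f(y) - f(x) \right) \ge f(x + t(y - x)) - f(x)$,
      and for $0 < t \le 1$
      $f(y) - f(x) \ge \frac{1}{t} \left( f(x + t(y - x)) - f(x) \right) \to \nabla f(x)^T (y - x)$ as $t \to 0$.
      (<=) Given $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \operatorname{dom} f$, let $z = (1 - t) x + t y$ with $0 \le t \le 1$. Then
      $f(x) \ge f(z) + \nabla f(z)^T (x - z)$,
      $f(y) \ge f(z) + \nabla f(z)^T (y - z)$.
      Multiplying the first inequality by $1 - t$ and the second by $t$ and adding, the gradient terms cancel because $(1 - t)(x - z) + t(y - z) = 0$. Thus $(1 - t) f(x) + t f(y) \ge f(z)$.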
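The first order condition says the tangent hyperplane at any point underestimates the function everywhere. A minimal sketch, assuming the illustrative convex function $f(x) = x_1^2 + x_1 x_2 + x_2^2$ (not from the slides) with its gradient written out by hand:

```python
import random

def f(x):
    """Illustrative convex quadratic on R^2."""
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2

def grad_f(x):
    """Gradient of f, computed by hand."""
    return [2 * x[0] + x[1], x[0] + 2 * x[1]]

def first_order_gap(x, y):
    """f(y) - [f(x) + grad f(x)^T (y - x)]; nonnegative when f is convex."""
    g = grad_f(x)
    linear = f(x) + sum(gi * (yi - xi) for gi, xi, yi in zip(g, x, y))
    return f(y) - linear

rng = random.Random(1)
gaps = []
for _ in range(1000):
    x = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    y = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    gaps.append(first_order_gap(x, y))

min_gap = min(gaps)
print(min_gap >= -1e-9)   # True: no sampled pair violates the condition
```

The small negative tolerance only absorbs floating-point rounding; for this quadratic the gap equals $d_1^2 + d_1 d_2 + d_2^2$ with $d = y - x$, which is never negative.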

  11. 2. Conditions: Second Order Condition
      Definition: $f$ is twice differentiable if $\operatorname{dom} f$ is open and the Hessian $\nabla^2 f(x) \in \mathbf{S}^n$,
      $\nabla^2 f(x)_{ij} \equiv \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i, j = 1, \ldots, n$,
      exists at each $x \in \operatorname{dom} f$.
      Theorem: A twice differentiable $f$ with convex domain is convex iff
      $\nabla^2 f(x) \succeq 0, \quad \forall x \in \operatorname{dom} f$.
      Proof (<=): Using the Lagrange remainder, we can find a $z$ such that
      $f(x + t(y - x)) = f(x) + \nabla f(x)^T t (y - x) + \frac{1}{2} t^2 (y - x)^T \nabla^2 f(z) (y - x), \quad \forall\, 0 \le t \le 1$,
      where $z$ is between $x$ and $x + t(y - x)$. Since the last term is nonnegative by the assumption $\nabla^2 f(z) \succeq 0$, setting $t = 1$ gives $f(y) \ge f(x) + \nabla f(x)^T (y - x)$, so the first order condition is satisfied.

  12. 2. Conditions: Second Order Condition
      Example: Negative entropy: $f(x) = x \log x$, $x \in \mathbf{R}_{++}$
      $f'(x) = \frac{x}{x} + \log x = 1 + \log x$, $f''(x) = \frac{1}{x}$
      Since $x \in \mathbf{R}_{++}$, $f''(x) > 0$ ⇒ $f(x)$ is convex.
      (Plot of $x \log x$.)
      Remark:
      • The 1st order condition can be used to design optimization algorithms and prove their properties.
      • The 2nd order condition implies the 1st order condition.
      • The 2nd order condition can be used to prove the convexity of functions.
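The closed form $f''(x) = 1/x$ for the negative entropy can be compared against a central finite difference. This is a sketch only; the step size `h` and the sample points are assumptions.

```python
import math

def f(x):
    """Negative entropy f(x) = x log x on x > 0."""
    return x * math.log(x)

def second_derivative(fn, x, h=1e-4):
    """Central finite-difference estimate of fn''(x)."""
    return (fn(x + h) - 2.0 * fn(x) + fn(x - h)) / (h * h)

points = [0.1, 0.5, 1.0, 2.0, 10.0]
# Each entry: (x, numerical f''(x), analytic 1/x).
checks = [(x, second_derivative(f, x), 1.0 / x) for x in points]

all_positive = all(num > 0 for _, num, _ in checks)
all_close = all(abs(num - exact) <= 1e-3 * exact for _, num, exact in checks)
print(all_positive, all_close)
```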

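  13. 2. Conditions: Examples
      • Quadratic function: $f(x) = \frac{1}{2} x^T P x + q^T x + r$, $P \in \mathbf{S}^n$
        $\nabla f(x) = P x + q$, $\nabla^2 f(x) = P$
      • Least squares: $f(x) = \|Ax - b\|_2^2$
        $\nabla f(x) = 2 A^T (Ax - b)$, $\nabla^2 f(x) = 2 A^T A$
      • Quadratic over linear: $f(x, y) = \frac{x^2}{y}$, $y > 0$
        $\nabla f(x, y) = \left( \frac{2x}{y}, -\frac{x^2}{y^2} \right)^T$
        $\nabla^2 f(x, y) = \begin{bmatrix} \frac{2}{y} & -\frac{2x}{y^2} \\ -\frac{2x}{y^2} & \frac{2x^2}{y^3} \end{bmatrix} = \frac{2}{y^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y \\ -x \end{bmatrix}^T \succeq 0$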
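For the quadratic-over-linear example, the rank-one factorization of the Hessian implies $v^T \nabla^2 f(x, y)\, v = \frac{2}{y^3} (y v_1 - x v_2)^2 \ge 0$. The sketch below compares both sides on random samples; the sampling ranges are assumptions.

```python
import random

def hessian(x, y):
    """Hessian of f(x, y) = x**2 / y for y > 0."""
    return [[2.0 / y, -2.0 * x / y ** 2],
            [-2.0 * x / y ** 2, 2.0 * x ** 2 / y ** 3]]

def quad_form(H, v):
    """v^T H v for a 2x2 matrix H."""
    return sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

rng = random.Random(2)
worst = float("inf")      # smallest quadratic form seen
max_mismatch = 0.0        # largest gap between entrywise and factored forms
for _ in range(1000):
    x, y = rng.uniform(-3, 3), rng.uniform(0.5, 3)
    v = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    q = quad_form(hessian(x, y), v)
    factored = (2.0 / y ** 3) * (y * v[0] - x * v[1]) ** 2
    worst = min(worst, q)
    max_mismatch = max(max_mismatch, abs(q - factored))

print(worst >= -1e-9, max_mismatch < 1e-9)
```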

  14. 2. Conditions: Examples
      • Log-sum-exp: $f(x) = \log \sum_{k=1}^n e^{x_k}$ (smooth max, or softmax, function)
        $\nabla^2 f(x) = \frac{1}{\mathbf{1}^T z} \operatorname{diag}(z) - \frac{1}{(\mathbf{1}^T z)^2} z z^T$, $z_k = e^{x_k}$
        $v^T \nabla^2 f(x)\, v = \frac{1}{(\mathbf{1}^T z)^2} \left[ \left( \sum_{i=1}^n z_i \right) \left( \sum_{i=1}^n v_i^2 z_i \right) - \left( \sum_{i=1}^n v_i z_i \right)^2 \right] \ge 0$ for all $v \in \mathbf{R}^n$ (Cauchy-Schwarz inequality).
        Thus, $f(x)$ is a convex function.
      Cauchy-Schwarz inequality: $(a^T a)(b^T b) \ge (a^T b)^2$, applied with $a_i = \sqrt{z_i}$, $b_i = v_i \sqrt{z_i}$.
      Proof 1: For $b \ne 0$, let $z = a - \frac{a^T b}{b^T b} b$, or $a = z + \frac{a^T b}{b^T b} b$ (here $z$ denotes this residual vector, not the vector of exponentials). Since $z^T b = 0$, we have
      $a^T a = z^T z + \frac{(a^T b)^2}{(b^T b)^2} b^T b \ge \frac{(a^T b)^2}{(b^T b)^2} b^T b = \frac{(a^T b)^2}{b^T b}$.
      Proof 2: By induction.
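The bracketed expression for $v^T \nabla^2 f(x)\, v$ can be evaluated directly. A minimal sketch with assumed sample ranges and dimension $n = 4$; the function name is hypothetical. The all-ones direction gives exactly zero curvature, since adding a constant to every $x_k$ shifts log-sum-exp linearly.

```python
import math
import random

def lse_hessian_quadratic_form(x, v):
    """v^T (Hessian of log-sum-exp at x) v, using z_k = exp(x_k)."""
    z = [math.exp(xk) for xk in x]
    s = sum(z)
    sum_v2z = sum(vi * vi * zi for vi, zi in zip(v, z))
    sum_vz = sum(vi * zi for vi, zi in zip(v, z))
    return (s * sum_v2z - sum_vz * sum_vz) / (s * s)

rng = random.Random(3)
smallest = min(
    lse_hessian_quadratic_form(
        [rng.uniform(-2, 2) for _ in range(4)],
        [rng.uniform(-1, 1) for _ in range(4)],
    )
    for _ in range(1000)
)

# Direction v = (1, 1, 1): the quadratic form vanishes.
flat = lse_hessian_quadratic_form([0.3, -1.0, 2.0], [1.0, 1.0, 1.0])

print(smallest >= -1e-12, abs(flat) < 1e-12)
```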

  15. 3. Operations that preserve convexity
      • Nonnegative multiple: $\alpha f$, where $\alpha \ge 0$ and $f$ is convex
      • Sum: $f_1 + f_2$, where $f_1$ and $f_2$ are convex
      • Composition with affine function: $f(Ax + b)$, where $f$ is convex
        Proof (if $f$ is twice differentiable): $\nabla_x^2 f(Ax + b) = A^T \left( \nabla_y^2 f(y) \big|_{y = Ax + b} \right) A$, which is positive semidefinite whenever $\nabla_y^2 f(y) \succeq 0$.
        E.g. $f(x) = -\sum_{i=1}^m \log (b_i - a_i^T x)$, $\operatorname{dom} f = \{x \mid a_i^T x < b_i,\ i = 1, \ldots, m\}$
             $f(x) = \|Ax + b\|$
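The log barrier example composes the convex function $-\log$ with affine maps $b_i - a_i^T x$ and sums the results, so it should satisfy the convexity inequality on its domain. The sketch below checks this by sampling; the constraint data `A`, `b`, the sampling box, and the small `margin` that keeps samples away from the boundary are all assumptions.

```python
import math
import random

A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # rows a_i^T (illustrative)
b = [1.0, 1.0, 1.0]

def slacks(x):
    """b_i - a_i^T x for each constraint."""
    return [bi - (ai[0] * x[0] + ai[1] * x[1]) for ai, bi in zip(A, b)]

def barrier(x):
    """f(x) = -sum_i log(b_i - a_i^T x), defined where all slacks are positive."""
    return -sum(math.log(s) for s in slacks(x))

def sample_domain(rng, margin=1e-6):
    """Rejection-sample a point with every slack larger than margin."""
    while True:
        x = [rng.uniform(-2, 1), rng.uniform(-2, 1)]
        if all(s > margin for s in slacks(x)):
            return x

rng = random.Random(4)
violations = 0
for _ in range(1000):
    x, y = sample_domain(rng), sample_domain(rng)
    theta = rng.random()
    z = [theta * xi + (1 - theta) * yi for xi, yi in zip(x, y)]
    if barrier(z) > theta * barrier(x) + (1 - theta) * barrier(y) + 1e-9:
        violations += 1

print(violations)   # 0
```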
