Summary
Key topics.
◮ Familiarity with the form of the basic network gradient.
◮ Deep network initialization.
◮ Minibatches.
◮ Momentum.
Next time: convexity.
Part 2: convexity
Why convexity?
Deep networks are not convex in their parameters. Why study convexity?
◮ Convexity is pervasive in ML and mathematics; e.g., the losses we use for deep learning are still convex (in the network's output).
◮ Convexity exemplifies nice “local-to-global” structure: for a convex function, every local minimum is a global minimum.
6. Convex sets and functions
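Convex sets
A set $S$ is convex if, for every pair of points $x, x' \in S$, the line segment between $x$ and $x'$ is also contained in $S$:
$\{x, x'\} \subseteq S \implies [x, x'] \subseteq S$, where $[x, x'] := \{(1-\alpha)x + \alpha x' : \alpha \in [0, 1]\}$.
[Figure: four example sets, labeled convex, not convex, convex, convex.]
Examples:
◮ All of $\mathbb{R}^d$.
◮ The empty set.
◮ Half-spaces: $\{x \in \mathbb{R}^d : a^\top x \le b\}$.
◮ Intersections of convex sets.
◮ Polyhedra: $\{x \in \mathbb{R}^d : Ax \le b\} = \bigcap_{i=1}^m \{x \in \mathbb{R}^d : a_i^\top x \le b_i\}$, where $a_i^\top$ is the $i$-th row of $A$.
◮ Convex hulls: $\mathrm{conv}(S) := \{\sum_{i=1}^k \alpha_i x_i : k \in \mathbb{N},\ x_i \in S,\ \alpha_i \ge 0,\ \sum_{i=1}^k \alpha_i = 1\}$. (Equivalently, also for infinite $S$: $\mathrm{conv}(S)$ is the intersection of all convex supersets of $S$.)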
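The segment definition can be probed numerically. The following is a minimal sketch in Python (standard library only); the helper names `in_polyhedron`, `in_annulus`, and `segment_test` are hypothetical, and the unit square and the ring are illustrative choices. Sampling can only refute convexity (one segment point outside the set is a counterexample); passing is evidence, not a proof.

```python
import math
import random

# Hypothetical helpers for illustration; not from any library.
def in_polyhedron(A, b, x, tol=1e-12):
    """Membership in {x : A x <= b}, checked row by row with a small float tolerance."""
    return all(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i + tol
               for row, b_i in zip(A, b))

def in_annulus(x, r_in=1.0, r_out=2.0):
    """A ring in R^2; it has a hole, so it is not convex."""
    r2 = x[0] ** 2 + x[1] ** 2
    return r_in ** 2 <= r2 <= r_out ** 2

def segment_test(member, points, n_alpha=11):
    """For every pair of points, test sampled points (1 - alpha) x + alpha x' on their segment."""
    for x in points:
        for xp in points:
            for k in range(n_alpha):
                alpha = k / (n_alpha - 1)
                z = [(1 - alpha) * xi + alpha * xpi for xi, xpi in zip(x, xp)]
                if not member(z):
                    return False  # counterexample found: the set is not convex
    return True  # no counterexample among the sampled segments

random.seed(0)
# Unit square as a polyhedron Ax <= b: x1 <= 1, -x1 <= 0, x2 <= 1, -x2 <= 0.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]
square_pts = [[random.random(), random.random()] for _ in range(30)]
# Eight evenly spaced points on the circle of radius 1.5, all inside the ring.
ring_pts = [[1.5 * math.cos(2 * math.pi * k / 8), 1.5 * math.sin(2 * math.pi * k / 8)]
            for k in range(8)]

square_ok = segment_test(lambda x: in_polyhedron(A, b, x), square_pts)
ring_ok = segment_test(in_annulus, ring_pts)
print(square_ok, ring_ok)  # True False
```

The ring fails because the chord between two sufficiently far-apart points on the circle passes through the hole.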
Convex functions from convex sets
The epigraph of a function $f$ is the region on and above its graph:
$\mathrm{epi}(f) := \{(x, y) \in \mathbb{R}^{d+1} : y \ge f(x)\}$.
A function is convex if its epigraph is convex.
[Figure: two functions with shaded epigraphs, labeled "f is not convex" and "f is convex".]
Convex functions (standard definition)
A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if for any $x, x' \in \mathbb{R}^d$ and $\alpha \in [0, 1]$,
$f((1-\alpha)x + \alpha x') \le (1-\alpha) \cdot f(x) + \alpha \cdot f(x')$.
[Figure: two functions with a chord drawn between $x$ and $x'$, labeled "f is not convex" and "f is convex".]
Examples:
◮ $f(x) = c^x$ for any $c > 0$ (on $\mathbb{R}$).
◮ $f(x) = |x|^c$ for any $c \ge 1$ (on $\mathbb{R}$).
◮ $f(x) = b^\top x$ for any $b \in \mathbb{R}^d$.
◮ $f(x) = \|x\|$ for any norm $\|\cdot\|$.
◮ $f(x) = x^\top A x$ for symmetric positive semidefinite $A$.
◮ $f(x) = \ln\left(\sum_{i=1}^d \exp(x_i)\right)$, which approximates $\max_i x_i$: $\max_i x_i \le f(x) \le \max_i x_i + \ln d$.
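The inequality in the definition can be spot-checked on random points for concrete instances of the examples. This is a minimal sketch in Python under illustrative assumptions: the instances ($c = 0.5$, exponent $1.5$, the vector $b$, the matrix $A$) are picked for the demo, and `convexity_gap` is a hypothetical helper name. A nonnegative gap on samples is evidence, not a proof; a single negative gap, as for $\sin$, disproves convexity.

```python
import math
import random

def convexity_gap(f, x, xp, alpha):
    """Hypothetical helper: (1 - alpha) f(x) + alpha f(x') - f((1 - alpha) x + alpha x'); >= 0 if f is convex."""
    z = [(1 - alpha) * xi + alpha * xpi for xi, xpi in zip(x, xp)]
    return (1 - alpha) * f(x) + alpha * f(xp) - f(z)

def logsumexp(x):
    """ln(sum_i exp(x_i)), computed stably by factoring out max_i x_i."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

# Illustrative symmetric PSD matrix (eigenvalues 1, 3, 1) for the quadratic example.
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
# Instances of the examples on R^3; the one-dimensional ones act on x[0].
examples = {
    "c^x (c = 0.5)": lambda x: 0.5 ** x[0],
    "|x|^c (c = 1.5)": lambda x: abs(x[0]) ** 1.5,
    "b^T x": lambda x: 3 * x[0] - x[1] + 2 * x[2],
    "l2 norm": lambda x: math.sqrt(sum(t * t for t in x)),
    "x^T A x": lambda x: sum(x[i] * A[i][j] * x[j] for i in range(3) for j in range(3)),
    "log-sum-exp": logsumexp,
}

random.seed(1)
min_gap = {name: float("inf") for name in examples}
for _ in range(2000):
    x = [random.uniform(-3, 3) for _ in range(3)]
    xp = [random.uniform(-3, 3) for _ in range(3)]
    alpha = random.random()
    for name, f in examples.items():
        min_gap[name] = min(min_gap[name], convexity_gap(f, x, xp, alpha))

# For contrast, sin violates the inequality at x = 0, x' = pi, alpha = 1/2 (gap close to -1).
sin_gap = convexity_gap(lambda x: math.sin(x[0]), [0.0], [math.pi], 0.5)

# log-sum-exp sits between max_i x_i and max_i x_i + ln d.
v = [0.3, -1.2, 2.5]
lse_bounds_ok = max(v) <= logsumexp(v) <= max(v) + math.log(len(v))
```

The minimum gaps for the listed examples should be nonnegative up to floating-point rounding (the linear example sits at zero).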
Example verification: norms
Is $f(x) = \|x\|$ convex?
Pick any $\alpha \in [0, 1]$ and any $x, x' \in \mathbb{R}^d$. Then
$f((1-\alpha)x + \alpha x') = \|(1-\alpha)x + \alpha x'\|$
$\le \|(1-\alpha)x\| + \|\alpha x'\|$ (triangle inequality)
$= (1-\alpha)\|x\| + \alpha\|x'\|$ (homogeneity, since $\alpha \ge 0$ and $1 - \alpha \ge 0$)
$= (1-\alpha) f(x) + \alpha f(x')$.
Yes, $f$ is convex.
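Each step of this chain can be checked numerically for specific norms. A minimal sketch in Python, assuming the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms as test cases; `check_chain` is a hypothetical helper that tests the triangle-inequality step and the homogeneity step separately, with a small tolerance for rounding.

```python
import math
import random

norms = {
    "l1": lambda v: sum(abs(t) for t in v),
    "l2": lambda v: math.sqrt(sum(t * t for t in v)),
    "linf": lambda v: max(abs(t) for t in v),
}

def check_chain(norm, x, xp, alpha, tol=1e-9):
    """Hypothetical helper: verify the two proof steps for one choice of x, x', alpha."""
    u = [(1 - alpha) * t for t in x]                 # (1 - alpha) x
    w = [alpha * t for t in xp]                      # alpha x'
    lhs = norm([ui + wi for ui, wi in zip(u, w)])    # ||(1 - alpha) x + alpha x'||
    mid = norm(u) + norm(w)                          # ||(1 - alpha) x|| + ||alpha x'||
    rhs = (1 - alpha) * norm(x) + alpha * norm(xp)   # (1 - alpha) ||x|| + alpha ||x'||
    triangle_ok = lhs <= mid + tol                   # triangle inequality step
    homogeneity_ok = abs(mid - rhs) <= tol           # homogeneity step (alpha, 1 - alpha >= 0)
    return triangle_ok and homogeneity_ok

random.seed(2)
results = {name: True for name in norms}
for _ in range(1000):
    d = random.randint(1, 6)
    x = [random.gauss(0, 1) for _ in range(d)]
    xp = [random.gauss(0, 1) for _ in range(d)]
    alpha = random.random()
    for name, norm in norms.items():
        results[name] = results[name] and check_chain(norm, x, xp, alpha)
print(results)  # {'l1': True, 'l2': True, 'linf': True}
```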
Operations preserving convexity
Summations: if $f_1, \dots, f_k$ are convex and $\alpha_1, \dots, \alpha_k$ are nonnegative, then $x \mapsto \alpha_1 f_1(x) + \cdots + \alpha_k f_k(x)$ is convex.
Affine composition: if $f : \mathbb{R}^m \to \mathbb{R}$ is convex, then for any $A \in \mathbb{R}^{m \times d}$ and $b \in \mathbb{R}^m$, $x \mapsto f(Ax + b)$ is convex.
Maxima: if $f_1, \dots, f_k$ are convex, then $x \mapsto \max_i f_i(x)$ is convex. (For infinitely many functions, use a supremum.)
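The three rules can be read as combinators that take convex functions and return convex functions. Below is a minimal sketch in Python; `nonneg_sum`, `affine_compose`, and `pointwise_max` are hypothetical names, and the matrix, offset, and weights are illustrative. The composite $h$ is convex by the rules, and the random check only confirms the defining inequality on samples.

```python
import random

# Hypothetical combinators, one per rule above.
def nonneg_sum(fs, alphas):
    """x -> sum_i alpha_i f_i(x); rejects negative weights, which can break convexity."""
    if any(a < 0 for a in alphas):
        raise ValueError("weights must be nonnegative")
    return lambda x: sum(a * f(x) for a, f in zip(alphas, fs))

def affine_compose(f, A, b):
    """x -> f(A x + b), with A given as a list of m rows and b of length m."""
    def g(x):
        y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(A, b)]
        return f(y)
    return g

def pointwise_max(fs):
    """x -> max_i f_i(x)."""
    return lambda x: max(f(x) for f in fs)

# Convex building blocks on R^2: squared l2 norm and l1 norm.
sq = lambda y: sum(t * t for t in y)
l1 = lambda y: sum(abs(t) for t in y)

# h(x) = max(||A x + b||^2, 2 ||x||_1 + 0.5 ||x||^2), convex by the three rules.
A = [[1.0, -2.0], [0.5, 3.0]]
b = [1.0, -1.0]
h = pointwise_max([affine_compose(sq, A, b), nonneg_sum([l1, sq], [2.0, 0.5])])

random.seed(3)
worst = float("inf")
for _ in range(2000):
    x = [random.uniform(-2, 2) for _ in range(2)]
    xp = [random.uniform(-2, 2) for _ in range(2)]
    alpha = random.random()
    z = [(1 - alpha) * s + alpha * t for s, t in zip(x, xp)]
    worst = min(worst, (1 - alpha) * h(x) + alpha * h(xp) - h(z))
print(worst >= -1e-9)  # True
```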
Example: linear classification and margin losses
If $\ell$ is convex and the predictor is linear, then the empirical risk is convex:
◮ Define $\ell_i(w) := \ell(w^\top x_i y_i)$; it is convex as the composition of the convex $\ell$ with the affine (indeed linear) map $w \mapsto w^\top x_i y_i$;
◮ thus the empirical risk
$\widehat{\mathcal{R}}(w) = \frac{1}{n} \sum_{i=1}^n \ell(w^\top x_i y_i) = \frac{1}{n} \sum_{i=1}^n \ell_i(w)$
is a nonnegative combination of convex functions, and hence convex.
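One way to see this numerically: a convex function has nonnegative second differences along every line. The sketch below is in Python and assumes the logistic loss $\ell(z) = \ln(1 + e^{-z})$ as the convex margin loss and a small synthetic dataset generated at random for illustration only; `empirical_risk` and `risk_along_line` are hypothetical helper names.

```python
import math
import random

def logistic_loss(z):
    """Logistic loss l(z) = ln(1 + exp(-z)), convex in z; evaluated stably for large |z|."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def empirical_risk(w, X, y, loss=logistic_loss):
    """(1/n) sum_i loss(w^T x_i y_i) for the linear predictor x -> w^T x."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = sum(wj * xj for wj, xj in zip(w, xi)) * yi
        total += loss(margin)
    return total / len(X)

def risk_along_line(w, v, t, X, y):
    """Restriction of the risk to the line t -> w + t v."""
    return empirical_risk([wj + t * vj for wj, vj in zip(w, v)], X, y)

random.seed(4)
# Synthetic toy data, for illustration only: 20 Gaussian points in R^3, random labels in {-1, +1}.
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
y = [random.choice([-1, 1]) for _ in range(20)]

# Convex along every line: R(w + h v) - 2 R(w) + R(w - h v) >= 0.
h = 0.1
min_second_diff = float("inf")
for _ in range(500):
    w = [random.gauss(0, 2) for _ in range(3)]
    v = [random.gauss(0, 1) for _ in range(3)]
    d2 = (risk_along_line(w, v, h, X, y) - 2 * risk_along_line(w, v, 0.0, X, y)
          + risk_along_line(w, v, -h, X, y))
    min_second_diff = min(min_second_diff, d2)
print(min_second_diff >= -1e-10)
```

Swapping in any other convex margin loss (for example a hinge-type loss) should leave the check passing, which is the point of the argument above: only the convexity of $\ell$ and the linearity of the predictor are used.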
7. Various forms of convexity
Convexity of differentiable functions
Differentiable functions
If $f : \mathbb{R}^d \to \mathbb{R}$ is differentiable, then $f$ is convex if and only if
$f(x) \ge f(x_0) + \nabla f(x_0)^\top (x - x_0)$ for all $x, x_0 \in \mathbb{R}^d$.
[Figure: $f$ lying above its tangent line $a(x) = f(x_0) + f'(x_0)(x - x_0)$ at $x_0$.]
Note: this implies increasing slopes: $(\nabla f(x) - \nabla f(y))^\top (x - y) \ge 0$ for all $x, y$ (add the first-order inequality at $x_0 = y$ to the one with the roles of $x$ and $y$ swapped).
Twice-differentiable functions
If $f : \mathbb{R}^d \to \mathbb{R}$ is twice-differentiable, then $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \mathbb{R}^d$ (i.e., the Hessian, or matrix of second derivatives, is positive semidefinite for all $x$).
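All three conditions can be tested side by side on a quadratic $f(x) = x^\top A x$ with symmetric $A$, whose gradient is $2Ax$ and whose Hessian is $2A$. A minimal sketch in Python for $d = 2$; the two matrices are illustrative assumptions, and `check_conditions` and `eig_sym2` are hypothetical helper names. For the PSD matrix all checks pass; for the indefinite one all fail.

```python
import math
import random

def quad(A, x):
    """f(x) = x^T A x for a symmetric 2x2 matrix A."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

def grad(A, x):
    """Gradient of x^T A x, which is 2 A x for symmetric A."""
    return [2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def eig_sym2(M):
    """Eigenvalues of a symmetric 2x2 matrix [[p, q], [q, r]], smallest first."""
    p, q, r = M[0][0], M[0][1], M[1][1]
    mean, rad = (p + r) / 2, math.hypot((p - r) / 2, q)
    return mean - rad, mean + rad

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def check_conditions(A, trials=1000, tol=1e-9):
    """Hypothetical helper: (first-order ok, increasing slopes ok, Hessian PSD) on random pairs."""
    first_ok = mono_ok = True
    for _ in range(trials):
        x = [random.uniform(-3, 3) for _ in range(2)]
        x0 = [random.uniform(-3, 3) for _ in range(2)]
        diff = [a - b for a, b in zip(x, x0)]
        # First-order condition: f(x) >= f(x0) + grad f(x0)^T (x - x0).
        first_ok &= quad(A, x) >= quad(A, x0) + dot(grad(A, x0), diff) - tol
        # Increasing slopes: (grad f(x) - grad f(x0))^T (x - x0) >= 0.
        g_diff = [a - b for a, b in zip(grad(A, x), grad(A, x0))]
        mono_ok &= dot(g_diff, diff) >= -tol
    hessian = [[2 * A[i][j] for j in range(2)] for i in range(2)]
    psd = eig_sym2(hessian)[0] >= -tol  # smallest eigenvalue of the Hessian
    return first_ok, mono_ok, psd

random.seed(5)
psd_A = [[2.0, 1.0], [1.0, 2.0]]     # eigenvalues 1 and 3: convex
indef_A = [[1.0, 0.0], [0.0, -1.0]]  # eigenvalues -1 and 1: not convex
psd_result = check_conditions(psd_A)
indef_result = check_conditions(indef_A)
print(psd_result, indef_result)  # (True, True, True) (False, False, False)
```

For a quadratic the first-order gap equals $(x - x_0)^\top A (x - x_0)$ exactly, so the three checks agree for the same structural reason.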
Verifying convexity of differentiable functions
Is $f(x) = x^4$ convex?
Use the second-order condition for convexity:
$\frac{\partial}{\partial x} f(x) = 4x^3, \qquad \frac{\partial^2}{\partial x^2} f(x) = 12x^2 \ge 0.$
Yes, $f$ is convex.
Is $f(x) = e^{\langle a, x \rangle}$ convex?
Use the first-order condition for convexity:
$\nabla f(x) = e^{\langle a, x \rangle} \nabla \{\langle a, x \rangle\} = e^{\langle a, x \rangle} a$ (chain rule).
Difference between $f$ and its affine approximation:
$f(x) - \left(f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle\right) = e^{\langle a, x \rangle} - \left(e^{\langle a, x_0 \rangle} + e^{\langle a, x_0 \rangle} \langle a, x - x_0 \rangle\right)$
$= e^{\langle a, x_0 \rangle} \left(e^{\langle a, x - x_0 \rangle} - \left(1 + \langle a, x - x_0 \rangle\right)\right)$
$\ge 0$ (because $1 + z \le e^z$ for all $z \in \mathbb{R}$).
Yes, $f$ is convex.
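Both verifications can be mirrored numerically. The sketch below, in Python, assumes random $a, x, x_0 \in [-1, 1]^3$ for illustration; `first_order_gap` and `factored_gap` are hypothetical names for the direct and factored forms of the difference above, and a central second difference stands in for $f''(x) = 12x^2$.

```python
import math
import random

def f_exp(a, x):
    """f(x) = exp(<a, x>)."""
    return math.exp(sum(ai * xi for ai, xi in zip(a, x)))

def grad_exp(a, x):
    """Chain rule: grad f(x) = exp(<a, x>) a."""
    s = f_exp(a, x)
    return [s * ai for ai in a]

def first_order_gap(a, x, x0):
    """Direct form: f(x) - (f(x0) + <grad f(x0), x - x0>)."""
    diff = [xi - x0i for xi, x0i in zip(x, x0)]
    return f_exp(a, x) - (f_exp(a, x0) + sum(g * d for g, d in zip(grad_exp(a, x0), diff)))

def factored_gap(a, x, x0):
    """Factored form: exp(<a, x0>) (exp(z) - (1 + z)) with z = <a, x - x0>."""
    z = sum(ai * (xi - x0i) for ai, xi, x0i in zip(a, x, x0))
    return f_exp(a, x0) * (math.exp(z) - (1 + z))

def second_diff(f, x, h=1e-3):
    """Central second difference, approximating f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

random.seed(6)
max_mismatch, min_gap = 0.0, float("inf")
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in range(3)]
    x = [random.uniform(-1, 1) for _ in range(3)]
    x0 = [random.uniform(-1, 1) for _ in range(3)]
    direct, factored = first_order_gap(a, x, x0), factored_gap(a, x, x0)
    max_mismatch = max(max_mismatch, abs(direct - factored))
    min_gap = min(min_gap, direct)

# Second-order check for x^4 on a grid in [-2, 2]: curvature never negative.
x4_curvature_ok = all(second_diff(lambda t: t ** 4, k / 10) >= 0 for k in range(-20, 21))
print(max_mismatch < 1e-12, min_gap >= -1e-12, x4_curvature_ok)  # True True True
```

The two gap forms agree up to rounding, which confirms the factoring step; their nonnegativity is the inequality $1 + z \le e^z$.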