Composite Objective Mirror Descent

John C. Duchi (1,3), Shai Shalev-Shwartz (2), Yoram Singer (3), Ambuj Tewari (4)

(1) University of California, Berkeley
(2) Hebrew University of Jerusalem, Israel
(3) Google Research
(4) Toyota Technological Institute, Chicago

June 29, 2010
Large scale logistic regression

Problem: $n$ huge, $n \gg 1$
$$\min_x \underbrace{\frac{1}{n} \sum_{i=1}^n \log\bigl(1 + \exp(\langle a_i, x \rangle)\bigr)}_{= f(x)} + \lambda \|x\|_1$$

"Usual" approach: online gradient descent (Zinkevich '03). Let $g_t = \nabla \log(1 + \exp(\langle a_t, x_t \rangle))$ and update
$$x_{t+1} = x_t - \eta_t g_t - \eta_t \lambda \operatorname{sign}(x_t)$$
Then perform online-to-batch conversion.
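For concreteness, a minimal Python sketch of this subgradient step on a single example; all names are illustrative, and labels are assumed folded into the example vector $a_t$:

```python
import numpy as np

def ogd_l1_step(x, a, eta, lam):
    """One OGD round on the l1-regularized logistic loss for a single example a.
    Uses grad log(1 + exp(<a, x>)) = sigmoid(<a, x>) * a, plus the subgradient
    lam * sign(x) of the l1 term (sign(0) = 0 is a valid subgradient choice)."""
    g = a / (1.0 + np.exp(-(a @ x)))             # sigmoid(<a, x>) * a
    return x - eta * g - eta * lam * np.sign(x)  # step on loss and regularizer
```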
Problems with usual approach

◮ Regret bound/convergence rate: set $G = \max_t \|g_t + \lambda \operatorname{sign}(x_t)\|_2$,
$$f(x_T) + \lambda \|x_T\|_1 = f(x^*) + \lambda \|x^*\|_1 + O\left(\frac{\|x^*\|_2\, G}{\sqrt{T}}\right)$$
But $G = \Theta(\sqrt{d})$: an additional penalty from $\operatorname{sign}(x_t)$
◮ No sparsity in $x_T$
◮ Why should we suffer from the $\|\cdot\|_1$ term?
Online Gradient Descent

Let $g_t = \nabla \log(1 + \exp(\langle a_t, x_t \rangle)) + \lambda \operatorname{sign}(x_t)$. OGD step (Zinkevich '03):
$$x_{t+1} = x_t - \eta g_t = \operatorname*{argmin}_x \left\{ \eta \langle g_t, x \rangle + \frac{1}{2} \|x - x_t\|_2^2 \right\}$$
[Figure: the linearization $f(x_t) + \langle g_t, x - x_t \rangle$ beneath the curve $f(x) + \lambda \|x\|_1$]

Mirror descent generalizes the proximal term: $x_{t+1} = \operatorname*{argmin}_x \left\{ \langle g_t, x \rangle + B_\psi(x, x_t) \right\}$
Problems with Subgradient Methods

◮ Subgradients are non-informative at singularities
Composite Objective Approach

Let $g_t = \nabla \log(1 + \exp(\langle a_t, x_t \rangle))$. Truncated gradient (Langford et al. '08, Duchi & Singer '09):
$$x_{t+1} = \operatorname*{argmin}_x \left\{ \frac{1}{2} \|x - x_t\|_2^2 + \eta \langle g_t, x \rangle + \eta \lambda \|x\|_1 \right\} = \operatorname{sign}(x_t - \eta g_t) \odot \bigl[\, |x_t - \eta g_t| - \eta \lambda \,\bigr]_+$$
[Figure: the gradient step $x_t - \eta g_t$ and its soft-thresholded image $[\,|x_t - \eta g_t| - \eta\lambda\,]_+$]
Composite Objective Approach

Update is
$$x_{t+1} = \operatorname{sign}(x_t - \eta g_t) \odot \bigl[\, |x_t - \eta g_t| - \eta \lambda \,\bigr]_+$$
Two nice things:
◮ Sparsity from $[\cdot]_+$
◮ Convergence rate: let $G = \max_t \|g_t\|_2$,
$$f(x_T) + \lambda \|x_T\|_1 = f(x^*) + \lambda \|x^*\|_1 + O\left(\frac{\|x^*\|_2\, G}{\sqrt{T}}\right)$$
No extra penalty from $\lambda \|x\|_1$!
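A minimal sketch of the truncated-gradient update, i.e. the closed-form solution above; the function and variable names are illustrative:

```python
import numpy as np

def truncated_gradient_step(x, g, eta, lam):
    """argmin_x 0.5*||x - x_t||^2 + eta*<g_t, x> + eta*lam*||x||_1,
    solved by a gradient step followed by coordinatewise soft-thresholding."""
    z = x - eta * g                                             # gradient step on f
    return np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)  # soft-threshold
```

Coordinates whose magnitude falls below $\eta\lambda$ after the gradient step are set exactly to zero, which is where the sparsity of the iterates comes from.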
Abstraction to Regularized Online Convex Optimization

Repeat:
◮ Learner plays point $x_t$
◮ Receive $f_t + \varphi$ ($\varphi$ known)
◮ Suffer loss $f_t(x_t) + \varphi(x_t)$

Goal: attain small regret
$$R(T) := \sum_{t=1}^T \bigl[ f_t(x_t) + \varphi(x_t) \bigr] - \inf_{x \in \mathcal{X}} \sum_{t=1}^T \bigl[ f_t(x) + \varphi(x) \bigr]$$
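The protocol above as a hedged Python sketch; `oracles` and `step` are placeholders for the adversary's losses and for any update rule (e.g. the truncated-gradient step earlier):

```python
def run_protocol(x0, oracles, step, phi):
    """Regularized online convex optimization: play x_t, receive (f_t(x_t), g_t)
    from the round's oracle, suffer f_t(x_t) + phi(x_t), then update."""
    x, cumulative_loss = x0, 0.0
    for oracle in oracles:
        f_val, g = oracle(x)           # adversary reveals f_t at the played point
        cumulative_loss += f_val + phi(x)
        x = step(x, g)                 # learner's update rule
    return x, cumulative_loss          # regret compares this to the best fixed x
```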
Composite Objective MIrror Descent

Let $g_t = \nabla f_t(x_t)$. COMID step:
$$x_{t+1} = \operatorname*{argmin}_{x \in \mathcal{X}} \left\{ B_\psi(x, x_t) + \eta \langle g_t, x \rangle + \eta \varphi(x) \right\}$$
[Figure: the model $B_\psi(x, x_t) + \langle g_t, x \rangle + \varphi(x)$ approximating $f(x) + \varphi(x)$; only $f$ is linearized, $\varphi$ is kept intact]
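As a sanity check, a sketch that solves the COMID step numerically for the Euclidean mirror map $\psi(x) = \frac{1}{2}\|x\|_2^2$ and $\varphi = \lambda\|\cdot\|_1$, then compares against the closed-form soft-thresholding update from earlier; SciPy's generic derivative-free optimizer stands in for a real solver:

```python
import numpy as np
from scipy.optimize import minimize

def comid_step_numeric(x_t, g_t, eta, lam):
    """COMID step with B_psi(x, x_t) = 0.5*||x - x_t||^2 and phi = lam*||.||_1,
    solved by Nelder-Mead since phi is nonsmooth."""
    obj = lambda x: (0.5 * np.sum((x - x_t) ** 2)      # B_psi(x, x_t)
                     + eta * (g_t @ x)                 # eta * <g_t, x>
                     + eta * lam * np.sum(np.abs(x)))  # eta * phi(x)
    return minimize(obj, x_t, method="Nelder-Mead",
                    options={"xatol": 1e-8, "fatol": 1e-12}).x

x_t, g_t, eta, lam = np.array([0.8, -0.3, 0.02]), np.array([0.5, -0.2, 0.1]), 0.1, 0.05
z = x_t - eta * g_t
closed_form = np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)
print(np.allclose(comid_step_numeric(x_t, g_t, eta, lam), closed_form, atol=1e-4))
```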
Convergence Results

Old (online gradient/mirror descent):
Theorem: For any $x^* \in \mathcal{X}$,
$$\sum_{t=1}^T f_t(x_t) + \varphi(x_t) - f_t(x^*) - \varphi(x^*) \le \frac{1}{\eta} B_\psi(x^*, x_1) + \frac{\eta}{2} \sum_{t=1}^T \|\nabla f_t(x_t) + \nabla \varphi(x_t)\|_*^2$$

New (COMID):
Theorem: For any $x^* \in \mathcal{X}$,
$$\sum_{t=1}^T f_t(x_t) + \varphi(x_t) - f_t(x^*) - \varphi(x^*) \le \frac{1}{\eta} B_\psi(x^*, x_1) + \frac{\eta}{2} \sum_{t=1}^T \|\nabla f_t(x_t)\|_*^2$$
Derived Algorithms

◮ FOBOS (Duchi & Singer, 2009)
◮ $p$-norm divergences
◮ Mixed-norm regularization
◮ Matrix COMID
p-norms

Better $\ell_1$ algorithms: $\varphi(x) = \lambda \|x\|_1$

◮ Idea: non-Euclidean geometry (e.g. dense gradients, sparse $x^*$)
◮ Recall: for $1 < p \le 2$, $\frac{1}{2} \|x\|_p^2$ is strongly convex over $\mathbb{R}^d$ w.r.t. $\ell_p$ with modulus $p - 1$
◮ Take $\psi(x) = \frac{1}{2(p-1)} \|x\|_p^2$

Corollary: When $\|f_t'(x_t)\|_\infty \le G_\infty$, take $p = 1 + 1/\log d$ to get
$$R(T) = O\left( \|x^*\|_1\, G_\infty \sqrt{T \log d} \right)$$
Derived p-norm algorithms

SMIDAS (Shalev-Shwartz & Tewari, 2009): take $\varphi(x) = \lambda \|x\|_1$. Assume $\operatorname{sign}([\nabla \psi(x)]_j) = \operatorname{sign}(x_j)$ and define
$$S_\lambda(z) = \operatorname{sign}(z) \cdot [\, |z| - \lambda \,]_+$$
Then
$$x_{t+1} = (\nabla \psi)^{-1} \bigl( S_{\eta\lambda}\bigl( \nabla \psi(x_t) - \eta f_t'(x_t) \bigr) \bigr)$$
[Figure: the dual point $x_{t+1/2} = \nabla \psi(x_t) - \eta f_t'(x_t)$ soft-thresholded at $\eta\lambda$ to give $x_{t+1}$]
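A sketch of this update for the mirror map $\psi(x) = \frac{1}{2}\|x\|_p^2$, whose gradient and inverse gradient have the closed forms written out below; one simplification relative to the slides, whose $\psi$ carries an extra $1/(p-1)$ factor that only rescales the step size:

```python
import numpy as np

def grad_psi(x, p):
    """[grad psi(x)]_j = sign(x_j) |x_j|^(p-1) ||x||_p^(2-p) for psi = 0.5*||x||_p^2."""
    norm = np.linalg.norm(x, ord=p)
    if norm == 0.0:
        return np.zeros_like(x)
    return np.sign(x) * np.abs(x) ** (p - 1) * norm ** (2 - p)

def smidas_step(x, g, eta, lam, p):
    """Dual gradient step, soft-threshold S_{eta*lam}, then map back to the primal.
    (grad psi)^{-1} is the gradient of the conjugate 0.5*||.||_q^2 with
    q = p/(p-1), which is the same formula with p replaced by q."""
    theta = grad_psi(x, p) - eta * g                                  # dual step
    theta = np.sign(theta) * np.maximum(np.abs(theta) - eta * lam, 0.0)  # S_{eta*lam}
    return grad_psi(theta, p / (p - 1))                               # back to primal
```

With $p = 1 + 1/\log d$ as on the previous slide, the dual exponent is $q = 1 + \log d$, so the inverse map concentrates mass on the large coordinates.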
COMID with mixed norms

$$\varphi(X) = \|X\|_{\ell_1/\ell_q} = \sum_{j=1}^d \|x_j\|_q, \qquad X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} \|x_1\|_q \\ \|x_2\|_q \\ \vdots \\ \|x_d\|_q \end{bmatrix}$$

◮ Separable and solvable using previous methods
◮ Multitask and multiclass learning
◮ $x_j$ associated with feature $j$
◮ Penalize $x_j$ once
Mixed-norm p-norm algorithms

Specialize the problem to
$$\min_x\; \langle v, x \rangle + \frac{1}{2} \|x\|_p^2 + \lambda \|x\|_\infty$$

◮ Closed form? No.
◮ Dual problem (conjugate exponent $q$, $1/p + 1/q = 1$; with $x^* = v - \beta$):
$$\min_\beta\; \|v - \beta\|_q \quad \text{subject to} \quad \|\beta\|_1 \le \lambda$$
Mixed-norm p-norm algorithms

Problem:
$$\min_\beta\; \|v - \beta\|_q \quad \text{subject to} \quad \|\beta\|_1 \le \lambda$$
Observation: monotonicity of $\beta$, so $v_i \ge v_j$ implies $\beta_i \ge \beta_j$

Root-finding problem:
$$\lambda = \sum_{i=1}^d \beta_i(\theta) = \sum_{i=1}^d \bigl[ v_i - \theta \bigr]_+^{1/(q-1)}$$
Solve with a median-like search
[Figure: $\beta_6(\theta)$ as a function of the threshold $\theta$, with breakpoints at the sorted values $v_4, v_6, v_8$]
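A sketch that solves the root-finding problem with plain bisection rather than the median-like search from the slide; it assumes $v \ge 0$ (signs handled by symmetry) and that the $\ell_1$ constraint is active, i.e. $\sum_i v_i^{1/(q-1)} > \lambda$:

```python
import numpy as np

def solve_dual(v, lam, q, tol=1e-10):
    """Find theta with sum_i [v_i - theta]_+^(1/(q-1)) = lam; return beta(theta)."""
    beta = lambda theta: np.maximum(v - theta, 0.0) ** (1.0 / (q - 1.0))
    lo, hi = 0.0, float(np.max(v))   # the beta-sum decreases in theta on [0, max v]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta(mid).sum() > lam:
            lo = mid                 # sum too large: raise the threshold
        else:
            hi = mid
    return beta(0.5 * (lo + hi))
```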
Matrix COMID

Idea: get sparsity in the spectrum of $X \in \mathbb{R}^{d_1 \times d_2}$. Take
$$\varphi(X) = |||X|||_1 = \sum_{i=1}^{\min\{d_1, d_2\}} \sigma_i(X)$$

Schatten p-norms: apply $p$-norms to the singular values of $X \in \mathbb{R}^{d_1 \times d_2}$:
$$|||X|||_p = \|\sigma(X)\|_p = \left( \sum_{i=1}^{\min\{d_1, d_2\}} \sigma_i(X)^p \right)^{1/p}$$

Important fact: for $1 < p \le 2$,
$$\psi(X) = \frac{1}{2(p-1)} |||X|||_p^2$$
is strongly convex w.r.t. $|||\cdot|||_p$ (Ball et al., 1994)
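For the Euclidean special case $\psi(X) = \frac{1}{2}|||X|||_2^2$ (the Frobenius norm), the matrix COMID step with trace-norm regularizer reduces to soft-thresholding the singular values; a sketch below, noting that a Schatten-$p$ mirror map would additionally transform the spectrum:

```python
import numpy as np

def matrix_comid_step_frobenius(X, G, eta, lam):
    """argmin_X 0.5*||X - (X_t - eta*G_t)||_F^2 + eta*lam*|||X|||_1,
    solved by an SVD plus soft-thresholding of the singular values."""
    U, s, Vt = np.linalg.svd(X - eta * G, full_matrices=False)
    s = np.maximum(s - eta * lam, 0.0)   # spectral soft-threshold: low-rank iterates
    return (U * s) @ Vt                  # U @ diag(s) @ Vt
```

Singular values below $\eta\lambda$ are zeroed out, giving exactly the sparsity in the spectrum that the slide asks for.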