10-701 Recitation 4: Optimization

Dougal Sutherland, 10/8/2013

Motivation
• Much of the time in ML/stats, we're finding the best model to fit our data
  – Best discriminative model
  – Best generative model: MLE, MAP, …
$$\arg\min_{\text{models } M} \; \sum_{i=1}^{n} \ell(x_i; M) + \mathrm{penalty}(M)$$
• How we do that: optimization.
• When we can: convex optimization.

Analytic minima
• Set the gradient to zero and solve, e.g. for ridge regression:
$$J_\lambda(\beta) = \tfrac{1}{2}\|X\beta - y\|_2^2 + \tfrac{1}{2}\lambda\|\beta\|_2^2$$
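For ridge regression, $\nabla J_\lambda(\beta) = X^T(X\beta - y) + \lambda\beta$, so setting it to zero gives the linear system $(X^TX + \lambda I)\beta = X^Ty$. A minimal pure-Python sketch that forms and solves this system (no NumPy, to stay self-contained); the helper names `solve` and `ridge_closed_form` and the toy data are illustrative assumptions, not from the slides.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is square and nonsingular (true for X^T X + lam*I when lam > 0)."""
    d = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * d
    for r in reversed(range(d)):  # back substitution
        x[r] = (M[r][d] - sum(M[r][c] * x[c] for c in range(r + 1, d))) / M[r][r]
    return x


def ridge_closed_form(X, y, lam):
    """Minimize 0.5*||X beta - y||^2 + 0.5*lam*||beta||^2 by solving
    (X^T X + lam*I) beta = X^T y, i.e. the point where the gradient vanishes."""
    n, d = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(d)] for j in range(d)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(d)]
    return solve(A, b)


# Toy data (hypothetical): three points, two features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
beta = ridge_closed_form(X, y, lam=1.0)  # approximately [0.875, 1.375]
```

In practice a linear-algebra library routine such as `numpy.linalg.solve` would replace the hand-written elimination.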

Gradient descent
• Start at some point, follow the negative gradient towards (a) minimum
    x ← x₀
    while termination conditions don't hold do
        x ← x − η∇f(x)
    end while
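A runnable version of this loop, as a minimal sketch: the fixed step size `eta`, the gradient-norm tolerance, and the step budget (two common termination conditions) are illustrative choices, and the quadratic test objective is an assumption for demonstration.

```python
import math


def gradient_descent(grad, x0, eta=0.1, tol=1e-8, max_iter=10_000):
    """Fixed-step gradient descent: repeat x <- x - eta * grad(x).
    Stops when ||grad(x)|| < tol or after max_iter steps."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical objective f(x) = (x1 - 3)^2 + 2*(x2 + 1)^2, minimized at (3, -1).
def grad_f(x):
    return [2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)]


x_star = gradient_descent(grad_f, [0.0, 0.0])
```

With this f, any fixed `eta` of 0.5 or more stops the second coordinate from converging (its curvature is 4, so each step multiplies its error by 1 − 4η), which is one reason to choose the step size adaptively.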

Gradient descent interpretation
Approximate the function with a quadratic:
$$f(y) \approx \underbrace{f(x) + \nabla f(x)^T (y - x)}_{\text{linear approximation to } f} + \underbrace{\tfrac{1}{2\eta}\|y - x\|_2^2}_{\text{proximity to } x}$$
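Minimizing this quadratic model over $y$ recovers exactly the gradient descent update: the model's gradient in $y$ is $\nabla f(x) + \frac{1}{\eta}(y - x)$, and setting it to zero gives

```latex
\nabla_y\Big[f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2\eta}\|y - x\|_2^2\Big]
  = \nabla f(x) + \tfrac{1}{\eta}(y - x) = 0
\quad\Longrightarrow\quad
y = x - \eta\,\nabla f(x)
```

So a small $\eta$ weights the proximity term heavily and keeps the next iterate close to $x$, where the linear approximation can be trusted.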

Choosing the step size

Backtracking
• Fix a backoff parameter 0 < β < 1
• At each iteration:
  – Start with η = 1
  – While $f(x - \eta\nabla f(x)) > f(x) - \tfrac{\eta}{2}\|\nabla f(x)\|_2^2$:
      • Back off: η = βη
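The backtracking rule can be sketched in Python as follows. This is a minimal sketch, assuming the sufficient-decrease test exactly as written on the slide (factor $\tfrac{\eta}{2}$) and a default `beta=0.5`; the function name `backtracking_step` and the quadratic test objective are hypothetical.

```python
def backtracking_step(f, grad, x, beta=0.5):
    """Take one gradient step, choosing eta by backtracking:
    start at eta = 1 and set eta <- beta * eta while
    f(x - eta * g) > f(x) - (eta / 2) * ||g||^2."""
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)
    fx = f(x)
    eta = 1.0
    while f([xi - eta * gi for xi, gi in zip(x, g)]) > fx - 0.5 * eta * g_sq:
        eta *= beta  # back off
    return [xi - eta * gi for xi, gi in zip(x, g)], eta


# Hypothetical objective f(x) = (x1 - 3)^2 + 2*(x2 + 1)^2.
def f(x):
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2


def grad_f(x):
    return [2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)]


# First step from the origin: eta backs off 1 -> 0.5 -> 0.25.
x1, eta1 = backtracking_step(f, grad_f, [0.0, 0.0])

x = [0.0, 0.0]
for _ in range(100):
    x, _ = backtracking_step(f, grad_f, x)
```

No step size has to be tuned by hand; the price is a few extra function evaluations per iteration.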

How to terminate
• When change in iterates is small
• When gradient is small
• When change in function value is small
• When backtracking step size gets too small
• After a fixed time/steps budget
• …
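In practice these criteria are usually combined, stopping at whichever fires first. A minimal sketch of such a check; the name `stop_reason`, its argument order, and every default threshold are hypothetical choices for illustration and would need tuning per problem. A wall-clock budget could be added the same way with `time.monotonic()`.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def stop_reason(x_old, x_new, f_old, f_new, grad_new, eta, step,
                tol_x=1e-8, tol_grad=1e-6, tol_f=1e-12,
                min_eta=1e-12, max_steps=10_000):
    """Return the first termination criterion that fires, or None to keep going.
    All thresholds are illustrative defaults, not recommended values."""
    if norm([a - b for a, b in zip(x_new, x_old)]) < tol_x:
        return "small change in iterates"
    if norm(grad_new) < tol_grad:
        return "small gradient"
    if abs(f_new - f_old) < tol_f:
        return "small change in function value"
    if eta < min_eta:
        return "step size too small"
    if step >= max_steps:
        return "step budget exhausted"
    return None


reason = stop_reason([0.0, 0.0], [1.0, 1.0], 5.0, 3.0, [0.0, 0.0], eta=0.1, step=1)
# reason == "small gradient"
```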

Stochastic gradient “descent”
• Usually we're minimizing the empirical loss, using its gradient:
$$\frac{1}{n}\sum_i \ell(x_i; M), \qquad \frac{1}{n}\sum_i \nabla_M \ell(x_i; M)$$
• We do this to approximate the expected loss (and its gradient):
$$\mathbb{E}_x[\ell(x; M)], \qquad \mathbb{E}_x[\nabla_M \ell(x; M)]$$
• But we can also use rougher, cheaper approximations from a single sample $x_i$:
$$\ell(x_i; M), \qquad \nabla_M \ell(x_i; M)$$

SGD
• “Online” optimization
• Can do it based on a stream of samples
  – So there is no need to remember old ones
• Iterations are much cheaper
• Requires more iterations
• One big problem: not a descent method! A single step can increase the full objective.
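A minimal SGD sketch on a hypothetical data stream: fitting $y \approx wx + b$ with squared loss $\ell = \tfrac12(wx + b - y)^2$, taking one gradient step per incoming sample and then discarding it. The stream, the model, and the constant step size are assumptions chosen so the example converges; with noisy data, a constant step only hovers near the optimum, and a decreasing step size is the usual remedy.

```python
import random


def sample_stream(n, seed=0):
    """Hypothetical stream: x ~ Uniform(-1, 1), y = 2x + 1 (no noise)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0


def sgd_linear(stream, eta=0.1):
    """SGD on l = 0.5 * (w*x + b - y)^2, one sample per step.
    Each sample is used once and never stored."""
    w, b = 0.0, 0.0
    for x, y in stream:
        r = w * x + b - y  # residual on this single sample
        w -= eta * r * x   # dl/dw = r * x
        b -= eta * r       # dl/db = r
    return w, b


w, b = sgd_linear(sample_stream(5000))
```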

Minibatch gradient
• Like SGD, but calculate gradients over a subset of training points instead of just one
• Can be a nice medium between full gradient descent and SGD
  – Not a descent method, but “closer” to one
  – Iterations more expensive than SGD
  – Converges faster than SGD (in number of iterations)
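A minimal minibatch sketch for the same kind of hypothetical linear model, now with a fixed in-memory dataset: each step averages per-sample gradients over a random subset of size `batch_size`. The dataset, batch size, step size, and step count are assumptions for illustration.

```python
import random


def minibatch_gd(data, batch_size=8, eta=0.2, steps=2000, seed=0):
    """Minibatch gradient descent for y ~ w*x + b with squared loss.
    Each step averages the per-sample gradients over a random subset."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        gw = gb = 0.0
        for x, y in batch:
            r = w * x + b - y
            gw += r * x
            gb += r
        w -= eta * gw / batch_size
        b -= eta * gb / batch_size
    return w, b


# Hypothetical noiseless dataset: y = 2x + 1.
data_rng = random.Random(1)
data = [(x, 2.0 * x + 1.0) for x in (data_rng.uniform(-1.0, 1.0) for _ in range(200))]
w, b = minibatch_gd(data)
```

The two extremes fall out of the same code: `batch_size=1` is SGD, and `batch_size=len(data)` (a full permutation of the data) is full gradient descent.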

Subgradients
• When your optimization problem is convex but not differentiable
• Subgradient descent:
  – same algorithm, but use any subgradient instead of the gradient
• This is slow: the error after k steps shrinks only on the order of 1/√k, versus 1/k for gradient descent on smooth problems.
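A minimal sketch of the subgradient method on a hypothetical nondifferentiable objective, $f(x) = \sum_i |x - a_i|$, whose minimizer is the median of the $a_i$. At a kink, $\operatorname{sign}(x - a_i)$ may be replaced by any value in $[-1, 1]$; the code uses 0. The diminishing step size $c/\sqrt{k}$ is an assumed (common) choice, and because the method is not a descent method, it keeps the best iterate seen so far.

```python
import math


def subgradient_descent(f, subgrad, x0, steps=5000, c=1.0):
    """Subgradient method with diminishing steps eta_k = c / sqrt(k).
    Not a descent method, so track the best iterate seen so far."""
    x = best_x = x0
    best_f = f(x0)
    for k in range(1, steps + 1):
        x = x - (c / math.sqrt(k)) * subgrad(x)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


a = [1.0, 2.0, 7.0]  # hypothetical data; the minimizer is the median, 2.0


def f(x):
    return sum(abs(x - ai) for ai in a)


def subgrad(x):
    # Sum of sign(x - a_i); at a kink this picks 0, a valid subgradient value.
    return sum((x > ai) - (x < ai) for ai in a)


best_x, best_f = subgradient_descent(f, subgrad, x0=10.0)
```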

Generalized gradient descent
• Objective is the sum of a convex, differentiable g and a convex h:
$$\min_x \; g(x) + h(x)$$
$$x \leftarrow \mathrm{prox}_\eta\big(x - \eta\nabla g(x)\big), \qquad \mathrm{prox}_\eta(x) = \arg\min_z \; \tfrac{1}{2\eta}\|x - z\|_2^2 + h(z)$$
• e.g. LASSO, projected gradient descent
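For LASSO, $g(\beta) = \tfrac12\|X\beta - y\|_2^2$ and $h(\beta) = \lambda\|\beta\|_1$, and $\mathrm{prox}_\eta$ is coordinate-wise soft-thresholding at $\eta\lambda$. (For projected gradient descent, $h$ is the indicator of a convex set and $\mathrm{prox}_\eta$ is projection onto it.) A minimal pure-Python sketch of this proximal gradient iteration, often called ISTA; the function names, step-size choice, and toy data are assumptions for illustration.

```python
import math


def soft_threshold(v, t):
    """prox of t*||.||_1: shrink each coordinate toward 0 by t, clipping at 0."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def lasso_ista(X, y, lam, eta, steps=500):
    """Proximal gradient for 0.5*||X beta - y||^2 + lam*||beta||_1.
    Assumes eta <= 1 / L, where L is the largest eigenvalue of X^T X."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(steps):
        resid = [sum(X[i][j] * beta[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(d)]
        # Gradient step on g, then prox step on h.
        beta = soft_threshold([b - eta * g for b, g in zip(beta, grad)], eta * lam)
    return beta


# Toy problems (hypothetical data).
sparse = lasso_ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, eta=1.0)   # ~[2.0, 0.0]
scaled = lasso_ista([[1.0, 0.0], [0.0, 2.0]], [3.0, 4.0], lam=1.0, eta=0.25)  # ~[2.0, 1.75]
```

The first example shows the sparsity LASSO is known for: the small coefficient is set exactly to zero rather than merely shrunk.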

Accelerated gradient method
• At each step k:
$$y \leftarrow x^{(k-1)} + \frac{k-2}{k+1}\big(x^{(k-1)} - x^{(k-2)}\big)$$
$$x^{(k)} \leftarrow \mathrm{prox}_{\eta_k}\big(y - \eta_k\nabla g(y)\big)$$
• y term carries “momentum”
• Provably better convergence
  – O(1/k²): optimal for first-order methods
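A minimal sketch of the accelerated update, written generically over a user-supplied `prox` function; the names are hypothetical. Both $x^{(k-1)}$ and $x^{(k-2)}$ start at $x^{(0)}$, so the first step is a plain proximal gradient step. The test objective is an assumed smooth quadratic with $h = 0$ (whose prox is the identity), run with the fixed step $\eta = 1/L$, where $L = 100$ is its largest curvature.

```python
def accelerated_prox_grad(grad_g, prox, x0, eta, steps=200):
    """Accelerated proximal gradient with momentum weight (k - 2) / (k + 1).
    prox(v, eta) must return prox_eta(v) for the nonsmooth term h."""
    x_prev2 = list(x0)  # x^(k-2)
    x_prev = list(x0)   # x^(k-1)
    for k in range(1, steps + 1):
        mom = (k - 2) / (k + 1)
        # Momentum: extrapolate along the last change in iterates.
        y = [a + mom * (a - b) for a, b in zip(x_prev, x_prev2)]
        g = grad_g(y)
        x_new = prox([yi - eta * gi for yi, gi in zip(y, g)], eta)
        x_prev2, x_prev = x_prev, x_new
    return x_prev


# Hypothetical smooth objective g(x) = 0.5*(x1 - 1)^2 + 50*(x2 + 2)^2 (L = 100), h = 0.
def grad_g(x):
    return [x[0] - 1.0, 100.0 * (x[1] + 2.0)]


def identity_prox(v, eta):
    return v  # the prox of h = 0 is the identity


x_acc = accelerated_prox_grad(grad_g, identity_prox, [0.0, 0.0], eta=0.01, steps=2000)
```

Supplying a soft-thresholding prox instead of `identity_prox` would give an accelerated LASSO solver.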

Newton’s method
• Gradient descent minimizes
$$f(y) \approx f(x) + \nabla f(x)^T(y - x) + \tfrac{1}{2}(y - x)^T \tfrac{1}{\eta} I \,(y - x)$$
• Newton’s method: quadratic approximation using the Hessian
$$f(y) \approx f(x) + \nabla f(x)^T(y - x) + \tfrac{1}{2}(y - x)^T \nabla^2 f(x)\,(y - x)$$
• Takes very few iterations for a very accurate answer
  – Iterations are very expensive
  – Diverges with bad initialization
• Damped Newton: line search, trust region
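Minimizing Newton's quadratic model gives the step $y = x - [\nabla^2 f(x)]^{-1}\nabla f(x)$. A one-dimensional sketch, assuming the hypothetical objective $f(x) = x - \log x$ on $x > 0$ (minimized at $x = 1$), illustrates both sub-bullets. For this f the pure Newton step simplifies to $x \leftarrow 2x - x^2$, so $1 - x_{k+1} = (1 - x_k)^2$: from $x_0 = 0.5$ the error squares at every step, while from $x_0 = 3$ the first step jumps out of the domain.

```python
def newton_1d(fprime, fsecond, x0, steps=6):
    """Pure (undamped) Newton: jump to the minimizer of the local quadratic,
    x <- x - f'(x) / f''(x). Returns the list of iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - fprime(x) / fsecond(x))
    return xs


# Hypothetical objective f(x) = x - log(x) on x > 0.
def fprime(x):
    return 1.0 - 1.0 / x


def fsecond(x):
    return 1.0 / x ** 2


good = newton_1d(fprime, fsecond, 0.5)         # about 0.5, 0.75, 0.9375, 0.9961, ...
bad = newton_1d(fprime, fsecond, 3.0, steps=3)  # about 3, -3, -15, -255: leaves x > 0
```

A damped variant would shrink the step (e.g. by backtracking) whenever the full Newton step fails to decrease f or leaves the domain.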

Sort-of second-order methods
• Quasi-Newton methods
  – Approximate the Hessian from changes in the gradient across iterations
  – BFGS, L-BFGS
• Truncated Newton
  – Partially optimize the quadratic with conjugate gradient
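Truncated Newton approximately solves the Newton system $\nabla^2 f(x)\,p = -\nabla f(x)$ with a few conjugate gradient (CG) iterations instead of factoring the Hessian. A minimal pure-Python CG sketch; the Hessian and gradient values are illustrative assumptions. In exact arithmetic CG solves an $n \times n$ symmetric positive definite system in at most $n$ iterations, and stopping after one iteration gives a steepest-descent step with exact line search on the quadratic model.

```python
def conjugate_gradient(H, b, max_iter=None, tol=1e-12):
    """Solve H p = b for symmetric positive definite H by conjugate gradient.
    A small max_iter truncates the solve (the 'truncated Newton' direction)."""
    n = len(b)

    def matvec(v):
        return [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    p = [0.0] * n
    r = list(b)  # residual b - H p, with p = 0
    d = list(r)  # first search direction: the residual itself
    rr = dot(r, r)
    for _ in range(n if max_iter is None else max_iter):
        if rr < tol:
            break
        Hd = matvec(d)
        alpha = rr / dot(d, Hd)  # exact minimizer of the quadratic along d
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hd)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]  # H-conjugate direction
        rr = rr_new
    return p


# Illustrative Hessian and gradient at the current point.
H = [[4.0, 1.0], [1.0, 3.0]]
grad = [1.0, 2.0]
neg_g = [-gi for gi in grad]

newton_dir = conjugate_gradient(H, neg_g)           # exact: [-1/11, -7/11]
truncated = conjugate_gradient(H, neg_g, max_iter=1)  # one CG step: [-0.25, -0.5]
```

Only Hessian-vector products are needed, so `matvec` could be replaced by a finite-difference or automatic-differentiation product without ever forming H.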

Standard problem forms
• Linear programs (LPs)
$$\min_x \; c^T x \quad \text{subject to } Ax \le b,\; Ex = g$$
• Quadratic programs (QPs)
$$\min_x \; c^T x + \tfrac{1}{2}x^T H x \quad \text{subject to } Ax \le b,\; Ex = g$$
• Cone programs
$$\min_x \; c^T x \quad \text{subject to } Ax + b \in K,\; x \in L$$
