
Accelerating Optimization via Adaptive Prediction, Mehryar Mohri and Scott Yang: PowerPoint PPT Presentation



  1. Accelerating Optimization via Adaptive Prediction. Mehryar Mohri (Google, New York University) and Scott Yang (New York University). NIPS Easy Data II, Dec 10, 2015.

  2. Learning Scenario and Set-Up. Online convex optimization is a sequential optimization problem: K ⊂ R^n is a compact action space and the f_t are convex loss functions. At time t, the learner chooses an action x_t, receives the loss function f_t, and suffers the loss f_t(x_t). Goal: minimize the regret max_{x ∈ K} Σ_{t=1}^T [f_t(x_t) − f_t(x)].
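To make the protocol concrete, here is a minimal Python sketch of the online loop and of the regret quantity above. The learner interface (predict/update), the gradient feedback, and the finite set of comparator points standing in for K are illustrative assumptions, not part of the talk.

```python
def online_protocol(learner, losses, grads, T):
    """One pass of the online game: at each round t the learner plays x_t,
    suffers f_t(x_t), and receives feedback (here, the gradient at x_t)."""
    actions = []
    for t in range(T):
        x_t = learner.predict()        # learner chooses action x_t
        learner.update(grads[t](x_t))  # learner observes f_t through its gradient
        actions.append(x_t)
    return actions

def regret(actions, losses, comparators):
    """Cumulative loss minus the loss of the best fixed action in hindsight;
    the minimum over K is approximated by a finite set of comparator points."""
    cumulative = sum(f(x) for f, x in zip(losses, actions))
    best_fixed = min(sum(f(u) for f in losses) for u in comparators)
    return cumulative - best_fixed
```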

  3. Worst-case vs. Data-dependent Methods. Worst-case methods: (1) algorithms: Mirror Descent, FTRL; (2) regret bounds typically of the form O(√T); (3) the algorithms do not give faster rates on “easy data”. Data-dependent methods: (1) adaptive regularization [Duchi et al. 2010], easy data: sparsity; (2) predictable sequences [Rakhlin and Sridharan 2012], easy data: slowly varying gradients.

  4. Adaptive Regularization. The AdaGrad algorithm of [Duchi et al. 2010] (and many others): (1) standard Mirror Descent: x_{t+1} = argmin_{x ∈ K} g_t · x + B_ψ(x, x_t); (2) adaptivity: change the regularizer at each time step, ψ → ψ_t; (3) worst-case optimal data-dependent bound: O(Σ_{i=1}^n √(Σ_{t=1}^T |g_{t,i}|²)); (4) easy data scenario: sparsity.
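As a concrete reference point, here is a short Python sketch of the diagonal version of this update, in which the regularizer's per-coordinate scale grows with the accumulated squared gradients. The step-size constant eta, the smoothing term eps, and the projection hook are illustrative choices, not values from the slide.

```python
import numpy as np

class DiagonalAdaGrad:
    """Sketch of per-coordinate adaptive regularization (diagonal AdaGrad)."""

    def __init__(self, dim, eta=1.0, eps=1e-8, project=lambda x: x):
        self.x = np.zeros(dim)       # current action x_t
        self.sum_sq = np.zeros(dim)  # running sum of squared gradients per coordinate
        self.eta, self.eps, self.project = eta, eps, project

    def predict(self):
        return self.x

    def update(self, g):
        # The adaptive regularizer psi_t scales coordinate i by sqrt(sum_s g_{s,i}^2),
        # which translates into the per-coordinate step size below.
        self.sum_sq += g * g
        step = self.eta / (np.sqrt(self.sum_sq) + self.eps)
        self.x = self.project(self.x - step * g)
```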

  5. Predictable Sequences. The Optimistic FTRL algorithm of [Rakhlin and Sridharan 2012]. Idea: the learner should try to “predict” the next gradient, M_t(g_1, ..., g_{t−1}) ≈ g_t. Consequences: (1) typical regret bound O(√(Σ_{t=1}^T |g_t − M_t|²)); (2) often still worst-case optimal. Easy data scenario: slowly varying gradients.
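A minimal sketch of the optimistic idea, assuming an L2 regularizer so that the argmin has a closed form, and using the common last-gradient prediction M_t = g_{t−1}; eta and the projection hook are assumptions, not taken from the slide.

```python
import numpy as np

class OptimisticFTRL:
    """Sketch of FTRL with a gradient prediction M_t added to the cumulative gradient."""

    def __init__(self, dim, eta=0.1, project=lambda x: x):
        self.g_sum = np.zeros(dim)   # g_1 + ... + g_{t-1}
        self.M = np.zeros(dim)       # prediction of the upcoming gradient g_t
        self.eta, self.project = eta, project

    def predict(self):
        # x_t = argmin_x (g_{1:t-1} + M_t) . x + ||x||^2 / (2 * eta)
        return self.project(-self.eta * (self.g_sum + self.M))

    def update(self, g):
        self.g_sum += g
        self.M = g.copy()            # predict that the next gradient resembles the last one
```

With M_t = g_{t−1}, the bound above is small exactly when consecutive gradients are close, which is the slowly-varying-gradients scenario on this slide.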

  6. Adaptive Predictions. Motivation: adaptive regularization is good for sparsity; predictable sequences are good for slowly varying gradients. Questions: can we combine both and get the best of both worlds? What are the easy data scenarios for such an algorithm?

  7. Adaptive Predictions. Idea: derive an adaptive norm bound for Optimistic FTRL, O(√(Σ_{t=1}^T |g_t − M_t|²_{(t),∗})), i.e., find the “best” norm associated with the gradient prediction errors instead of the loss gradients. Consequences: AdaGrad can be viewed as the special case of naively predicting zero; Optimistic FTRL can be viewed as naive regularization; the method behaves well under sparsity; it accelerates faster than Optimistic FTRL when the predictions vary in per-coordinate accuracy.
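The flavor of the combination can be sketched as follows: the per-coordinate scaling of the AdaGrad-style regularizer is built from the prediction errors g_t − M_t rather than from the raw gradients, inside an optimistic FTRL-style update. This is only an illustration of the idea under those assumptions, not the exact algorithm or constants from the paper; eta, eps, the projection hook, and the last-gradient prediction are placeholders.

```python
import numpy as np

class AdaptiveOptimisticSketch:
    """Illustrative combination: optimistic FTRL with per-coordinate scales
    driven by accumulated squared prediction errors (g_s - M_s)^2."""

    def __init__(self, dim, eta=1.0, eps=1e-6, project=lambda x: x):
        self.g_sum = np.zeros(dim)   # g_1 + ... + g_{t-1}
        self.err_sq = np.zeros(dim)  # per-coordinate sum of (g_{s,i} - M_{s,i})^2
        self.M = np.zeros(dim)       # prediction; keeping M = 0 recovers AdaGrad-like scaling
        self.eta, self.eps, self.project = eta, eps, project

    def predict(self):
        # Coordinates whose gradients are predicted well keep large step sizes,
        # so accurate per-coordinate predictions accelerate the update.
        step = self.eta / (np.sqrt(self.err_sq) + self.eps)
        return self.project(-step * (self.g_sum + self.M))

    def update(self, g):
        self.err_sq += (g - self.M) ** 2
        self.g_sum += g
        self.M = g.copy()            # e.g. last-gradient prediction
```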

  8. Practical Considerations. Extensions: composite terms; proximal versus non-proximal regularization; large-scale optimization problems (epoch-based variants). For more details, please stop by the poster. Thank you!
