On the Local Minima of the Empirical Risk
Chi Jin*¹, Lydia T. Liu*¹, Rong Ge², Michael I. Jordan¹
¹EECS, University of California, Berkeley. ²Duke University.

Overview
Nonconvex Optimization.
◮ Gradient Descent (GD) → stationary points: local max, saddle points, local min.
◮ Perturbed GD [Jin et al. 2017] efficiently escapes local max and saddle points.
◮ How to deal with spurious local min?
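To make the second bullet concrete, here is a minimal sketch of gradient descent with random perturbations. It is a simplified illustration, not the exact procedure of Jin et al. 2017: the toy function, the step size, the perturbation radius and the rule "perturb whenever the gradient is small" are all assumptions chosen for readability.

```python
import numpy as np

def perturbed_gd(grad, x0, step=0.1, grad_tol=1e-3, radius=1e-2,
                 n_iters=500, seed=0):
    """Gradient descent that adds a small random kick whenever the
    gradient is nearly zero (simplified; no termination check)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) <= grad_tol:
            # Near a stationary point: a random kick makes saddle
            # points and local maxima unstable.
            x = x + rng.uniform(-radius, radius, size=x.shape)
            g = grad(x)
        x = x - step * g
    return x

# Toy function (assumed for illustration):
# F(x, y) = x^2 + y^4/4 - y^2/2, saddle at (0, 0), minima at (0, +1) and (0, -1).
def grad_F(p):
    x, y = p
    return np.array([2.0 * x, y**3 - y])

# Plain GD started exactly at the saddle never moves.
plain = np.zeros(2)
for _ in range(500):
    plain = plain - 0.1 * grad_F(plain)

# The perturbed variant leaves the saddle and settles near a minimum.
escaped = perturbed_gd(grad_F, np.zeros(2))
print(plain, escaped)
```

The kick only helps with saddle points and local maxima. A local minimum is stable under small perturbations, which is why spurious local minima need a different idea.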

Local Minima
In general, finding global minima is NP-hard.
Avoiding “shallow” local minima.
Goal: find approximate local minima of a smooth nonconvex function $F$, given access only to an erroneous version $f$ where $\sup_x |F(x) - f(x)| \le \nu$.
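A small numerical illustration of why a uniformly small error can still matter. The functions below are toy assumptions (a 1-D quadratic $F$ and a high-frequency sinusoidal error of amplitude $\nu$), not an example taken from the talk: $f$ stays within $\nu$ of $F$ everywhere, yet it has many shallow local minima while $F$ has one.

```python
import numpy as np

nu = 0.01                         # error level (assumed value)
xs = np.linspace(-2.0, 2.0, 4001)

F = xs**2                         # smooth target with a single minimum (toy choice)
f = F + nu * np.sin(200.0 * xs)   # erroneous version with |F - f| <= nu

def count_local_minima(vals):
    """Count strict interior local minima of a function sampled on a grid."""
    inner = vals[1:-1]
    return int(np.sum((inner < vals[:-2]) & (inner < vals[2:])))

print("local minima of F:", count_local_minima(F))
print("local minima of f:", count_local_minima(f))
print("max |F - f|:", np.max(np.abs(F - f)))
```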

Application
Statistical Learning. Minimize the population risk $R$ while having access only to the empirical risk $\hat{R}_n$:
$R(\theta) = \mathbb{E}_{z \sim \mathcal{D}}[L(\theta; z)], \qquad \hat{R}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} L(\theta; z_i).$
Uniform convergence guarantees $\sup_\theta |R(\theta) - \hat{R}_n(\theta)| \le O(1/\sqrt{n})$.
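The uniform deviation can be checked numerically on a toy problem. Everything specific here is an assumption made so that the population risk has a closed form: $z \sim \mathcal{N}(0, 1)$, loss $L(\theta; z) = (\theta - z)^2 / 2$, and $\theta$ restricted to a grid on $[-1, 1]$. Under these assumptions $R(\theta) = (\theta^2 + 1)/2$.

```python
import numpy as np

def sup_deviation(n, seed=0):
    """Approximate sup over theta in [-1, 1] of |R(theta) - R_hat_n(theta)|
    for the toy loss L(theta; z) = (theta - z)^2 / 2 with z ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    thetas = np.linspace(-1.0, 1.0, 201)
    # Population risk in closed form: E[(theta - z)^2] / 2 = (theta^2 + 1) / 2.
    population = 0.5 * (thetas**2 + 1.0)
    # Empirical risk, expanded via sample moments to avoid an n-by-grid array.
    empirical = 0.5 * (thetas**2 - 2.0 * thetas * z.mean() + np.mean(z**2))
    return float(np.max(np.abs(population - empirical)))

for n in (100, 10_000, 1_000_000):
    print(n, sup_deviation(n), 1.0 / np.sqrt(n))
```

Printing $1/\sqrt{n}$ next to the measured gap gives a rough visual check of the rate; the constant hidden in the $O(\cdot)$ depends on the problem.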

Results
Goal: find $\epsilon$-approximate local minima of $F$ in polynomial time.
Central Questions:
1. What algorithm can achieve this?
2. How much error $\nu$ can be tolerated?
Zhang et al. [2017]: Stochastic Gradient Langevin Dynamics (SGLD) if $\nu \le \epsilon^2 / d^8$.
This Work: Perturbed SGD on a “smoothed” version of $f$ if $\nu \le \epsilon^{1.5} / d$.
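One way to read “perturbed SGD on a smoothed version of $f$” is sketched below. It assumes Gaussian smoothing, $f_\sigma(x) = \mathbb{E}_{z \sim \mathcal{N}(0, \sigma^2 I)}[f(x + z)]$, whose gradient can be estimated from function values alone via $\nabla f_\sigma(x) = \mathbb{E}[z\,(f(x + z) - f(x))] / \sigma^2$. The toy objective, the Gaussian form of the extra perturbation, and all parameter values are illustrative assumptions, not the tuned choices behind the stated guarantee.

```python
import numpy as np

def smoothed_grad(f, x, sigma, rng, batch=32):
    """Zeroth-order mini-batch estimate of the gradient of
    f_sigma(x) = E[f(x + z)], z ~ N(0, sigma^2 I)."""
    z = sigma * rng.standard_normal((batch, x.size))
    fx = f(x)
    diffs = np.array([f(x + zi) - fx for zi in z])
    return (z * diffs[:, None]).mean(axis=0) / sigma**2

def perturbed_sgd_on_smoothed(f, x0, sigma=0.1, step=0.02, radius=0.01,
                              n_iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = smoothed_grad(f, x, sigma, rng)
        # Stochastic gradient step plus a small isotropic perturbation.
        x = x - step * (g + radius * rng.standard_normal(x.size))
    return x

nu = 0.01  # error level (assumed value)

def f(x):
    # Toy erroneous function: F(x) = ||x||^2 / 2 plus an error bounded by nu.
    return 0.5 * np.dot(x, x) + nu * np.sin(200.0 * x[0])

x_hat = perturbed_sgd_on_smoothed(f, x0=np.array([0.5, -0.5]))
print(x_hat)  # expected to land near the minimum of F at the origin
```

Averaging over a neighborhood of width $\sigma$ washes out the high-frequency error while changing $F$ only slightly, so the shallow minima of $f$ do not trap the iterates. For a sense of scale, ignoring constants, $\epsilon = 0.01$ and $d = 10$ turn the two conditions above into $\nu \le 10^{-12}$ and $\nu \le 10^{-4}$ respectively.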

Almost Sharp Guarantees
Are there better polynomial-time algorithms that tolerate larger error?
No! Complete characterization of error $\nu$ vs accuracy $\epsilon$ and dimension $d$.
Poster: Wed 5-7 PM, #43. Thanks!
