SVRE: A New Method for Training GANs
Gauthier Gidel
Mila, Université de Montréal; research intern at Element AI
Generative Modeling and Model-Based Reasoning for Robotics and AI Workshop, June 14, 2019
Reducing Noise in GAN Training with Variance Reduced Extragradient
Tatjana Chavdarova*, Gauthier Gidel*, François Fleuret, Simon Lacoste-Julien
* Equal contribution
Generative Adversarial Networks [Goodfellow et al., 2014]
Challenges
- Standard supervised learning: min_θ L(θ).
- GANs: a harder (and different) optimization problem: minimax.
(Image source: Vaishnavh Nagarajan)
"Noise": Noisy Gradient Estimates Due to Stochasticity
- Mini-batches (sub-samples of the full dataset) are used to update the parameters, so gradient estimates are noisy.
- Variance Reduced (VR) gradient methods: optimization methods that reduce such noise.
[Figure: single-objective minimization over (φ, θ); the full-batch direction vs. the noisy stochastic direction.]
Variance Reduction – Motivation for Games
- Intuitively: minimization vs. games (noise from the stochastic gradient). In minimization, a noisy direction is still "approximately" the right direction; in a game, a direction with noise can be "bad". [Figure: sample trajectories in (θ, φ) for minimization vs. a game.]
- Empirically: BigGAN – "increased batch size significantly improves performance". Brock et al. [2018] report a 46% relative improvement of the Inception Score metric [Salimans et al., 2016] on ImageNet when the mini-batch size is increased 8-fold.
- To sum up, two issues:
  - The adversarial aspect of min-max → Extragradient.
  - The noise from stochastic gradients → Variance Reduction.
Extragradient
Two players θ, φ. Idea: perform a "lookahead" step.

Extrapolation:
  θ_{t+1/2} = θ_t − η ∇_θ L_G(θ_t, φ_t)
  φ_{t+1/2} = φ_t − η ∇_φ L_D(θ_t, φ_t)
Update:
  θ_{t+1} = θ_t − η ∇_θ L_G(θ_{t+1/2}, φ_{t+1/2})
  φ_{t+1} = φ_t − η ∇_φ L_D(θ_{t+1/2}, φ_{t+1/2})
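As a minimal sketch (not from the slides), the extragradient update can be illustrated on the toy bilinear game min_θ max_φ θ·φ, whose equilibrium is (0, 0). The losses L_G = θφ and L_D = −θφ, the step size, and the iteration counts are all illustrative choices:

```python
def grad_theta(theta, phi):
    # dL_G/dtheta for the generator loss L_G(theta, phi) = theta * phi
    return phi

def grad_phi(theta, phi):
    # dL_D/dphi for the discriminator loss L_D(theta, phi) = -theta * phi
    return -theta

def extragradient(theta, phi, eta=0.1, steps=2000):
    for _ in range(steps):
        # extrapolation ("lookahead") step, taken from the current point
        theta_half = theta - eta * grad_theta(theta, phi)
        phi_half = phi - eta * grad_phi(theta, phi)
        # update step: applied to the current iterate, but using
        # gradients evaluated at the extrapolated point
        theta = theta - eta * grad_theta(theta_half, phi_half)
        phi = phi - eta * grad_phi(theta_half, phi_half)
    return theta, phi

def simultaneous_gd(theta, phi, eta=0.1, steps=2000):
    # baseline for comparison: both players step from the current point
    for _ in range(steps):
        theta, phi = (theta - eta * grad_theta(theta, phi),
                      phi - eta * grad_phi(theta, phi))
    return theta, phi

theta, phi = extragradient(1.0, 1.0)  # contracts toward the equilibrium (0, 0)
```

On this game, simultaneous gradient descent spirals outward (with η = 0.1 the squared norm of the iterate grows by a factor of about 1.01 per step), while the extragradient iterates contract toward the equilibrium; this is the classic motivation for the extrapolation step.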
Variance Reduced Gradient Methods
Variance Reduced Estimate of the Gradient
Based on a finite-sum assumption:
  L(ω) = (1/n) Σ_{i=1}^n L(x_i, ω)
Epoch-based algorithm:
- Save the full gradient (1/n) Σ_i ∇L(x_i, ω^S) and the snapshot ω^S.
- For one epoch, use the update rule:
    ω ← ω − η ( ∇L(x_i, ω) − ∇L(x_i, ω^S) + (1/n) Σ_j ∇L(x_j, ω^S) )
  i.e., the stochastic gradient plus a correction using the saved past iterate.
- Requires 2 stochastic gradients per step (at the current point and at the snapshot).
- If ω^S is close to ω → the estimate is close to the full-batch gradient → small variance.
- The full-batch gradient is expensive but tractable, e.g., compute it once per pass over the data.
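The epoch-based update above can be sketched on a toy finite sum. The quadratic losses L(x_i, ω) = (ω − x_i)²/2, step size, and epoch count below are illustrative assumptions, not from the slides:

```python
import random

def svrg(xs, eta=0.1, epochs=20, seed=0):
    """Variance-reduced gradient descent on L(w) = (1/n) sum_i (w - x_i)^2 / 2."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    for _ in range(epochs):
        # save the snapshot and its full-batch gradient, once per epoch
        w_snap = w
        full_grad = sum(w_snap - x for x in xs) / n
        for _ in range(n):
            x_i = rng.choice(xs)
            g_cur = w - x_i        # stochastic gradient at the current point
            g_snap = w_snap - x_i  # stochastic gradient at the snapshot
            # variance-reduced estimate: stochastic gradient + correction
            w -= eta * (g_cur - g_snap + full_grad)
    return w

w = svrg([1.0, 2.0, 3.0, 4.0])  # the minimizer is the mean, 2.5
```

For this particular quadratic, the correction cancels the per-sample noise exactly (the x_i terms in g_cur and g_snap cancel), so the estimate coincides with the full-batch gradient; in general it only reduces the variance, and does so more strongly the closer ω is to the snapshot ω^S.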
SVRE: Variance Reduction + Extragradient
Pseudo-algorithm:
1. Save the snapshot ω^S ← ω_t and compute the full gradient (1/n) Σ_i ∇L(x_i, ω^S).
2. For i in 1, …, epoch_length:
   - Compute ω_{t+1/2} with variance-reduced gradients at ω_t (extrapolation).
   - Compute ω_{t+1} with variance-reduced gradients at ω_{t+1/2} (update).
   - t ← t + 1
3. Repeat until convergence.

SVRE yields the fastest convergence rate in the literature for strongly convex stochastic game optimization.
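A minimal sketch of the pseudo-algorithm, on an assumed toy stochastic bilinear game min_θ max_φ (1/n) Σ_i a_i θ φ with equilibrium (0, 0); the game, its coefficients, the step size, and the epoch count are illustrative, not the paper's GAN setting:

```python
import random

def svre(a, eta=0.05, epochs=1000, seed=0):
    """Variance-reduced extragradient on min_theta max_phi (1/n) sum_i a_i*theta*phi."""
    rng = random.Random(seed)
    n = len(a)
    a_bar = sum(a) / n
    theta, phi = 1.0, 1.0
    for _ in range(epochs):
        # 1. save the snapshot and both players' full-batch gradients
        th_s, ph_s = theta, phi
        mu_th = a_bar * ph_s    # full gradient of L_G w.r.t. theta
        mu_ph = -a_bar * th_s   # full gradient of L_D (= -L_G) w.r.t. phi
        for _ in range(n):
            # 2a. extrapolation: variance-reduced gradients at the current point
            a_i = rng.choice(a)
            g_th = a_i * phi - a_i * ph_s + mu_th
            g_ph = -a_i * theta + a_i * th_s + mu_ph
            th_half = theta - eta * g_th
            ph_half = phi - eta * g_ph
            # 2b. update: variance-reduced gradients at the extrapolated point
            a_j = rng.choice(a)
            g_th = a_j * ph_half - a_j * ph_s + mu_th
            g_ph = -a_j * th_half + a_j * th_s + mu_ph
            theta -= eta * g_th
            phi -= eta * g_ph
    return theta, phi

theta, phi = svre([0.5, 1.0, 1.5, 2.0])  # converges toward the equilibrium (0, 0)
```

Each variance-reduced gradient combines a stochastic gradient at the current point, one at the snapshot, and the stored full-batch gradient, exactly as in the update rule of the previous slide; the extragradient structure contributes the extrapolation/update pair.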