SVRE: A New Method for Training GANs
Gauthier Gidel
Mila, Université de Montréal; research intern at Element AI
Generative Modeling and Model-Based Reasoning for Robotics and AI Workshop, June 14, 2019
Reducing Noise in GAN Training with Variance Reduced Extragradient
Tatjana Chavdarova*, Gauthier Gidel*, François Fleuret, Simon Lacoste-Julien
* Equal contribution
Generative Adversarial Networks [Goodfellow et al., 2014]
Challenges
- Standard supervised learning: min_θ L(θ).
- GANs: a harder (and different) optimization problem: minimax.
(Image source: Vaishnavh Nagarajan)
"Noise": Noisy Gradient Estimates Due to Stochasticity
- Mini-batches (sub-samples of the full dataset) are used to update the parameters, so gradient estimates are noisy.
- Variance Reduced (VR) gradient methods: optimization methods that reduce such noise.
[Figure: single-objective minimization over (φ, θ); the full-batch direction vs. the noisy stochastic direction.]
Variance Reduction – Motivation for Games
- Intuitively: minimization vs. games (noise from the stochastic gradient). In minimization, a noisy direction is still "approximately" the right direction; in a game, a direction with noise can be "bad". [Figure: sample trajectories in (θ, φ) for minimization vs. a game.]
- Empirically: BigGAN – "increased batch size significantly improves performance". Brock et al. [2018] report a 46% relative improvement of the Inception Score metric [Salimans et al., 2016] on ImageNet when the mini-batch size is increased 8-fold.
- To sum up, two issues:
  - The adversarial aspect of min-max → Extragradient.
  - The noise from stochastic gradients → Variance Reduction.
Extragradient
Two players θ, φ. Idea: perform a "lookahead" step.

Extrapolation:
  θ_{t+1/2} = θ_t − η ∇_θ L_G(θ_t, φ_t)
  φ_{t+1/2} = φ_t − η ∇_φ L_D(θ_t, φ_t)
Update:
  θ_{t+1} = θ_t − η ∇_θ L_G(θ_{t+1/2}, φ_{t+1/2})
  φ_{t+1} = φ_t − η ∇_φ L_D(θ_{t+1/2}, φ_{t+1/2})
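As a minimal sketch (not from the slides), the extragradient update can be illustrated on the toy bilinear game min_θ max_φ θ·φ, whose equilibrium is (0, 0). The losses L_G = θφ and L_D = −θφ, the step size, and the iteration counts are all illustrative choices:

```python
def grad_theta(theta, phi):
    # dL_G/dtheta for the generator loss L_G(theta, phi) = theta * phi
    return phi

def grad_phi(theta, phi):
    # dL_D/dphi for the discriminator loss L_D(theta, phi) = -theta * phi
    return -theta

def extragradient(theta, phi, eta=0.1, steps=2000):
    for _ in range(steps):
        # extrapolation ("lookahead") step, taken from the current point
        theta_half = theta - eta * grad_theta(theta, phi)
        phi_half = phi - eta * grad_phi(theta, phi)
        # update step: applied to the current iterate, but using
        # gradients evaluated at the extrapolated point
        theta = theta - eta * grad_theta(theta_half, phi_half)
        phi = phi - eta * grad_phi(theta_half, phi_half)
    return theta, phi

def simultaneous_gd(theta, phi, eta=0.1, steps=2000):
    # baseline for comparison: both players step from the current point
    for _ in range(steps):
        theta, phi = (theta - eta * grad_theta(theta, phi),
                      phi - eta * grad_phi(theta, phi))
    return theta, phi

theta, phi = extragradient(1.0, 1.0)  # contracts toward the equilibrium (0, 0)
```

On this game, simultaneous gradient descent spirals outward (with η = 0.1 the squared norm of the iterate grows by a factor of about 1.01 per step), while the extragradient iterates contract toward the equilibrium; this is the classic motivation for the extrapolation step.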
Variance Reduced Gradient Methods
Variance Reduced Estimate of the Gradient
Based on a finite-sum assumption:
  L(ω) = (1/n) Σ_{i=1}^n L(x_i, ω)
Epoch-based algorithm:
- Save the full gradient (1/n) Σ_i ∇L(x_i, ω^S) and the snapshot ω^S.
- For one epoch, use the update rule:
    ω ← ω − η ( ∇L(x_i, ω) − ∇L(x_i, ω^S) + (1/n) Σ_j ∇L(x_j, ω^S) )
  i.e., the stochastic gradient plus a correction using the saved past iterate.
- Requires 2 stochastic gradients per step (at the current point and at the snapshot).
- If ω^S is close to ω → the estimate is close to the full-batch gradient → small variance.
- The full-batch gradient is expensive but tractable, e.g., compute it once per pass over the data.
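The epoch-based update above can be sketched on a toy finite sum. The quadratic losses L(x_i, ω) = (ω − x_i)²/2, step size, and epoch count below are illustrative assumptions, not from the slides:

```python
import random

def svrg(xs, eta=0.1, epochs=20, seed=0):
    """Variance-reduced gradient descent on L(w) = (1/n) sum_i (w - x_i)^2 / 2."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    for _ in range(epochs):
        # save the snapshot and its full-batch gradient, once per epoch
        w_snap = w
        full_grad = sum(w_snap - x for x in xs) / n
        for _ in range(n):
            x_i = rng.choice(xs)
            g_cur = w - x_i        # stochastic gradient at the current point
            g_snap = w_snap - x_i  # stochastic gradient at the snapshot
            # variance-reduced estimate: stochastic gradient + correction
            w -= eta * (g_cur - g_snap + full_grad)
    return w

w = svrg([1.0, 2.0, 3.0, 4.0])  # the minimizer is the mean, 2.5
```

For this particular quadratic, the correction cancels the per-sample noise exactly (the x_i terms in g_cur and g_snap cancel), so the estimate coincides with the full-batch gradient; in general it only reduces the variance, and does so more strongly the closer ω is to the snapshot ω^S.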
SVRE: Variance Reduction + Extragradient
Pseudo-algorithm:
1. Save the snapshot ω^S ← ω_t and compute the full gradient (1/n) Σ_i ∇L(x_i, ω^S).
2. For i in 1, …, epoch_length:
   - Compute ω_{t+1/2} with variance-reduced gradients at ω_t (extrapolation).
   - Compute ω_{t+1} with variance-reduced gradients at ω_{t+1/2} (update).
   - t ← t + 1
3. Repeat until convergence.

SVRE yields the fastest convergence rate in the literature for strongly convex stochastic game optimization.
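A minimal sketch of the pseudo-algorithm, on an assumed toy stochastic bilinear game min_θ max_φ (1/n) Σ_i a_i θ φ with equilibrium (0, 0); the game, its coefficients, the step size, and the epoch count are illustrative, not the paper's GAN setting:

```python
import random

def svre(a, eta=0.05, epochs=1000, seed=0):
    """Variance-reduced extragradient on min_theta max_phi (1/n) sum_i a_i*theta*phi."""
    rng = random.Random(seed)
    n = len(a)
    a_bar = sum(a) / n
    theta, phi = 1.0, 1.0
    for _ in range(epochs):
        # 1. save the snapshot and both players' full-batch gradients
        th_s, ph_s = theta, phi
        mu_th = a_bar * ph_s    # full gradient of L_G w.r.t. theta
        mu_ph = -a_bar * th_s   # full gradient of L_D (= -L_G) w.r.t. phi
        for _ in range(n):
            # 2a. extrapolation: variance-reduced gradients at the current point
            a_i = rng.choice(a)
            g_th = a_i * phi - a_i * ph_s + mu_th
            g_ph = -a_i * theta + a_i * th_s + mu_ph
            th_half = theta - eta * g_th
            ph_half = phi - eta * g_ph
            # 2b. update: variance-reduced gradients at the extrapolated point
            a_j = rng.choice(a)
            g_th = a_j * ph_half - a_j * ph_s + mu_th
            g_ph = -a_j * th_half + a_j * th_s + mu_ph
            theta -= eta * g_th
            phi -= eta * g_ph
    return theta, phi

theta, phi = svre([0.5, 1.0, 1.5, 2.0])  # converges toward the equilibrium (0, 0)
```

Each variance-reduced gradient combines a stochastic gradient at the current point, one at the snapshot, and the stored full-batch gradient, exactly as in the update rule of the previous slide; the extragradient structure contributes the extrapolation/update pair.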