
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs



  1. Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs. Timur Garipov (1, 2), Pavel Izmailov (3), Dmitrii Podoprikhin (4), Dmitry Vetrov (5), Andrew Gordon Wilson (3). Affiliations: (1) Samsung AI Center in Moscow; (2) Skolkovo Institute of Science and Technology; (3) Cornell University; (4) Samsung-HSE Laboratory; (5) National Research University Higher School of Economics. Neural Information Processing Systems, Montreal, Canada, December 4, 2018.

  2. Loss Surfaces: ResNet-164, CIFAR-100

  3. Loss Surfaces: ResNet-164, CIFAR-100

  4. Finding Paths between Modes. Weights of pretrained networks: $\hat{w}_1, \hat{w}_2 \in \mathbb{R}^{|net|}$. Define a parametric curve $\phi_\theta(\cdot) : [0, 1] \to \mathbb{R}^{|net|}$ with $\phi_\theta(0) = \hat{w}_1$ and $\phi_\theta(1) = \hat{w}_2$. DNN loss function: $\mathcal{L}(w)$. Minimize the averaged loss w.r.t. $\theta$: $\min_\theta \ell(\theta) = \int_0^1 \mathcal{L}(\phi_\theta(t)) \, dt = \mathbb{E}_{t \sim U(0, 1)} \mathcal{L}(\phi_\theta(t))$.
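
The objective on this slide can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' released PyTorch code: the two-dimensional toy loss, the choice of a polygonal chain with a single bend as the curve family, and the names `loss`, `curve`, `path_loss`, and `fit_curve` are all hypothetical. The sketch samples t uniformly from [0, 1] and takes stochastic gradient steps on the bend, which is one way to minimize the expectation form of the averaged loss.

```python
import random

# Toy stand-ins for the two pretrained weight vectors (assumed, 2-D for clarity).
W1 = (-1.0, 0.0)
W2 = (1.0, 0.0)


def loss(w):
    """Toy loss with a ring of minima on the unit circle and a bump at the origin."""
    r2 = w[0] ** 2 + w[1] ** 2
    return (r2 - 1.0) ** 2


def loss_grad(w):
    """Analytic gradient of the toy loss with respect to w."""
    k = 4.0 * (w[0] ** 2 + w[1] ** 2 - 1.0)
    return (k * w[0], k * w[1])


def curve(theta, t):
    """Polygonal chain W1 -> theta -> W2 (assumed curve family).

    Returns the point phi_theta(t) and the scalar d(point)/d(theta).
    """
    if t <= 0.5:
        a, b, end = 2.0 * t, 1.0 - 2.0 * t, W1
    else:
        a, b, end = 2.0 * (1.0 - t), 2.0 * t - 1.0, W2
    point = (a * theta[0] + b * end[0], a * theta[1] + b * end[1])
    return point, a


def path_loss(theta, n=1000):
    """Midpoint-rule estimate of the loss averaged over t in [0, 1]."""
    return sum(loss(curve(theta, (i + 0.5) / n)[0]) for i in range(n)) / n


def fit_curve(theta=(0.0, 0.1), steps=2000, lr=0.02, batch=16, seed=0):
    """Minimize E_t[loss(phi_theta(t))] by SGD over t ~ U(0, 1)."""
    rng = random.Random(seed)
    x, y = theta  # start slightly off the straight segment to break symmetry
    for _ in range(steps):
        gx = gy = 0.0
        for _ in range(batch):
            point, scale = curve((x, y), rng.random())
            dx, dy = loss_grad(point)
            gx += scale * dx  # chain rule through phi_theta(t)
            gy += scale * dy
        x -= lr * gx / batch
        y -= lr * gy / batch
    return (x, y)


line_loss = path_loss((0.0, 0.0))  # bend at the midpoint gives the straight segment
theta = fit_curve()
curve_loss = path_loss(theta)
print("straight segment:", line_loss)
print("trained curve:   ", curve_loss)
```

In this toy setting the straight segment crosses the high-loss region at the origin, while the trained bend moves the path around it. The endpoints stay fixed at `W1` and `W2` by construction, matching the boundary conditions on the curve.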


  6. Loss Surfaces: VGG-16, CIFAR-10. [Figure: contour plots of train loss (top row) and test error (%) (bottom row), three panels each.]


  8. Fast Geometric Ensembles (FGE). [Figure: learning rate (levels α1 and α2), test error (%), and distance plotted against FGE iteration number, in units of the cycle length c from 0 to 3.5c; annotations: "75% training", "Ensemble".]
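
The labels on this slide (α1, α2, and an iteration axis in units of c) suggest a cyclical learning rate. The sketch below is a hedged illustration only: the piecewise-linear shape of the schedule, the rule of taking a snapshot when the rate reaches α2 in the middle of each cycle, and the names `cyclic_lr` and `snapshot_iterations` are assumptions not stated on the slide, and the numeric values are hypothetical.

```python
def cyclic_lr(i, c, alpha_1, alpha_2):
    """Learning rate at 1-based iteration i for cycle length c (assumed triangular shape)."""
    t = ((i - 1) % c + 1) / c  # position within the current cycle, in (0, 1]
    if t <= 0.5:
        # First half of the cycle: move linearly from alpha_1 towards alpha_2.
        return (1 - 2 * t) * alpha_1 + 2 * t * alpha_2
    # Second half of the cycle: move linearly from alpha_2 back to alpha_1.
    return (2 - 2 * t) * alpha_2 + (2 * t - 1) * alpha_1


def snapshot_iterations(n_iters, c):
    """Iterations at the middle of each cycle, where the rate equals alpha_2 (assumes even c)."""
    return [i for i in range(1, n_iters + 1) if (i - 1) % c + 1 == c // 2]


# Hypothetical values, chosen only to make the example concrete.
ALPHA_1, ALPHA_2, CYCLE = 0.05, 0.0005, 4
schedule = [cyclic_lr(i, CYCLE, ALPHA_1, ALPHA_2) for i in range(1, 13)]
snapshots = snapshot_iterations(12, CYCLE)
print(schedule)
print(snapshots)
```

Under these assumptions, each cycle contributes one model to the ensemble, so three cycles of four iterations yield three snapshots.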

  9. Ensembling Results. [Figure: test accuracy (%) versus training budget (0 to 2B) for SSE separate, SSE ensemble, FGE separate, FGE ensemble, and a single model trained with budget 1B.] SSE = Huang et al., "Snapshot ensembles: Train 1, get M for free", ICLR 2017.
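
The legend on this slide distinguishes "separate" models from the "ensemble". A minimal sketch of that comparison follows, assuming the ensemble prediction is the average of the individual models' class probabilities; the slide does not state the combination rule, and the probabilities, labels, and function names below are made up for illustration.

```python
def average_probs(per_model_probs):
    """Average class probabilities over models.

    per_model_probs[m][n][k] is model m's probability of class k on example n.
    """
    n_models = len(per_model_probs)
    return [
        [sum(class_probs) / n_models for class_probs in zip(*example_rows)]
        for example_rows in zip(*per_model_probs)
    ]


def accuracy(probs, labels):
    """Fraction of examples whose highest-probability class matches the label."""
    predictions = [row.index(max(row)) for row in probs]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical predictions from three snapshots on four two-class examples.
labels = [0, 1, 0, 1]
model_a = [[0.4, 0.6], [0.2, 0.8], [0.9, 0.1], [0.3, 0.7]]
model_b = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3], [0.1, 0.9]]
model_c = [[0.9, 0.1], [0.3, 0.7], [0.45, 0.55], [0.2, 0.8]]

separate = [accuracy(m, labels) for m in (model_a, model_b, model_c)]
ensemble = accuracy(average_probs([model_a, model_b, model_c]), labels)
print("separate:", separate)
print("ensemble:", ensemble)
```

In this made-up example each model errs on a different example, so averaging corrects all three mistakes. Real gains depend on how diverse the collected models are.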

  10. Summary. Local optima are connected by simple curves. To find these curves, we minimize the loss uniformly in expectation over a path from one mode to another. Inspired by these insights, we propose a fast ensembling algorithm. PyTorch code is released for both mode connectivity and FGE. Come to our poster #162!
