Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

Aug 25, 2023

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
Timur Garipov (1, 2), Pavel Izmailov (3), Dmitrii Podoprikhin (4), Dmitry Vetrov (5), Andrew Gordon Wilson (3)
(1) Samsung AI Center in Moscow, (2) Skolkovo Institute of Science and Technology, (3) Cornell University, (4) Samsung-HSE Laboratory, (5) National Research University Higher School of Economics
Neural Information Processing Systems, Montreal, Canada, December 4, 2018
Loss Surfaces: ResNet-164, CIFAR-100 (two figure slides)
Finding Paths between Modes
Weights of pretrained networks: $\hat{w}_1, \hat{w}_2 \in \mathbb{R}^{|net|}$
Define parametric curve: $\phi_\theta(\cdot) : [0, 1] \to \mathbb{R}^{|net|}$, with $\phi_\theta(0) = \hat{w}_1$ and $\phi_\theta(1) = \hat{w}_2$
DNN loss function: $\mathcal{L}(w)$
Minimize averaged loss w.r.t. $\theta$:
$$\min_\theta \; \ell(\theta) = \int_0^1 \mathcal{L}(\phi_\theta(t)) \, dt = \mathbb{E}_{t \sim U(0,1)} \, \mathcal{L}(\phi_\theta(t))$$
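The objective above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the slides: the curve is a quadratic Bezier curve with a single trainable bend theta, the "network" has two weights, and the toy loss has its minima on the unit circle. The sketch samples t uniformly from [0, 1] and takes stochastic gradient steps on theta, so it minimizes a Monte Carlo estimate of the expected loss along the path.

```python
import random

def loss(w):
    """Toy stand-in for the DNN loss L(w); its minima lie on the unit circle."""
    r2 = w[0] ** 2 + w[1] ** 2
    return (r2 - 1.0) ** 2

def loss_grad(w):
    """Gradient of the toy loss with respect to w."""
    r2 = w[0] ** 2 + w[1] ** 2
    c = 4.0 * (r2 - 1.0)
    return (c * w[0], c * w[1])

def curve(theta, t, w1, w2):
    """Quadratic Bezier curve (an assumed choice) with curve(0) = w1, curve(1) = w2."""
    a, b, c = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
    return (a * w1[0] + b * theta[0] + c * w2[0],
            a * w1[1] + b * theta[1] + c * w2[1])

def path_loss(theta, w1, w2, n=201):
    """Average loss over an evenly spaced grid of t values in [0, 1]."""
    return sum(loss(curve(theta, i / (n - 1), w1, w2)) for i in range(n)) / n

def fit_curve(w1, w2, theta0, steps=1000, batch=16, lr=0.1, seed=0):
    """Minimize E_{t ~ U(0,1)} L(curve(t)) over theta with minibatch SGD."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(steps):
        g = [0.0, 0.0]
        for _ in range(batch):
            t = rng.random()                    # t ~ U(0, 1)
            gw = loss_grad(curve(theta, t, w1, w2))
            b = 2 * t * (1 - t)                 # d curve / d theta (chain rule)
            g[0] += b * gw[0] / batch
            g[1] += b * gw[1] / batch
        theta[0] -= lr * g[0]
        theta[1] -= lr * g[1]
    return tuple(theta)

# Two "modes" on opposite sides of the circle; the straight segment between
# them crosses the high-loss region at the origin.
w1, w2 = (1.0, 0.0), (-1.0, 0.0)
theta0 = (0.0, 0.1)                             # slightly off the straight line
theta = fit_curve(w1, w2, theta0)
initial_loss = path_loss(theta0, w1, w2)
final_loss = path_loss(theta, w1, w2)
print("average loss along path: before", initial_loss, "after", final_loss)
```

The endpoints stay fixed by construction, so only the bend is trained. The initial bend is placed slightly off the straight segment because, in this symmetric toy problem, the gradient component that pushes the path around the barrier vanishes exactly on the segment.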
Loss Surfaces: VGG-16, CIFAR-10
[Figure: contour plots of train loss (top row) and test error (%) (bottom row), three panels each.]
Fast Geometric Ensembles (FGE)
[Figure: learning rate schedule between α1 and α2 with cycle length c, shown with test error (%) and distance against FGE iteration number (0 to 3.5c); the epoch axis is labeled "75% training" and "Ensemble".]
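The slide shows only the schedule's labels (α1, α2, cycle length c), so the following is a hedged sketch of one cyclical schedule that matches them. The piecewise-linear form, the function name `fge_learning_rate`, and the example values are assumptions for illustration: within each cycle of c iterations the rate falls linearly from α1 to α2 over the first half and rises back to α1 over the second half.

```python
def fge_learning_rate(i, c, alpha1, alpha2):
    """Assumed piecewise-linear cyclical learning rate for iteration i (1-based).

    c is the cycle length in iterations; alpha1 is the rate at the cycle
    boundary and alpha2 the rate at the middle of the cycle.
    """
    t = ((i - 1) % c + 1) / c          # position within the current cycle, in (0, 1]
    if t <= 0.5:
        # First half of the cycle: move linearly from alpha1 towards alpha2.
        return (1 - 2 * t) * alpha1 + 2 * t * alpha2
    # Second half of the cycle: move linearly from alpha2 back to alpha1.
    return (2 - 2 * t) * alpha2 + (2 * t - 1) * alpha1

# Hypothetical values, chosen only to show the shape of the schedule.
schedule = [fge_learning_rate(i, c=4, alpha1=0.05, alpha2=0.0005) for i in range(1, 9)]
print(schedule)
```

With this form the rate equals α2 exactly at mid-cycle and α1 at the end of each cycle, and the pattern repeats every c iterations.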
Ensembling Results
[Figure: test accuracy (%) against training budget (0 to 2B) for SSE separate, SSE ensemble, 1B model, FGE separate, and FGE ensemble.]
SSE = Huang et al., "Snapshot Ensembles: Train 1, Get M for Free", ICLR 2017
Summary
Local optima are connected by simple curves.
To find these curves we minimize loss uniformly in expectation over a path from one mode to another.
We are inspired by these insights to propose a fast ensembling algorithm.
PyTorch code released for both mode connectivity and FGE.
Come to our poster #162!
