Norm matters: efficient and accurate normalization schemes in deep networks (Spotlight, NeurIPS 2018)


  1. Norm matters: efficient and accurate normalization schemes in deep networks
     Elad Hoffer*, Ron Banner*, Itay Golan*, Daniel Soudry (*equal contribution)
     Spotlight, NeurIPS 2018 – Poster #27

  2. Batch normalization – shortcomings:
     • Assumes independence between samples (a problem when modeling time series, RL, GANs, metric learning, etc.)
     • Why does it work? Through interaction with other regularization.
     • Significant computational and memory impact, with data-bound operations – up to 25% of computation time in current models (Gitman, 2017).
     • Requires high-precision operations ($\sum_j x_j^2$); numerically unstable in low precision.

  3. Batch-norm leads to norm invariance
     The key observation:
     • Given an input $x$ and a weight vector $w$, its direction is $\hat{w} = w / \|w\|$.
     • Batch-norm is norm invariant: $\mathrm{BN}(w^\top x) = \mathrm{BN}(\hat{w}^\top x)$.
     • The weight norm only affects the effective learning rate; e.g. in SGD the step taken on the direction $\hat{w}$ scales as $\eta / \|w\|^2$.
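     A minimal numerical check of the invariance claim above (a sketch assuming PyTorch; the bn helper is an affine-free stand-in for a batch-norm layer, not the paper's code):

        import torch

        torch.manual_seed(0)
        x = torch.randn(256, 64)           # a batch of inputs
        w = torch.randn(10, 64)            # weights of a linear layer

        def bn(z):
            # plain batch normalization over the batch dimension, no affine terms
            return (z - z.mean(0)) / z.std(0, unbiased=False)

        out_w = bn(x @ w.t())              # BN(w^T x)
        out_cw = bn(x @ (5.0 * w).t())     # BN((5w)^T x): same weights, rescaled
        print(torch.allclose(out_w, out_cw, atol=1e-5))  # True: only the direction of w matters

     Because the output depends only on $\hat{w}$, an SGD step of size $\eta$ on $w$ moves the direction $\hat{w}$ by roughly $\eta / \|w\|^2$, which is the effective learning rate mentioned above.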

  4. Weight decay before BN is redundant
     • Weight decay is equivalent to learning-rate scaling.
     • Its effect can be mimicked by a learning-rate correction (see the sketch below).
     [Figure: training curves with WD, without WD, and without WD + LR correction]
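     A hedged sketch of the learning-rate correction idea: without weight decay the weight norm grows, so the effective step $\eta / \|w\|^2$ shrinks; one can compensate by rescaling each layer's learning rate by the norm growth. The correction factor below is my illustration of that reasoning, not necessarily the exact scheme used in the paper:

        import torch

        def corrected_lr(base_lr, weight, init_norm):
            # rescale this layer's learning rate by (||w_t|| / ||w_0||)^2 so the
            # effective step on the weight direction matches a run trained with WD
            return base_lr * (weight.norm() / init_norm) ** 2

        # usage: record init_norm = layer.weight.norm().item() at initialization,
        # then update each parameter group's lr with corrected_lr(...) before every step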

  5. Improving weight-norm
     Weight normalization, for a channel $i$: $w_i = g_i \frac{v_i}{\|v_i\|}$
     Bounded weight normalization: $w_i = \rho \frac{v_i}{\|v_i\|}$, with $\rho$ a constant determined from the chosen initialization.
     This can help make weight-norm work for large-scale models.
     [Figure: ResNet-50, ImageNet]
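     A minimal sketch of the bounded variant above, assuming a PyTorch-style reparametrization; the per-channel norm $\rho$ is frozen at its initial value instead of being a learned gain (the paper's exact parametrization may differ in details):

        import torch
        import torch.nn as nn

        class BoundedWeightNorm(nn.Module):
            """Reparametrize a weight so every output channel keeps a fixed norm rho."""

            def __init__(self, weight):
                super().__init__()
                self.v = nn.Parameter(weight.detach().clone())
                # rho: per-channel norm taken from the initialization and kept constant
                self.register_buffer("rho", weight.detach().flatten(1).norm(dim=1).clone())

            def forward(self):
                v = self.v.flatten(1)
                w = self.rho.unsqueeze(1) * v / v.norm(dim=1, keepdim=True)
                return w.view_as(self.v)

        # usage: wn = BoundedWeightNorm(conv.weight); y = F.conv2d(x, wn(), bias=conv.bias)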

  6. Replacing batch-norm – switching norms
     • Batch normalization is just a scaled $L_2$ normalization: $\hat{x}_j = \frac{x_j - \mu}{\frac{1}{\sqrt{n}}\|x - \mu\|_2}$
     • More numerically stable norms: $\|x\|_1 = \sum_j |x_j|$ and $\|x\|_\infty = \max_j |x_j|$.
     • We use additional scaling constants so these norms behave similarly to $L_2$, assuming the neural input is Gaussian, e.g. $\sqrt{\frac{\pi}{2}} \cdot \frac{1}{n}\|x-\mu\|_1 \approx \frac{1}{\sqrt{n}}\|x-\mu\|_2$.
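     A minimal sketch of the $L_1$ variant above for activations of shape (batch, features), without the learned affine parameters (PyTorch assumed):

        import math
        import torch

        def l1_batch_norm(x, eps=1e-5):
            # replace the L2-based standard deviation with the scaled mean absolute
            # deviation sqrt(pi/2) * (1/n) * ||x - mu||_1, which matches the std in
            # expectation when the inputs are Gaussian
            mu = x.mean(dim=0, keepdim=True)
            centered = x - mu
            sigma_l1 = math.sqrt(math.pi / 2) * centered.abs().mean(dim=0, keepdim=True)
            return centered / (sigma_l1 + eps)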

  7. $L_1$ batch-norm (ImageNet, ResNet)
     [Figure: ImageNet ResNet results with $L_1$ batch-norm]

  8. Low-precision batch-norm
     • $L_1$ batch-norm alleviates the low-precision difficulties of batch-norm.
     • We can now train ResNet-50 with batch-norm in FP16 without issues: regular BN in FP16 fails, while $L_1$ BN in FP16 works as well as $L_2$ BN in FP32.
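     A small numeric illustration (my example, not from the slides) of why the $L_2$ statistic is fragile in half precision: squaring even moderate activations exceeds FP16's maximum of roughly 65504, while the $L_1$ statistic stays in range.

        import torch

        x = torch.full((64,), 300.0, dtype=torch.float16)
        print((x * x).sum())    # inf: 300^2 = 90000 already overflows fp16 element-wise
        print(x.abs().mean())   # 300.0: the L1 statistic stays well inside fp16 range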

  9. With a few more tricks…
     • We can now train ResNet-18 on ImageNet with bottleneck operations in INT8.
     [Figure: 8-bit vs. full-precision training curves]
     Also at NeurIPS 2018: “Scalable Methods for 8-bit Training of Neural Networks”, Ron Banner*, Itay Hubara*, Elad Hoffer*, Daniel Soudry.
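     For context, a generic, hedged sketch of symmetric 8-bit quantization of a tensor; the actual scheme in “Scalable Methods for 8-bit Training of Neural Networks” is more involved, and this only illustrates the kind of INT8 arithmetic the slide refers to:

        import torch

        def quantize_int8(x):
            # map x to int8 with a single symmetric scale (no zero-point)
            scale = x.abs().max().clamp(min=1e-8) / 127.0
            q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
            return q, scale

        def dequantize_int8(q, scale):
            return q.float() * scale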

  10. Thank you for your time! Come visit us at poster #27.
