Why Squashing Functions in Multi-Layer Neural Networks

Julio C. Urenda¹, Orsolya Csiszár²,³, Gábor Csiszár⁴, József Dombi⁵, Olga Kosheleva¹, Vladik Kreinovich¹, György Eigner³

¹ University of Texas at El Paso, USA
² University of Applied Sciences Esslingen, Germany
³ Óbuda University, Budapest, Hungary
⁴ University of Stuttgart, Germany
⁵ University of Szeged, Hungary

E-mails: vladik@utep.edu, orsolya.csiszar@nik.uni-obuda.hu, gabor.csiszar@mp.imw.uni-stuttgart.de, dombi@inf.u-szeged.hu, olgak@utep.edu, eigner.gyorgy@nik.uni-obuda.hu

1. A Short Introduction

• In their successful applications, deep neural networks use a non-linear transformation s(z) = max(0, z).
• It is called the rectified linear activation function.
• Sometimes, more general transformations, called squashing functions, lead to even better results.
• In this talk, we provide a theoretical explanation for this empirical fact.
• To provide this explanation, let us first briefly recall:
  – why we need machine learning in the first place,
  – what deep neural networks are, and
  – what activation functions these neural networks use.

2. Machine Learning Is Needed

• For some simple systems, we know the equations that describe the system's dynamics.
• These equations may be approximate, but they are often good enough.
• With more complex systems (such as systems of systems), this is often no longer the case.
• Even when we have a good approximate model for each subsystem, the corresponding inaccuracies add up.
• So, the resulting model of the whole system is too inaccurate to be useful.
• We also need to use the records of the actual system's behavior when making predictions.
• Using the previous behavior to predict the future is called machine learning.

3. Deep Learning

• The most efficient machine learning technique is deep learning: the use of multi-layer neural networks.
• In general, on a layer of a neural network, we transform signals x_1, ..., x_n into a new signal
  $y = s\left(\sum_{i=1}^{n} w_i \cdot x_i + w_0\right).$
• The coefficients w_i (called weights) are to be determined during training.
• s(z) is a non-linear function called the activation function.
• Most multi-layer neural networks use s(z) = max(z, 0), known as the rectified linear function.
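As a minimal sketch (ours, not from the talk) of this layer computation, here is a single rectified-linear neuron in Python; the names relu and neuron are hypothetical:

```python
def relu(z):
    # Rectified linear activation: s(z) = max(0, z)
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=relu):
    # One neuron: y = s(sum_i w_i * x_i + w_0)
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Example: inputs x = (1.0, 2.0), weights w_1 = 0.5, w_2 = -1.0, bias w_0 = 0.2
print(neuron([1.0, 2.0], [0.5, -1.0], 0.2))  # max(0, 0.5 - 2.0 + 0.2) = 0.0
```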

4. Shall We Go Beyond Rectified Linear?

• Preliminary analysis shows that for some applications:
  – it is more advantageous to use different activation functions for different neurons;
  – specifically, this was shown for a special family of squashing activation functions
    $S^{(\beta)}_{a,\lambda}(z) = \frac{1}{\lambda \cdot \beta} \cdot \ln \frac{1 + \exp(\beta \cdot (z - (a - \lambda/2)))}{1 + \exp(\beta \cdot (z - (a + \lambda/2)))};$
  – this family contains rectified linear neurons as a particular case.
• We explain the empirical success of squashing functions by showing that their formulas follow from reasonably natural symmetries.
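A minimal Python sketch (ours) that transcribes this squashing function under the parenthesization above; the name squash is hypothetical:

```python
import math

def squash(z, a, lam, beta):
    # S^(beta)_{a,lam}(z) = (1/(lam*beta)) *
    #   ln[(1 + exp(beta*(z - (a - lam/2)))) / (1 + exp(beta*(z - (a + lam/2))))]
    num = 1.0 + math.exp(beta * (z - (a - lam / 2)))
    den = 1.0 + math.exp(beta * (z - (a + lam / 2)))
    return math.log(num / den) / (lam * beta)

# For large beta, the values approach a "cut" function:
# 0 below a - lam/2, rising linearly to 1 at a + lam/2, and 1 above it.
for z in (-2.0, 0.0, 2.0):
    print(z, round(squash(z, a=0.0, lam=1.0, beta=5.0), 4))
# Prints approximately: -2.0 0.0001, 0.0 0.5, 2.0 1.0
```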

5. How This Talk Is Structured

• First, we recall the main ideas of symmetries and invariance.
• Then, we recall how these ideas can be used to explain the efficiency of the sigmoid activation function
  $s_0(z) = \frac{1}{1 + \exp(-z)}.$
• This function is used in traditional 3-layer neural networks.
• Finally, we use this information to explain the efficiency of squashing activation functions.
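For concreteness, a small numeric sketch (ours) of the sigmoid:

```python
import math

def sigmoid(z):
    # s0(z) = 1 / (1 + exp(-z)): maps the whole real line into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))  # 0.0067, 0.5, 0.9933
```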

6. Which Transformations Are Natural?

• From the mathematical viewpoint, we can apply any non-linear transformation.
• However, some of these transformations are purely mathematical, with no clear physical interpretation.
• Other transformations are natural in the sense that they have a physical meaning.
• What are natural transformations?

7. Numerical Values Change When We Change a Measuring Unit and/or Starting Point

• In data processing, we deal with numerical values of different physical quantities.
• Computers just treat these values as numbers.
• However, from the physical viewpoint, the numerical values are not absolute; they change:
  – if we change the measuring unit and/or
  – the starting point for measuring the corresponding quantity.
• The corresponding changes in numerical values are clearly physically meaningful, i.e., natural.
• For example, we can measure a person's height in meters or in centimeters.

8. Numerical Values Change (cont-d)

• The same height of 1.7 m, when described in centimeters, becomes 170 cm.
• In general, if we replace the original measuring unit with a new unit which is λ times smaller, then:
  – instead of the original numerical value x,
  – we get a new numerical value λ · x,
  – while the actual quantity remains the same.
• Such a transformation x → λ · x is known as scaling.
• For some quantities, e.g., for time or temperature, the numerical value also depends on the starting point.
• For example, we can measure the time from the moment when the talk started.
• Alternatively, we can use the usual calendar time, in which Year 0 is the starting point.
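As a one-line illustration (ours, not from the slides) of scaling:

```python
def scale(x, lam):
    # Change of measuring unit: x -> lam * x
    return lam * x

print(scale(1.7, 100))  # a height of 1.7 m, expressed in cm: 170.0
```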

9. Numerical Values Change (cont-d)

• In general, if we replace the original starting point with a new one which is x_0 units earlier, then:
  – each original numerical value x
  – is replaced by a new numerical value x + x_0.
• Such a transformation x → x + x_0 is known as a shift.
• In general, if we change both the measuring unit and the starting point, we get a linear transformation:
  x → λ · x + x_0.
• A usual example of such a transformation is the transition from the Celsius to the Fahrenheit temperature scale:
  t_F = 1.8 · t_C + 32.
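A short sketch (ours) of this linear transformation, using the temperature example; the function names are hypothetical:

```python
def linear_transform(x, lam, x0):
    # Change of both unit and starting point: x -> lam * x + x0
    return lam * x + x0

def celsius_to_fahrenheit(t_c):
    # t_F = 1.8 * t_C + 32, i.e., lam = 1.8 and x0 = 32
    return linear_transform(t_c, 1.8, 32.0)

print(celsius_to_fahrenheit(0.0))    # 32.0
print(celsius_to_fahrenheit(100.0))  # 212.0
```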

10. Invariance

• Changing the measuring unit and/or starting point:
  – changes the numerical values, but
  – does not change the actual quantity.
• It is therefore reasonable to require that physical equations do not change if we simply:
  – change the measuring unit and/or
  – change the starting point.
• Of course, to preserve the physical equations:
  – if we change the measuring unit and/or starting point for one quantity,
  – we may need to change the measuring units and/or starting points for other quantities as well.
• For example, there is a well-known relation d = v · t between distance d, velocity v, and time t.
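A small sketch (ours) of what this consistency requirement means for d = v · t: if velocity is rescaled by lam_v and time by lam_t, the relation stays valid provided distance is rescaled by lam_d = lam_v · lam_t:

```python
def relation_holds(d, v, t, tol=1e-9):
    # The physical relation d = v * t, up to rounding
    return abs(d - v * t) < tol

# v = 10 m/s, t = 100 s, hence d = 1000 m
v, t = 10.0, 100.0
d = v * t
print(relation_holds(d, v, t))  # True

# Switch units: km/h for velocity (lam_v = 3.6), hours for time (lam_t = 1/3600);
# then distance must be rescaled to km (lam_d = lam_v * lam_t = 0.001)
lam_v, lam_t = 3.6, 1.0 / 3600.0
lam_d = lam_v * lam_t
print(relation_holds(lam_d * d, lam_v * v, lam_t * t))  # True
```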
