Why Squashing Functions in Multi-Layer Neural Networks

Julio C. Urenda¹, Orsolya Csiszár²,³, Gábor Csiszár⁴, József Dombi⁵, Olga Kosheleva¹, Vladik Kreinovich¹, György Eigner³

¹ University of Texas at El Paso, USA
² University of Applied Sciences Esslingen, Germany
³ Óbuda University, Budapest, Hungary
⁴ University of Stuttgart, Germany
⁵ University of Szeged, Hungary

E-mails: vladik@utep.edu, orsolya.csiszar@nik.uni-obuda.hu, gabor.csiszar@mp.imw.uni-stuttgart.de, dombi@inf.u-szeged.hu, olgak@utep.edu, eigner.gyorgy@nik.uni-obuda.hu

1. A Short Introduction

• In their successful applications, deep neural networks use a non-linear transformation s(z) = max(0, z).
• It is called the rectified linear activation function.
• Sometimes, more general transformations, called squashing functions, lead to even better results.
• In this talk, we provide a theoretical explanation for this empirical fact.
• To provide this explanation, let us first briefly recall:
  – why we need machine learning in the first place,
  – what deep neural networks are, and
  – what activation functions these neural networks use.

2. Machine Learning Is Needed

• For some simple systems, we know the equations that describe the system's dynamics.
• These equations may be approximate, but they are often good enough.
• With more complex systems (such as systems of systems), this is often no longer the case.
• Even when we have a good approximate model for each subsystem, the corresponding inaccuracies add up.
• So, the resulting model of the whole system is too inaccurate to be useful.
• We also need to use the records of the actual system's behavior when making predictions.
• Using the previous behavior to predict the future is called machine learning.

3. Deep Learning

• The most efficient machine learning technique is deep learning: the use of multi-layer neural networks.
• In general, on a layer of a neural network, we transform signals x_1, ..., x_n into a new signal
  $y = s\left(\sum_{i=1}^{n} w_i \cdot x_i + w_0\right).$
• The coefficients w_i (called weights) are to be determined during training.
• s(z) is a non-linear function called the activation function.
• Most multi-layer neural networks use s(z) = max(z, 0), known as the rectified linear function.
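As a minimal sketch (ours, not from the talk) of this layer computation, here is a single rectified-linear neuron in Python; the names relu and neuron are hypothetical:

```python
def relu(z):
    # Rectified linear activation: s(z) = max(0, z)
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=relu):
    # One neuron: y = s(sum_i w_i * x_i + w_0)
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Example: inputs x = (1.0, 2.0), weights w_1 = 0.5, w_2 = -1.0, bias w_0 = 0.2
print(neuron([1.0, 2.0], [0.5, -1.0], 0.2))  # max(0, 0.5 - 2.0 + 0.2) = 0.0
```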

4. Shall We Go Beyond Rectified Linear?

• Preliminary analysis shows that for some applications:
  – it is more advantageous to use different activation functions for different neurons;
  – specifically, this was shown for a special family of squashing activation functions
    $S^{(\beta)}_{a,\lambda}(z) = \frac{1}{\lambda \cdot \beta} \cdot \ln \frac{1 + \exp(\beta \cdot (z - (a - \lambda/2)))}{1 + \exp(\beta \cdot (z - (a + \lambda/2)))};$
  – this family contains rectified linear neurons as a particular case.
• We explain the empirical success of squashing functions by showing that their formulas follow from reasonably natural symmetries.
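A minimal Python sketch (ours) that transcribes this squashing function under the parenthesization above; the name squash is hypothetical:

```python
import math

def squash(z, a, lam, beta):
    # S^(beta)_{a,lam}(z) = (1/(lam*beta)) *
    #   ln[(1 + exp(beta*(z - (a - lam/2)))) / (1 + exp(beta*(z - (a + lam/2))))]
    num = 1.0 + math.exp(beta * (z - (a - lam / 2)))
    den = 1.0 + math.exp(beta * (z - (a + lam / 2)))
    return math.log(num / den) / (lam * beta)

# For large beta, the values approach a "cut" function:
# 0 below a - lam/2, rising linearly to 1 at a + lam/2, and 1 above it.
for z in (-2.0, 0.0, 2.0):
    print(z, round(squash(z, a=0.0, lam=1.0, beta=5.0), 4))
# Prints approximately: -2.0 0.0001, 0.0 0.5, 2.0 1.0
```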

5. How This Talk Is Structured

• First, we recall the main ideas of symmetries and invariance.
• Then, we recall how these ideas can be used to explain the efficiency of the sigmoid activation function
  $s_0(z) = \frac{1}{1 + \exp(-z)}.$
• This function is used in traditional 3-layer neural networks.
• Finally, we use this information to explain the efficiency of squashing activation functions.
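For concreteness, a small numeric sketch (ours) of the sigmoid:

```python
import math

def sigmoid(z):
    # s0(z) = 1 / (1 + exp(-z)): maps the whole real line into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))  # 0.0067, 0.5, 0.9933
```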

6. Which Transformations Are Natural?

• From the mathematical viewpoint, we can apply any non-linear transformation.
• However, some of these transformations are purely mathematical, with no clear physical interpretation.
• Other transformations are natural in the sense that they have a physical meaning.
• What are natural transformations?

7. Numerical Values Change When We Change a Measuring Unit and/or Starting Point

• In data processing, we deal with numerical values of different physical quantities.
• Computers just treat these values as numbers.
• However, from the physical viewpoint, the numerical values are not absolute; they change:
  – if we change the measuring unit and/or
  – the starting point for measuring the corresponding quantity.
• The corresponding changes in numerical values are clearly physically meaningful, i.e., natural.
• For example, we can measure a person's height in meters or in centimeters.

8. Numerical Values Change (cont-d)

• The same height of 1.7 m, when described in centimeters, becomes 170 cm.
• In general, if we replace the original measuring unit with a new unit which is λ times smaller, then:
  – instead of the original numerical value x,
  – we get a new numerical value λ · x,
  – while the actual quantity remains the same.
• Such a transformation x → λ · x is known as scaling.
• For some quantities, e.g., for time or temperature, the numerical value also depends on the starting point.
• For example, we can measure the time from the moment when the talk started.
• Alternatively, we can use the usual calendar time, in which Year 0 is the starting point.
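As a one-line illustration (ours, not from the slides) of scaling:

```python
def scale(x, lam):
    # Change of measuring unit: x -> lam * x
    return lam * x

print(scale(1.7, 100))  # a height of 1.7 m, expressed in cm: 170.0
```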

9. Numerical Values Change (cont-d)

• In general, if we replace the original starting point with a new one which is x_0 units earlier, then:
  – each original numerical value x
  – is replaced by a new numerical value x + x_0.
• Such a transformation x → x + x_0 is known as a shift.
• In general, if we change both the measuring unit and the starting point, we get a linear transformation:
  x → λ · x + x_0.
• A usual example of such a transformation is the transition from the Celsius to the Fahrenheit temperature scale:
  t_F = 1.8 · t_C + 32.
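A short sketch (ours) of this linear transformation, using the temperature example; the function names are hypothetical:

```python
def linear_transform(x, lam, x0):
    # Change of both unit and starting point: x -> lam * x + x0
    return lam * x + x0

def celsius_to_fahrenheit(t_c):
    # t_F = 1.8 * t_C + 32, i.e., lam = 1.8 and x0 = 32
    return linear_transform(t_c, 1.8, 32.0)

print(celsius_to_fahrenheit(0.0))    # 32.0
print(celsius_to_fahrenheit(100.0))  # 212.0
```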

10. Invariance

• Changing the measuring unit and/or starting point:
  – changes the numerical values, but
  – does not change the actual quantity.
• It is therefore reasonable to require that physical equations do not change if we simply:
  – change the measuring unit and/or
  – change the starting point.
• Of course, to preserve the physical equations:
  – if we change the measuring unit and/or starting point for one quantity,
  – we may need to change the measuring units and/or starting points for other quantities as well.
• For example, there is a well-known relation d = v · t between distance d, velocity v, and time t.
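A small sketch (ours) of what this consistency requirement means for d = v · t: if velocity is rescaled by lam_v and time by lam_t, the relation stays valid provided distance is rescaled by lam_d = lam_v · lam_t:

```python
def relation_holds(d, v, t, tol=1e-9):
    # The physical relation d = v * t, up to rounding
    return abs(d - v * t) < tol

# v = 10 m/s, t = 100 s, hence d = 1000 m
v, t = 10.0, 100.0
d = v * t
print(relation_holds(d, v, t))  # True

# Switch units: km/h for velocity (lam_v = 3.6), hours for time (lam_t = 1/3600);
# then distance must be rescaled to km (lam_d = lam_v * lam_t = 0.001)
lam_v, lam_t = 3.6, 1.0 / 3600.0
lam_d = lam_v * lam_t
print(relation_holds(lam_d * d, lam_v * v, lam_t * t))  # True
```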
