
Orthogonal Bases Are the Best: A Theorem Justifying Bruno Apolloni's Heuristic Neural Network Idea



Orthogonal Bases Are the Best: A Theorem Justifying Bruno Apolloni's Heuristic Neural Network Idea

Jaime Nava and Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
500 W. University, El Paso, TX 79968, USA
Emails: jenava@miners.utep.edu, vladik@utep.edu

1. Neural Networks: Brief Reminder

• In the traditional (3-layer) neural networks, the input values $x_1, \ldots, x_n$:
  – first go through the non-linear layer of "hidden" neurons, resulting in the values $y_i = s_0\left(\sum_{j=1}^{n} w_{ij} \cdot x_j - w_{i0}\right)$, $1 \le i \le m$;
  – after which a linear neuron combines the results $y_i$ into the output $y = \sum_{i=1}^{m} W_i \cdot y_i - W_0$.
• Here, $W_i$ and $w_{ij}$ are weights selected based on the data, and $s_0(z)$ is a non-linear activation function.
• Usually, the "sigmoid" activation function is used: $s_0(z) = \dfrac{1}{1 + \exp(-z)}$.
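To make the notation concrete, here is a minimal Python/NumPy sketch of this forward pass; the function and variable names are ours, not from the slides:

```python
import numpy as np

def sigmoid(z):
    """The sigmoid activation s0(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w, w0, W, W0):
    """Forward pass of the 3-layer network described above.

    x  : input vector (x_1, ..., x_n)
    w  : m-by-n matrix of hidden-layer weights w_ij
    w0 : length-m vector of hidden-layer thresholds w_i0
    W  : length-m vector of output weights W_i
    W0 : output threshold
    """
    y_hidden = sigmoid(w @ x - w0)   # y_i = s0(sum_j w_ij * x_j - w_i0)
    return W @ y_hidden - W0         # y = sum_i W_i * y_i - W_0

# Tiny usage example with arbitrary weights (n = 2 inputs, m = 3 hidden neurons).
rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])
w, w0 = rng.normal(size=(3, 2)), rng.normal(size=3)
W, W0 = rng.normal(size=3), 0.1
print(forward(x, w, w0, W, W0))
```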

2. Training a Neural Network: Reminder

• The weights $W_i$ and $w_{ij}$ are selected so as to fit the data, i.e., so that $y^{(k)} \approx f\left(x_1^{(k)}, \ldots, x_n^{(k)}\right)$, where:
  – $x_1^{(k)}, \ldots, x_n^{(k)}$ ($1 \le k \le N$) are given values of the inputs, and
  – $y^{(k)}$ are given values of the output.
• One of the problems with the traditional neural networks is that
  – in the process of learning – i.e., in the process of adjusting the values of the weights to fit the data –
  – some of the neurons are duplicated, i.e., we get $w_{ij} = w_{i'j}$ for some $i \ne i'$ and thus, $y_i = y_{i'}$.
• As a result, we do not fully use the learning capacity of a neural network: we could use fewer hidden neurons.
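A tiny illustration of the duplication problem, a hypothetical example of ours rather than anything from the slides: if two hidden neurons end up with identical weight rows and thresholds, their outputs coincide on every input.

```python
import numpy as np

# If two hidden neurons have identical weight rows (w_ij = w_i'j for i != i')
# and identical thresholds, their outputs y_i and y_i' agree on every input.
w = np.array([[1.0, -2.0],
              [1.0, -2.0],    # duplicate of row 0: neuron 1 repeats neuron 0
              [0.5,  0.3]])
w0 = np.array([0.1, 0.1, -0.2])   # thresholds for rows 0 and 1 are also equal

x = np.array([0.7, 1.5])
y = 1.0 / (1.0 + np.exp(-(w @ x - w0)))
print(y[0] == y[1])   # True: one of the three hidden neurons is wasted
```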

3. Apolloni's Idea

• Problem (reminder):
  – in the process of learning – i.e., in the process of adjusting the values of the weights to fit the data –
  – some of the neurons are duplicated, i.e., we get $w_{ij} = w_{i'j}$ for some $i \ne i'$ and thus, $y_i = y_{i'}$.
• To avoid this problem, B. Apolloni et al. suggested that we orthogonalize the neurons during training.
• In other words, we make sure that the corresponding functions $y_i(x_1, \ldots, x_n)$ remain orthogonal: $\langle y_i, y_j \rangle = \int y_i(x) \cdot y_j(x) \, dx = 0$.
• Since Apolloni et al.'s idea works well, it is desirable to look for its precise mathematical justification.
• We provide such a justification in terms of symmetries.
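The slides do not spell out the orthogonalization procedure; one standard way to enforce $\langle y_i, y_j \rangle = 0$, with the integral approximated by a sum over sample points, is Gram–Schmidt. The sketch below assumes that choice; it is not necessarily the exact procedure of Apolloni et al.

```python
import numpy as np

def orthogonalize(Y):
    """Gram-Schmidt on the rows of Y, where row i holds the values of
    hidden-neuron response y_i at a grid of sample points, so that the
    discretized inner product <y_i, y_j> (a sum over sample points of
    y_i * y_j) vanishes for i != j."""
    Q = []
    for y in Y:
        for q in Q:
            y = y - (y @ q) / (q @ q) * q   # remove the component along q
        Q.append(y)
    return np.array(Q)

# Example: three hidden-neuron response vectors sampled at 100 points.
rng = np.random.default_rng(1)
Y = rng.normal(size=(3, 100))
Q = orthogonalize(Y)
print(np.round(Q @ Q.T, 10))   # off-diagonal entries are ~0
```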

4. Why Symmetries?

• At first glance, the use of symmetries in neural networks may sound somewhat strange.
• Indeed, there are no explicit symmetries there.
• However, as we will show, hidden symmetries have been actively used in neural networks.
• For example, symmetries explain the empirically observed advantages of the sigmoid activation function $s_0(z) = \dfrac{1}{1 + \exp(-z)}$.

5. Symmetry: a Fundamental Property of the Physical World

• One of the main objectives of science: prediction.
• Basis for prediction: we observed similar situations in the past, and we expect similar outcomes.
• In mathematical terms: similarity corresponds to symmetry, and similarity of outcomes – to invariance.
• Example: we dropped the ball, it fell down.
• Symmetries: shift, rotation, etc.
• In modern physics: theories are usually formulated in terms of symmetries (not differential equations).
• Natural idea: let us use symmetry to describe uncertainty as well.

6. Basic Symmetries: Scaling and Shift

• Typical situation: we deal with the numerical values of a physical quantity.
• Numerical values depend on the measuring unit.
• Scaling: if we use a new unit which is $\lambda$ times smaller, numerical values are multiplied by $\lambda$: $x \to \lambda \cdot x$.
• Example: $x$ meters $= 100 \cdot x$ cm.
• Another possibility: change the starting point.
• Shift: if we use a new starting point which is $s$ units before, then $x \to x + s$ (example: time).
• Together, scaling and shifts form linear transformations $x \to a \cdot x + b$.
• Invariance: physical formulas should not depend on the choice of a measuring unit or of a starting point.
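A two-line check, our own illustration, that composing a scaling with a shift indeed gives a linear transformation $x \to a \cdot x + b$:

```python
# Compose a scaling x -> lam * x with a shift x -> x + s.
def scale(lam):
    return lambda x: lam * x

def shift(s):
    return lambda x: x + s

f = lambda x: shift(3.0)(scale(100.0)(x))   # x meters -> 100*x cm, then shift by 3
print(f(2.0))   # 203.0, i.e., a*x + b with a = 100, b = 3
```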

7. Basic Nonlinear Symmetries

• Sometimes, a system also has nonlinear symmetries.
• If a system is invariant under $f$ and $g$, then:
  – it is invariant under their composition $f \circ g$, and
  – it is invariant under the inverse transformation $f^{-1}$.
• In mathematical terms, this means that symmetries form a group.
• In practice, at any given moment of time, we can only store and describe finitely many parameters.
• Thus, it is reasonable to restrict ourselves to finite-dimensional groups.
• Question (N. Wiener): describe all finite-dimensional groups that contain all linear transformations.
• Answer (for real numbers): all elements of this group are fractionally-linear: $x \to \dfrac{a \cdot x + b}{c \cdot x + d}$.
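A fractionally-linear map is determined by the coefficient matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$; composition of maps corresponds to matrix multiplication and the inverse map to the inverse matrix, which is why these maps form a finite-dimensional group. A numeric sketch of this standard fact, our illustration rather than the slides':

```python
import numpy as np

def frac_linear(M):
    """The fractionally-linear map x -> (a*x + b) / (c*x + d) for M = [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    return lambda x: (a * x + b) / (c * x + d)

F = np.array([[2.0, 1.0], [1.0, 3.0]])
G = np.array([[1.0, -1.0], [0.0, 1.0]])   # c = 0, d = 1: an ordinary linear map

x = 0.7
# Composition of the maps corresponds to the product of their matrices,
# and the inverse map to the inverse matrix.
print(frac_linear(F)(frac_linear(G)(x)), frac_linear(F @ G)(x))    # equal
print(frac_linear(np.linalg.inv(F))(frac_linear(F)(x)), x)         # equal
```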

8. Symmetries Explain the Choice of an Activation Function

• What needs explaining: the formula for the activation function $f(x) = \dfrac{1}{1 + e^{-x}}$.
• A change in the input starting point: $x \to x + s$.
• Reasonable requirement: the new output $f(x + s)$ should be equivalent to $f(x)$ modulo an appropriate transformation.
• Reminder: all appropriate transformations are fractionally linear.
• Conclusion: $f(x + s) = \dfrac{a(s) \cdot f(x) + b(s)}{c(s) \cdot f(x) + d(s)}$.
• Differentiating both sides with respect to $s$ and setting $s = 0$, we get a differential equation for $f(x)$.
• Its known solution is the sigmoid activation function – which can thus be explained by symmetries.
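For the sigmoid, the coefficients $a(s), b(s), c(s), d(s)$ can be written down explicitly: substituting $e^{-x} = (1 - f(x))/f(x)$ into $f(x + s) = 1/(1 + e^{-s} \cdot e^{-x})$ gives $f(x + s) = \dfrac{f(x)}{(1 - e^{-s}) \cdot f(x) + e^{-s}}$. A quick numeric check of this algebra (ours, not from the slides):

```python
import numpy as np

def f(x):                      # the sigmoid activation function
    return 1.0 / (1.0 + np.exp(-x))

# f(x + s) = f(x) / ((1 - exp(-s)) * f(x) + exp(-s)),
# i.e., a(s) = 1, b(s) = 0, c(s) = 1 - exp(-s), d(s) = exp(-s).
x, s = np.linspace(-3, 3, 7), 0.8
lhs = f(x + s)
rhs = f(x) / ((1 - np.exp(-s)) * f(x) + np.exp(-s))
print(np.allclose(lhs, rhs))   # True: a shift in x acts fractionally-linearly on f
```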

9. Towards Formulating the Problem in Precise Terms

• We select a basis $e_0(x), e_1(x), \ldots, e_n(x), \ldots$ so that each function $f(x)$ is represented as $f(x) = \sum_i c_i \cdot e_i(x)$; e.g.:
  – Taylor series: $e_0(x) = 1$, $e_1(x) = x$, $e_2(x) = x^2$, . . .
  – Fourier transform: $e_i(x) = \sin(\omega_i \cdot x)$.
• We store $c_0, c_1, \ldots$ instead of the original function $f(x)$.
• Criterion: e.g., the smallest number of bits needed to store $f(x)$ with given accuracy.
• Observation: storing $c_i$ and $-c_i$ takes the same space.
• Thus, changing one of the $e_i(x)$ to $e_i'(x) = -e_i(x)$ does not change accuracy or storage space, so:
  – if $e_0(x), \ldots, e_{i-1}(x), e_i(x), e_{i+1}(x), \ldots$ is an optimal basis, then
  – $e_0(x), \ldots, e_{i-1}(x), -e_i(x), e_{i+1}(x), \ldots$ is also optimal.
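A small numeric illustration of this observation, with an example function of our own choosing: flipping the sign of one basis function merely flips the sign of the corresponding coefficient, leaving both the approximation error and the storage cost unchanged.

```python
import numpy as np

# Fit f(x) = exp(x) on [0, 1] in the truncated Taylor basis 1, x, x^2
# by least squares, then flip the sign of e_1(x) = x.
x = np.linspace(0.0, 1.0, 50)
f = np.exp(x)

E = np.stack([np.ones_like(x), x, x**2], axis=1)   # columns e_0, e_1, e_2
E_flipped = E * np.array([1.0, -1.0, 1.0])         # e_1 -> -e_1

c, *_ = np.linalg.lstsq(E, f, rcond=None)
c_flipped, *_ = np.linalg.lstsq(E_flipped, f, rcond=None)

print(c, c_flipped)   # only c_1 changes sign
print(np.linalg.norm(E @ c - f),
      np.linalg.norm(E_flipped @ c_flipped - f))   # identical approximation error
```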
