

One More Advantage of Deep Learning: While in General, a Perfect Training of a Neural Network Is NP-Hard, It Is Feasible for Bounded-Width Deep Networks

Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
El Paso, TX 79968, USA
vladik@utep.edu
http://www.cs.utep.edu/vladik

(based on a joint work with Chitta Baral)

1. Why Traditional Neural Networks: (Sanitized) History

• How do we make computers think?
• To make machines that fly, it is reasonable to look at the creatures that know how to fly: the birds.
• To make computers think, it is reasonable to analyze how we humans think.
• On the biological level, our brain processes information via special cells called neurons.
• Somewhat surprisingly, in the brain, signals are electric – just as in the computer.
• The main difference is that in a neural network, signals are sequences of identical pulses.

2. Why Traditional NN: (Sanitized) History

• The intensity of a signal is described by the frequency of pulses.
• A neuron has many inputs (up to 10^4).
• All the inputs x_1, ..., x_n are combined, with some loss, into a frequency ∑_{i=1}^n w_i · x_i.
• Low inputs do not activate the neuron at all; high inputs lead to the largest activation.
• The output signal is a non-linear function y = f(∑_{i=1}^n w_i · x_i − w_0).
• In biological neurons, f(x) = 1 / (1 + exp(−x)).
• Traditional neural networks emulate such biological neurons.
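As a concrete illustration of this neuron model, here is a minimal Python sketch (not from the slides; the inputs, weights, and threshold below are made-up values):

```python
import numpy as np

def sigmoid(z):
    """The biological activation function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, w0):
    """Traditional neuron: y = f(sum_i w_i * x_i - w_0)."""
    return sigmoid(np.dot(w, x) - w0)

# Illustrative values only:
x = np.array([0.5, 1.0, -0.2])   # input frequencies x_1, ..., x_n
w = np.array([0.8, -0.3, 1.5])   # weights w_1, ..., w_n
w0 = 0.1                         # activation threshold w_0
print(neuron_output(x, w, w0))   # a value strictly between 0 and 1
```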

3. Why Traditional Neural Networks: Real History

• At first, researchers ignored non-linearity and only used linear neurons.
• They got good results and made many promises.
• The euphoria ended in the 1960s, when MIT's Marvin Minsky and Seymour Papert published a book.
• Their main result was that a composition of linear functions is linear (I am not kidding).
• This ended the hopes of the original schemes.
• For some time, neural networks became a bad word.
• Then, smart researchers came up with a genius idea: let's make neurons non-linear.
• This revived the field.

4. Traditional Neural Networks: Main Motivation

• One of the main motivations for neural networks was that computers were slow.
• Although human neurons are much slower than a CPU, human processing was often faster.
• So, the main motivation was to make data processing faster.
• The idea was that:
  – since we are the result of billions of years of ever-improving evolution,
  – our biological mechanisms should be optimal (or close to optimal).

5. How the Need for Fast Computation Leads to Traditional Neural Networks

• To make processing faster, we need to have many fast processing units working in parallel.
• The fewer layers, the smaller the overall processing time.
• In nature, there are many fast linear processes – e.g., combining electric signals.
• As a result, linear processing (L) is faster than non-linear processing.
• For non-linear processing, the more inputs, the longer it takes.
• So, the fastest non-linear processing (NL) units process just one input.
• It turns out that two layers are not enough to approximate any function.

6. Why One or Two Layers Are Not Enough

• With one linear (L) layer, we only get linear functions.
• With one nonlinear (NL) layer, we only get functions of one variable.
• With L → NL layers, we get y = g(∑_{i=1}^n w_i · x_i − w_0).
• For these functions, the level sets f(x_1, ..., x_n) = const are planes ∑_{i=1}^n w_i · x_i = c.
• Thus, they cannot approximate, e.g., f(x_1, x_2) = x_1 · x_2, for which the level set is a hyperbola.
• For NL → L layers, we get f(x_1, ..., x_n) = ∑_{i=1}^n f_i(x_i).
• For all these functions, d := ∂²f / (∂x_1 ∂x_2) = 0, so we also cannot approximate f(x_1, x_2) = x_1 · x_2, for which d = 1 ≠ 0.
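The last two bullets can be checked numerically. Here is a minimal sketch (the functions f_1, f_2 and the test point are arbitrary choices, not from the slides): any NL → L function f(x_1, x_2) = f_1(x_1) + f_2(x_2) has zero mixed second difference, while x_1 · x_2 does not.

```python
import numpy as np

def additive(x1, x2):
    """An NL -> L function: f(x1, x2) = f1(x1) + f2(x2); here f1 = sin, f2 = exp."""
    return np.sin(x1) + np.exp(x2)

def mixed_difference(f, a, b, h=1e-3):
    """Finite-difference analogue of d = d^2 f / (dx1 dx2) at the point (a, b)."""
    return (f(a + h, b + h) - f(a + h, b) - f(a, b + h) + f(a, b)) / h**2

print(mixed_difference(additive, 0.7, -1.2))            # ~ 0 for any f1, f2
print(mixed_difference(lambda a, b: a * b, 0.7, -1.2))  # ~ 1, so x1 * x2 is out of reach
```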

7. Why Three Layers Are Sufficient: Newton's Prism and Fourier Transform

• In principle, we can have two 3-layer configurations: L → NL → L and NL → L → NL.
• Since L is faster than NL, the fastest is L → NL → L:
  y = ∑_{k=1}^K W_k · f_k(∑_{i=1}^n w_{ki} · x_i − w_{k0}) − W_0.
• Newton showed that a prism decomposes white light (or any light) into elementary colors.
• In precise terms, elementary colors are sinusoids A · sin(w · t) + B · cos(w · t).
• Thus, every function can be approximated, with any accuracy, as a linear combination of sinusoids:
  f(x_1) ≈ ∑_k (A_k · sin(w_k · x_1) + B_k · cos(w_k · x_1)).
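Here is a quick numerical sketch of this approximation claim (the target function |x_1| and the frequencies w_k = 1, ..., 5 are arbitrary choices), with the coefficients A_k, B_k fitted by least squares:

```python
import numpy as np

# Approximate a 1-D function by a linear combination of sinusoids,
# fitting the coefficients A_k, B_k (and a constant) by least squares.
def target(x):
    return np.abs(x)                      # an arbitrary function on [-pi, pi]

x = np.linspace(-np.pi, np.pi, 400)
freqs = np.arange(1, 6)                   # w_k = 1, ..., 5 (arbitrary choice)
columns = [np.ones_like(x)]               # constant term
for w in freqs:
    columns += [np.sin(w * x), np.cos(w * x)]
design = np.column_stack(columns)

coeffs, *_ = np.linalg.lstsq(design, target(x), rcond=None)
approx = design @ coeffs
print("max approximation error:", np.max(np.abs(approx - target(x))))
```

Adding more frequencies w_k shrinks the error further, in line with the Fourier result cited on the next slide.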

8. Why Three Layers Are Sufficient (cont-d)

• Newton's prism result: f(x_1) ≈ ∑_k (A_k · sin(w_k · x_1) + B_k · cos(w_k · x_1)).
• This result was theoretically proven later by Fourier.
• For f(x_1, x_2), we get a similar expression for each x_2, with coefficients A_k(x_2) and B_k(x_2).
• We can similarly represent A_k(x_2) and B_k(x_2), thus getting products of sines, and it is known that, e.g.,
  cos(a) · cos(b) = (1/2) · (cos(a + b) + cos(a − b)).
• Thus, we get an approximation of the desired form with f_k = sin or f_k = cos:
  y = ∑_{k=1}^K W_k · f_k(∑_{i=1}^n w_{ki} · x_i − w_{k0}).
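The product-to-sum step can be verified directly. Below is a small check (the frequencies and the test point are made up) showing that a product of cosines of x_1 and x_2 is already a sum of cosines of linear combinations of x_1 and x_2, i.e., it has the 3-layer form with f_k = cos:

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-2.0, 2.0, size=2)   # arbitrary test point
w, v = 1.7, 0.6                           # arbitrary frequencies

# cos(a) * cos(b) = (1/2) * (cos(a + b) + cos(a - b)), with a = w*x1, b = v*x2:
lhs = np.cos(w * x1) * np.cos(v * x2)
rhs = 0.5 * (np.cos(w * x1 + v * x2) + np.cos(w * x1 - v * x2))
print(np.isclose(lhs, rhs))               # True: already of the L -> NL -> L form
```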

9. Which Activation Functions f_k(z) Should We Choose?

• A general 3-layer NN has the form
  y = ∑_{k=1}^K W_k · f_k(∑_{i=1}^n w_{ki} · x_i − w_{k0}) − W_0.
• Biological neurons use f(z) = 1 / (1 + exp(−z)), but shall we simulate it?
• Simulations are not always efficient.
• E.g., airplanes have wings like birds, but they do not flap them.
• Let us analyze this problem theoretically.
• There is always some noise c in the communication channel.
• So, we can consider either the original signals x_i or the denoised ones x_i − c.
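As a small side note on the last bullet (a sketch with made-up numbers, not from the slides): in one hidden unit of the 3-layer form, replacing every input x_i by the denoised x_i − c is equivalent to shifting the threshold w_{k0} by c · ∑_i w_{ki}, so either description of the signals can be accommodated by the same network.

```python
import numpy as np

def hidden_unit(x, wk, wk0):
    """One hidden unit f_k(sum_i w_ki * x_i - w_k0), with the sigmoid as f_k."""
    return 1.0 / (1.0 + np.exp(-(np.dot(wk, x) - wk0)))

rng = np.random.default_rng(1)
x = rng.normal(size=4)          # original signals x_i (made up)
wk = rng.normal(size=4)         # weights w_ki (made up)
wk0, c = 0.3, 0.05              # threshold w_k0 and noise level c (made up)

# Denoising every input by c equals shifting the threshold by c * sum_i w_ki:
print(np.isclose(hidden_unit(x - c, wk, wk0),
                 hidden_unit(x, wk, wk0 + c * wk.sum())))   # True
```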
