From Traditional Neural Networks to Deep Learning and Beyond

Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
El Paso, TX 79968, USA
vladik@utep.edu
http://www.cs.utep.edu/vladik

(Based on joint work with Chitta Baral, also with Olac Fuentes and Francisco Zapata)
1. Why Traditional Neural Networks: (Sanitized) History

• How do we make computers think?
• To make machines that fly, it is reasonable to look at the creatures that know how to fly: the birds.
• To make computers think, it is reasonable to analyze how we humans think.
• On the biological level, our brain processes information via special cells called neurons.
• Somewhat surprisingly, in the brain, signals are electric – just as in the computer.
• The main difference is that in a neural network, signals are sequences of identical pulses.
2. Why Traditional NN: (Sanitized) History

• The intensity of a signal is described by the frequency of pulses.
• A neuron has many inputs (up to $10^4$).
• All the inputs $x_1, \ldots, x_n$ are combined, with some loss, into a frequency $\sum_{i=1}^{n} w_i \cdot x_i$.
• Low inputs do not activate the neuron at all; high inputs lead to the largest activation.
• The output signal is a non-linear function
  $y = f\left(\sum_{i=1}^{n} w_i \cdot x_i - w_0\right)$.
• In biological neurons, $f(x) = 1/(1 + \exp(-x))$.
• Traditional neural networks emulate such biological neurons (a minimal computational sketch follows below).
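As a rough illustration (my sketch, not part of the original slides), here is the neuron model just described, written in Python/NumPy: it computes $y = f\left(\sum_{i=1}^{n} w_i \cdot x_i - w_0\right)$ with the biological activation $f(z) = 1/(1 + \exp(-z))$. The inputs, weights, and threshold below are made-up illustrative values.

```python
import numpy as np

def sigmoid(z):
    """The biological activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, w0):
    """Combine the inputs x with weights w, subtract the threshold w0, apply f."""
    return sigmoid(np.dot(w, x) - w0)

# Illustrative example: a neuron with three inputs.
x = np.array([0.2, 0.9, 0.5])   # input frequencies (made up)
w = np.array([1.0, -0.5, 2.0])  # synaptic weights (made up)
print(neuron_output(x, w, w0=0.3))
```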
3. Why Traditional Neural Networks: Real History

• At first, researchers ignored non-linearity and only used linear neurons.
• They got good results and made many promises.
• The euphoria ended in the 1960s when MIT's Marvin Minsky and Seymour Papert published a book.
• Their main result was that a composition of linear functions is linear (I am not kidding).
• This ended the hopes for the original schemes.
• For some time, "neural networks" became a bad word.
• Then, smart researchers came up with a genius idea: let's make neurons non-linear.
• This revived the field.
4. Traditional Neural Networks: Main Motivation

• One of the main motivations for neural networks was that computers were slow.
• Although human neurons are much slower than a CPU, human processing was often faster.
• So, the main motivation was to make data processing faster.
• The idea was that:
  – since we are the result of billions of years of ever-improving evolution,
  – our biological mechanisms should be optimal (or close to optimal).
5. How the Need for Fast Computation Leads to Traditional Neural Networks

• To make processing faster, we need to have many fast processing units working in parallel.
• The fewer layers, the smaller the overall processing time.
• In nature, there are many fast linear processes – e.g., combining electric signals.
• As a result, linear processing (L) is faster than non-linear processing (NL).
• For non-linear processing, the more inputs, the longer it takes.
• So, the fastest non-linear processing units process just one input.
• It turns out, however, that two layers are not enough to approximate an arbitrary function.
6. Why One or Two Layers Are Not Enough

• With one linear (L) layer, we only get linear functions.
• With one nonlinear (NL) layer, we only get functions of one variable.
• With L → NL layers, we get $g\left(\sum_{i=1}^{n} w_i \cdot x_i - w_0\right)$.
• For these functions, the level sets $f(x_1, \ldots, x_n) = \text{const}$ are planes $\sum_{i=1}^{n} w_i \cdot x_i = c$.
• Thus, they cannot approximate, e.g., $f(x_1, x_2) = x_1 \cdot x_2$, for which the level set is a hyperbola.
• For NL → L layers, we get $f(x_1, \ldots, x_n) = \sum_{i=1}^{n} f_i(x_i)$.
• For all these functions, $d \stackrel{\text{def}}{=} \dfrac{\partial^2 f}{\partial x_1 \, \partial x_2} = 0$, so we also cannot approximate $f(x_1, x_2) = x_1 \cdot x_2$, for which $d = 1 \neq 0$ (see the numerical check below).
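A small numerical check of this argument (my sketch, not part of the original slides): for an additive NL → L function, a finite-difference estimate of $\partial^2 f / (\partial x_1 \, \partial x_2)$ is essentially zero, while for $f(x_1, x_2) = x_1 \cdot x_2$ it is 1. The particular additive function and the evaluation point are arbitrary illustrative choices.

```python
import numpy as np

def mixed_partial(f, x1, x2, h=1e-4):
    """Central-difference estimate of d^2 f / (dx1 dx2) at the point (x1, x2)."""
    return (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
            - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h * h)

additive = lambda x1, x2: np.sin(3 * x1) + np.exp(x2)  # an NL -> L type function (made up)
product  = lambda x1, x2: x1 * x2                      # the function we want to approximate

print(mixed_partial(additive, 0.7, -0.4))  # ~0: additive functions cannot match x1 * x2
print(mixed_partial(product,  0.7, -0.4))  # ~1
```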
7. Why Three Layers Are Sufficient: Newton's Prism and Fourier Transform

• In principle, we can have two 3-layer configurations: L → NL → L and NL → L → NL.
• Since L is faster than NL, the fastest is L → NL → L:
  $y = \sum_{k=1}^{K} W_k \cdot f_k\left(\sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}\right) - W_0$.
• Newton showed that a prism decomposes white light (or any light) into elementary colors.
• In precise terms, elementary colors are sinusoids $A \cdot \sin(w \cdot t) + B \cdot \cos(w \cdot t)$.
• Thus, every function can be approximated, with any accuracy, by a linear combination of sinusoids:
  $f(x_1) \approx \sum_k \left(A_k \cdot \sin(w_k \cdot x_1) + B_k \cdot \cos(w_k \cdot x_1)\right)$.
8. Why Three Layers Are Sufficient (cont-d)

• Newton's prism result:
  $f(x_1) \approx \sum_k \left(A_k \cdot \sin(w_k \cdot x_1) + B_k \cdot \cos(w_k \cdot x_1)\right)$.
• This result was theoretically proven later by Fourier.
• For $f(x_1, x_2)$, we get a similar expression for each $x_2$, with coefficients $A_k(x_2)$ and $B_k(x_2)$.
• We can similarly represent $A_k(x_2)$ and $B_k(x_2)$, thus getting products of sines, and it is known that, e.g.,
  $\cos(a) \cdot \cos(b) = \frac{1}{2} \cdot \left(\cos(a + b) + \cos(a - b)\right)$.
• Thus, we get an approximation of the desired form, with $f_k = \sin$ or $f_k = \cos$ (a small computational sketch follows below):
  $y = \sum_{k=1}^{K} W_k \cdot f_k\left(\sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}\right)$.
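The following sketch (mine, not part of the original slides) illustrates the L → NL → L idea for a single input $x_1$: the hidden layer computes $\sin(w_k \cdot x_1)$ and $\cos(w_k \cdot x_1)$ for a few assumed frequencies, and the output (linear) layer finds the coefficients $A_k$, $B_k$ by least squares. The target function $|x_1|$ and the frequencies $1, \ldots, 5$ are arbitrary illustrative choices.

```python
import numpy as np

x1 = np.linspace(-np.pi, np.pi, 200)
target = np.abs(x1)                      # an illustrative function to approximate

freqs = np.arange(1, 6)                  # assumed frequencies w_k = 1, ..., 5
hidden = np.column_stack(
    [np.sin(w * x1) for w in freqs] +    # NL layer: sine units
    [np.cos(w * x1) for w in freqs] +    # NL layer: cosine units
    [np.ones_like(x1)]                   # constant column (plays the role of -W_0)
)
coeffs, *_ = np.linalg.lstsq(hidden, target, rcond=None)  # output L layer via least squares
approx = hidden @ coeffs

print("max approximation error:", np.max(np.abs(approx - target)))
```

Adding more frequencies (i.e., more hidden units $K$) makes this error smaller, which is exactly the "with any accuracy" claim above.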
9. Which Activation Functions f_k(z) Should We Choose?

• A general 3-layer NN has the form
  $y = \sum_{k=1}^{K} W_k \cdot f_k\left(\sum_{i=1}^{n} w_{ki} \cdot x_i - w_{k0}\right) - W_0$
  (a small computational sketch follows below).
• Biological neurons use $f(z) = 1/(1 + \exp(-z))$, but shall we simulate it?
• Simulations are not always efficient.
• E.g., airplanes have wings like birds, but they do not flap them.
• Let us analyze this problem theoretically.
• There is always some noise $c$ in the communication channel.
• So, we can consider either the original signals $x_i$ or denoised ones $x_i - c$.
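For concreteness, here is a sketch (mine, not from the slides) of this general 3-layer form, with the sigmoid used as every $f_k$ and with randomly generated, purely illustrative weights $w_{ki}$, $w_{k0}$, $W_k$, $W_0$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def three_layer_nn(x, w, w0, W, W0, f=sigmoid):
    """y = sum_k W_k * f(sum_i w_ki * x_i - w_k0) - W_0."""
    hidden = f(w @ x - w0)   # K hidden NL units, each combining all n inputs
    return W @ hidden - W0   # final linear (L) layer

rng = np.random.default_rng(0)
n, K = 3, 5                  # n inputs, K hidden units (illustrative sizes)
x = rng.normal(size=n)
y = three_layer_nn(x, w=rng.normal(size=(K, n)), w0=rng.normal(size=K),
                   W=rng.normal(size=K), W0=0.0)
print(y)
```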