Deep Learning (Partly) Demystified

Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
El Paso, Texas 79968, USA
vladik@utep.edu
http://www.cs.utep.edu/vladik
1. Overview

• Successes of deep learning are partly due to an appropriate selection of activation functions, pooling functions, etc.
• Most of these choices have been made based on empirical comparisons and heuristic ideas.
• In this talk, we show that many of these choices – and the surprising success of deep learning in the first place – can be explained by reasonably simple and natural mathematics.
2. Traditional Neural Networks: A Brief Reminder

• To explain deep neural networks, let us first briefly recall the motivations behind traditional ones.
• In the old days, computers were much slower.
• This was a big limitation that prevented us from solving many important practical problems.
• As a result, researchers started looking for ways to speed up computations.
• If a task takes too long for one person, a natural idea is to ask for help.
• Several people can work on this task in parallel – and thus get the result faster; similarly:
  – if a computation task takes too long,
  – a natural idea is to have several processing units working in parallel.
3. Traditional Neural Networks (cont-d)

• In this case, the overall computation time is just the time needed for each of the processing units to finish its sub-task.
• To minimize the overall time, it is therefore necessary to make these sub-tasks as simple as possible.
• In data processing, the simplest possible functions to compute are linear functions.
• However, if we only have processing units that compute linear functions, we will only compute linear functions.
• Indeed, a composition of linear functions is always linear (see the sketch after this slide).
• Thus, we need to supplement these units with some nonlinear units.
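To make the linearity argument concrete, here is a minimal sketch in Python with NumPy; the matrices, offsets, and the test input are made-up illustrative values, not anything from the talk. It checks numerically that stacking two linear layers collapses into a single linear map:

    import numpy as np

    # Two "layers" that each compute a linear (affine) function y = A x + b
    A1, b1 = np.array([[2.0, 0.5], [1.0, -1.0]]), np.array([0.3, -0.2])
    A2, b2 = np.array([[1.0, 1.0], [0.0, 3.0]]), np.array([0.1, 0.4])

    def two_linear_layers(x):
        return A2 @ (A1 @ x + b1) + b2

    # The composition is itself linear: y = A x + b with A = A2 A1, b = A2 b1 + b2
    A, b = A2 @ A1, A2 @ b1 + b2

    x = np.array([0.7, -1.3])
    assert np.allclose(two_linear_layers(x), A @ x + b)  # same result, one linear layer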
4. Traditional Neural Networks (cont-d)

• In general, the more inputs, the more complex (and thus longer) the resulting computations.
• So, the fastest possible nonlinear units are the ones that compute functions of one variable.
• So, our ideal computational device should consist of:
  – linear (L) units and
  – nonlinear (NL) units that compute functions of one variable.
• These units should work in parallel:
  – first, all the units from one layer will work,
  – then all units from another layer, etc.
• The fewer layers, the faster the resulting computations.
• One can prove that 1- and 2-layer schemes do not have a universal approximation property.
5. Traditional Neural Networks (cont-d)

• One can also prove that 3-layer schemes already have this property.
• There are two possible 3-layer schemes: L-NL-L and NL-L-NL.
• The first one is faster, since it uses the slower nonlinear units only once.
• In this scheme, first, each unit from the first layer applies a linear transformation to the inputs x_1, ..., x_n:

    z_k = Σ_{i=1}^{n} w_{ki} · x_i − w_{k0}.

• The values w_{ki} are known as weights.
• In the next NL layer, these values are transformed into y_k = s_k(z_k), for some nonlinear functions s_k(z).
6. Traditional Neural Networks (cont-d)

• Finally, in the last (L) layer, the values y_k are linearly combined into the final result

    y = Σ_{k=1}^{K} W_k · y_k − W_0 = Σ_{k=1}^{K} W_k · s_k(Σ_{i=1}^{n} w_{ki} · x_i − w_{k0}) − W_0.

• This is exactly the formula that describes the traditional neural network.
• In the traditional neural network, usually, all the NL neurons compute the same function – sigmoid:

    s_k(z) = 1 / (1 + exp(−z)).
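A minimal sketch of this L-NL-L computation in Python with NumPy; the network sizes and the randomly generated weights below are illustrative assumptions, not values from the talk:

    import numpy as np

    def sigmoid(z):
        # s(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def traditional_network(x, w, w0, W, W0):
        z = w @ x - w0             # first (L) layer:  z_k = sum_i w_ki * x_i - w_k0
        y_hidden = sigmoid(z)      # nonlinear (NL) layer:  y_k = s(z_k)
        return W @ y_hidden - W0   # last (L) layer:  y = sum_k W_k * y_k - W_0

    # Example with n = 3 inputs and K = 4 hidden neurons (arbitrary sizes)
    rng = np.random.default_rng(0)
    n, K = 3, 4
    x = rng.normal(size=n)
    w, w0 = rng.normal(size=(K, n)), rng.normal(size=K)
    W, W0 = rng.normal(size=K), rng.normal()
    print(traditional_network(x, w, w0, W, W0))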
7. Why Go Beyond Traditional Neural Networks

• Traditional neural networks were invented when computers were reasonably slow.
• This prevented computers from solving important practical problems.
• For these computers, computation speed was the main objective.
• As we have just shown, this need led to what we know as traditional neural networks.
• Nowadays, computers are much faster.
• In most practical applications, speed is no longer the main problem.
• But the traditional neural networks:
  – while fast,
  – have limited accuracy of their predictions.
8. The More Models We Have, the More Accurately We Can Approximate

• As a result of training a neural network, we get the values of some parameters for which the corresponding model provides the best approximation to the actual data.
• Let a denote the number of parameters.
• Let b be the number of bits representing each parameter.
• Then, to represent all parameters, we need N = a · b bits.
• Different models obtained from training can be described by different N-bit sequences.
• In general, for N bits, there are 2^N possible N-bit sequences.
• Thus, we can have 2^N possible models.
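A toy back-of-the-envelope version of this count in Python; the parameter count and the bit width below are made-up illustrative numbers:

    import math

    a = 1000   # number of trainable parameters (illustrative assumption)
    b = 32     # bits per parameter (illustrative assumption)
    N = a * b  # bits needed to describe one trained model
    # There are 2**N possible N-bit sequences, i.e. 2**N possible models;
    # print the order of magnitude instead of the full number.
    print(f"N = {N} bits, roughly 10**{int(N * math.log10(2))} possible models")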
9. The More Models We Have (cont-d)

• In these terms, training simply means selecting one of these 2^N possible models.
• If we have only one model to represent the actual dependence, this model will be a very lousy description.
• If we can have two models, we can have more accurate approximations.
• In general, the more models we have, the more accurate a representation we can have.
• We can illustrate this idea on the example of approximating real numbers from the interval [0, 1].
• If we have only one model – e.g., the value x = 0.5 – then we approximate every other number with accuracy 0.5.
10. The More Models We Have (cont-d)

• If we can have 10 models, then we can take the 10 values 0.05, 0.15, . . . , 0.95.
• The first value approximates all the numbers from the interval [0, 0.1] with accuracy 0.05.
• The second value approximates all the numbers from the interval [0.1, 0.2] with the same accuracy, etc.
• By selecting one of these values, we can approximate any number from [0, 1] with accuracy 0.05.
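A small sketch of this grid argument in Python with NumPy; the number of models K = 10 follows the slide, while the dense set of test points is an arbitrary choice:

    import numpy as np

    K = 10
    models = (np.arange(K) + 0.5) / K   # 0.05, 0.15, ..., 0.95

    def approximation_error(x):
        # distance from x to the nearest of the K representative values
        return np.min(np.abs(models - x))

    # Worst-case error over [0, 1] is 1/(2K) = 0.05
    xs = np.linspace(0.0, 1.0, 10001)
    print(max(approximation_error(x) for x in xs))   # 0.05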