NEURAL NETWORKS
THE IDEA BEHIND ARTIFICIAL NEURONS
▸ Initially a simplified model of real neurons
▸ A real neuron receives inputs from other neurons through synapses on its dendrites
▸ The inputs of a real neuron are weighted! Due to the position of the synapses (distance from the soma) and the properties of the dendrites
▸ A real neuron sums the inputs on its soma (voltages are summed)
▸ A real neuron has a threshold for firing: non-linear activation!
THE MATH BEHIND ARTIFICIAL NEURONS
▸ One artificial neuron used for classification is very similar to logistic regression
▸ One artificial neuron performs linear separation: y = f(∑_i w_i x_i + b)
▸ How does this become interesting?
▸ SVM, kernel trick: project to a high-dimensional space where linear separation can solve the problem
▸ Neurons: follow the brain and use more neurons connected to each other: a neural network! (See the sketch below)
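A minimal sketch of a single artificial neuron in NumPy (the numbers and names are illustrative, not from the slides): a weighted sum of the inputs plus a bias, passed through a non-linearity. With a sigmoid activation this is exactly the logistic-regression model mentioned above.

```python
import numpy as np

def neuron(x, w, b, f=lambda s: 1.0 / (1.0 + np.exp(-s))):
    """One artificial neuron: weighted sum of inputs, plus bias, through a non-linearity."""
    return f(np.dot(w, x) + b)

# Hypothetical numbers, just to show the call:
x = np.array([0.5, -1.2, 3.0])   # inputs (features, or outputs of other neurons)
w = np.array([0.8,  0.1, -0.4])  # weights
b = 0.2                          # bias
y = neuron(x, w, b)              # a value in (0, 1), usable as a class probability
```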
NEURAL NETWORKS
▸ Fully connected models, mostly of theoretical interest (Hopfield network, Boltzmann machine)
▸ Supervised machine learning, function approximation: feed-forward neural networks
▸ Organise neurons into layers. The input of a neuron in a layer is the output of the neurons in the previous layer (see the sketch below)
▸ First layer is X, last is y
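A minimal sketch of the layer structure (the layer sizes and random weights are assumptions for illustration): a fully connected layer computes all of its neurons' weighted sums at once as a matrix-vector product, and layers are chained so that each layer's input is the previous layer's output.

```python
import numpy as np

def dense_layer(x, W, b, f):
    """A fully connected layer: every neuron sees all outputs of the previous layer.
    W has shape (n_out, n_in), so W @ x computes all weighted sums at once."""
    return f(W @ x + b)

relu = lambda s: np.maximum(0.0, s)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

# Hypothetical 4 -> 3 -> 1 network (X has 4 features, y is a single probability)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

x = rng.normal(size=4)                     # one input sample
h = dense_layer(x, W1, b1, relu)           # hidden layer = new representation of x
y_pred = dense_layer(h, W2, b2, sigmoid)   # output layer
```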
NEURAL NETWORKS
▸ Note: linear activations reduce the network to a linear model! (Demonstrated in the sketch below)
▸ Popular non-linear activations:
▸ Sigmoid, tanh, ReLU
▸ A layer is a new representation of the data!
▸ A new space with #neurons dimensions
▸ Successive internal representations, built so that the input data becomes linearly separable by the very last layer!
▸ Slightly mysterious machinery!
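A small sketch of the first point above (the random matrices are just for illustration): without a non-linearity, two stacked layers are exactly equivalent to one linear layer, which is why non-linear activations are needed.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(2, 5))
x = rng.normal(size=3)

# Two "layers" with identity (linear) activation...
two_linear_layers = W2 @ (W1 @ x)
# ...are exactly one linear layer with the combined weight matrix:
one_linear_layer = (W2 @ W1) @ x
assert np.allclose(two_linear_layers, one_linear_layer)

# Popular non-linear activations that break this collapse:
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))
tanh = np.tanh
relu = lambda s: np.maximum(0.0, s)
```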
TRAINING NEURAL NETWORKS
▸ Loss functions just as before (MSE, cross-entropy)
▸ L(y, y_pred)
▸ A neural network is a function composition
▸ Input: x
▸ Activations in the first layer: f(x)
▸ Activations in the 2nd layer: g(f(x))
▸ Etc.: L(y, h(g(f(x))))
▸ The NN is differentiable -> gradient optimisation!
▸ The loss function can be differentiated with respect to the weight parameters (see the sketch below)
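A minimal sketch of the composition view (the two-layer network and the data are illustrative assumptions): the prediction is simply g(f(x)) chained, and the loss compares it with y. Because every step is differentiable, the gradient of L with respect to every weight exists.

```python
import numpy as np

relu = lambda s: np.maximum(0.0, s)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

def mse(y, y_pred):
    return np.mean((y - y_pred) ** 2)

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

f = lambda x: relu(W1 @ x + b1)       # first layer
g = lambda h: sigmoid(W2 @ h + b2)    # second (output) layer

x, y = rng.normal(size=3), np.array([1.0])
loss = mse(y, g(f(x)))                # L(y, g(f(x))): a plain function composition
```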
TRAINING NEURAL NETWORKS
▸ Activations are known from a forward pass!
▸ Consider the weights of the neuron with index i in an arbitrary layer (j indexes the neurons in the previous layer):

  o_i = K(s_i) = K(∑_j w_{ij} o_j + b_i)

▸ Differentiation with respect to a weight becomes differentiation with respect to an activation:

  ∂E/∂w_{ij} = (∂E/∂o_i)(∂o_i/∂s_i)(∂s_i/∂w_{ij}) = (∂E/∂o_i) K'(s_i) o_j

▸ For the last layer we are done; for previous ones, the loss function depends on an activation only through the activations in the next layer. With the chain rule we get a recursive formula (l runs over the neurons of the next layer, R):

  ∂E/∂o_i = ∑_{l∈R} (∂E/∂o_l)(∂o_l/∂o_i) = ∑_{l∈R} (∂E/∂o_l)(∂o_l/∂s_l)(∂s_l/∂o_i) = ∑_{l∈R} (∂E/∂o_l) K'(s_l) w_{li}

▸ The last layer is given directly; the previous layer can be calculated from the next one, and so on!
▸ Local calculations: we only need to keep track of 2 values per neuron: its activation and a "diff"
▸ Backward pass. (See the sketch below)
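A minimal sketch of the forward and backward pass for a tiny two-layer sigmoid network with squared-error loss (the sizes, data and variable names are illustrative, not from the slides). Each layer keeps its activation o from the forward pass and a "diff" ∂E/∂o on the way back; the recursive formula above turns one layer's diff into the previous layer's.

```python
import numpy as np

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))   # K(s); note K'(s) = K(s) * (1 - K(s))

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), np.array([1.0])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

# Forward pass: store the activations o of every layer
s1 = W1 @ x + b1;  o1 = sigmoid(s1)
s2 = W2 @ o1 + b2; o2 = sigmoid(s2)
E = 0.5 * np.sum((o2 - y) ** 2)

# Backward pass: propagate the "diff" dE/do layer by layer
dE_do2 = o2 - y                      # last layer: given directly by the loss
dE_ds2 = dE_do2 * o2 * (1 - o2)      # multiply by K'(s_i)
dE_dW2 = np.outer(dE_ds2, o1)        # dE/dw_ij = dE/do_i * K'(s_i) * o_j
dE_db2 = dE_ds2

dE_do1 = W2.T @ dE_ds2               # recursive step: sum_l dE/do_l * K'(s_l) * w_li
dE_ds1 = dE_do1 * o1 * (1 - o1)
dE_dW1 = np.outer(dE_ds1, x)
dE_db1 = dE_ds1
```

A gradient step would then subtract a small multiple of each dE_dW and dE_db from the corresponding parameters.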
TRAINING NEURAL NETWORKS
▸ Both the forward and the backward pass are highly parallelizable
▸ GPU, TPU accelerators
▸ Backward (recurrent) connections break the third equation above: there is no easy recursive formula
▸ (Backprop through time for recurrent networks with sequence inputs)
▸ Skip connections are handled! E.g. a skip connection is simply an identity neuron in a layer (see the sketch below)
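A small sketch of the skip-connection remark (toy scalar values, not from the slides, assuming a form y = K(w·x + b) + x): the identity branch simply adds the upstream diff to the gradient flowing back, so the same local bookkeeping as above still works.

```python
import numpy as np

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

def forward_with_skip(x, w, b):
    s = w * x + b
    o = sigmoid(s)
    return o + x            # skip connection: add the input unchanged

def backward_with_skip(x, w, b, upstream):
    """Gradient of the output w.r.t. x: the transformed path plus the identity path."""
    s = w * x + b
    o = sigmoid(s)
    through_layer = upstream * o * (1 - o) * w  # chain rule through K(w*x + b)
    through_skip = upstream * 1.0               # identity branch: derivative is 1
    return through_layer + through_skip

# Sanity check against a finite difference (toy values):
x, w, b, eps = 0.7, -1.3, 0.2, 1e-6
numeric = (forward_with_skip(x + eps, w, b) - forward_with_skip(x - eps, w, b)) / (2 * eps)
analytic = backward_with_skip(x, w, b, upstream=1.0)
assert abs(numeric - analytic) < 1e-6
```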
TRAINING NEURAL NETWORKS
▸ Instead of the full gradient, stochastic gradient descent (SGD): the gradient is calculated from only a few examples - a minibatch - at a time (usually 1-512 samples)
▸ One full pass over the whole training dataset is called an epoch
▸ Stochasticity comes from the order of the data points, shuffled in each epoch to reach a better solution
▸ Note: use permutations of the data, not random sampling, so that the whole dataset is used for learning in the best way! (See the sketch below)
▸ Note: online training can easily handle unlimited data!
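A minimal sketch of the minibatch loop (the dataset, batch size and the grad_step callback are placeholders): each epoch visits the whole training set exactly once, in a fresh random permutation, rather than sampling with replacement.

```python
import numpy as np

def sgd_epochs(X, y, grad_step, batch_size=32, n_epochs=10, seed=0):
    """Minibatch SGD: one epoch = one full, shuffled pass over the data.
    grad_step(X_batch, y_batch) is assumed to update the model parameters."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for epoch in range(n_epochs):
        order = rng.permutation(n)               # a permutation, not random sampling:
        for start in range(0, n, batch_size):    # every example is used once per epoch
            idx = order[start:start + batch_size]
            grad_step(X[idx], y[idx])
```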
TRAINING NEURAL NETWORKS
▸ How to choose the initial parameters?
▸ All zeros? Then every weight gets the same, meaningless gradient. Use random initialisation!
▸ Uniform or Gaussian? Both are OK.
▸ Mean? 0
▸ Scale?
▸ Avoid exploding passes (both forward and backward)
▸ ReLU is the identity for positive inputs (and 0 otherwise), so it does not rescale what passes through
▸ Variance: 2/(fan_in + fan_out) (see the sketch below)
▸ Even in 2014, a 16-layer network was trained with layer-wise pre-training because of exploding gradients. Then it was realised that these simple schemes allow training from scratch!
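A minimal sketch of the initialisation recipe on this slide (the layer sizes are illustrative): zero-mean random weights whose variance is 2/(fan_in + fan_out), drawn here from a Gaussian; a uniform distribution with the same variance would work as well.

```python
import numpy as np

def init_weights(fan_in, fan_out, rng):
    """Zero-mean Gaussian weights with variance 2 / (fan_in + fan_out),
    chosen to keep activations and gradients from exploding or vanishing."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    W = rng.normal(0.0, std, size=(fan_out, fan_in))
    b = np.zeros(fan_out)   # biases can simply start at zero
    return W, b

rng = np.random.default_rng(0)
W1, b1 = init_weights(fan_in=784, fan_out=256, rng=rng)  # e.g. a 784 -> 256 layer
W2, b2 = init_weights(fan_in=256, fan_out=10, rng=rng)
```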
REGULARISATION IN NEURAL NETWORKS, EARLY STOPPING
▸ Neural networks with many units and layers can easily memorise any data
▸ (Modern image recognition networks can memorise 1.2 million fully random noise images of 224x224 pixels)
▸ An L2 penalty on the weights can be useful, but it is not enough on its own!
▸ How long should we train? "Convergence" often means 0 error on the training data, i.e. full memorisation
▸ Early stopping: make train-val-test splits, and stop training when the error on the validation set no longer improves. (A train-test-only split will "overfit" the test data!)
▸ Early stopping is a regularisation! It does not improve training accuracy, but it does improve test accuracy. It essentially limits how far we can wander from the random initial parameter point. (See the sketch below)
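A minimal sketch of early stopping (train_one_epoch, validation_loss, and the patience value are placeholders, not from the slides): training stops once the validation loss has not improved for a while, and the parameters that were best on the validation set are kept.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              max_epochs=1000, patience=10):
    """Stop when the validation loss has not improved for `patience` epochs
    and return the parameters that were best on the validation set."""
    best_loss, best_model, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)                 # one pass over the training split
        val_loss = validation_loss(model)      # measured on the held-out validation split
        if val_loss < best_loss:
            best_loss, best_model = val_loss, copy.deepcopy(model)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                          # validation error stopped improving
    return best_model
```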
REFERENCES
▸ ESL, chapter 11
▸ Deep Learning Book: https://www.deeplearningbook.org