Applied Machine Learning in Biomedicine
Enrico Grisan
enrico.grisan@dei.unipd.it
Neuron basics
Neuron: real and simulated
A bit of history
From biology to models
Biological models? Careful with brain analogies:
- There are many different types of neurons
- Dendrites can perform complex non-linear computations
- Synapses are not single weights but complex dynamical systems
- The rate code may not be adequate
Single neuron classifier
Neuron and logistic classifier

A single neuron implements a logistic classifier: the inputs y_i arrive along the dendrites, each synapse scales its input by a weight w_i, the cell body sums the contributions, and the activation function squashes the result onto the output axon (forward flow):

f(y; w) = σ(wᵀy) = 1 / (1 + e^−(w_0 + Σ_i w_i y_i))

[Figure: a biological neuron (dendrites, synapses, cell body, axon) drawn side by side with its artificial counterpart.]
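A minimal NumPy sketch of this single-neuron forward pass (the weights, bias and inputs below are arbitrary illustration values):

```python
import numpy as np

def sigmoid(a):
    """Logistic activation: squashes a into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def neuron_forward(y, w, w0):
    """Single sigmoid neuron: weighted sum of inputs plus bias, then squash."""
    a = np.dot(w, y) + w0
    return sigmoid(a)

# Three inputs with arbitrary weights and bias
y = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.3, -0.6])
print(neuron_forward(y, w, 0.1))  # sigma(-0.4) ~ 0.4013
```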
Linking output to input

How does a change in the output (loss) affect the weights? Attach a squared-error loss to the neuron's output, comparing it with the desired value z:

ℓ(w, z) ∝ (z − f(w_0 + Σ_i w_i y_i))²

[Figure: the neuron diagram again, now annotated with the loss at the output (backward flow).]
Linking output to input

Every stage of the neuron contributes a local derivative: the gradient of the loss enters at the output axon, is multiplied by the derivative of the activation function, and is passed back through the sum to every synapse (∂ℓ/∂w_i) and every input (∂ℓ/∂y_i) by the chain rule.

[Figure: the neuron diagram with the gradient terms ∂ℓ/∂w_i flowing backward (backward flow).]
Activation function: sigmoid

f(y) = σ(wᵀy),  σ(a) = 1 / (1 + e^−a)

Ups
1) Easy analytical derivative: σ′(a) = σ(a)(1 − σ(a))
2) Squashes numbers to the range [0, 1]
3) Biological interpretation as the saturating «firing rate» of a neuron

Downs
1) Saturated neurons kill the gradients
2) Sigmoid outputs are not zero-centered
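The «easy analytical derivative» and the saturation problem can both be seen in a few lines (a sketch; the evaluation points are arbitrary):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    """Analytical derivative: sigma'(a) = sigma(a) * (1 - sigma(a))."""
    s = sigmoid(a)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum of the derivative
print(sigmoid_grad(10.0))  # ~4.5e-05: a saturated neuron passes almost no gradient
```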
Sigmoid backpropagation

Assume the input of a neuron is always positive (as it is when the previous layer is sigmoid). What about the gradient on w? Since ∂ℓ/∂w_i = δ y_i with every y_i > 0, all components of the gradient share the sign of the common factor δ, so the update

w^(t+1) = w^(t) − α δ y

moves all weights in the same direction at once. The gradient is all positive or all negative!
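A quick numerical check of this claim (a sketch with arbitrary values; the inputs are all positive, as if they came from a sigmoid layer, and the loss is squared-error):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

y = np.array([0.2, 0.7, 0.5])   # all-positive inputs
w = np.array([0.1, -0.3, 0.4])
t = 1.0                         # target

# Forward pass with loss L = 0.5 * (z - t)**2
z = sigmoid(np.dot(w, y))
# Backward: dL/dw_i = delta * y_i, with one common scalar factor delta
delta = (z - t) * z * (1.0 - z)
grad = delta * y
print(grad)  # every component has the same sign, because every y_i > 0
```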
Improving the activation function: tanh

tanh(y) = (e^y − e^−y) / (e^y + e^−y),  d tanh(y)/dy = 1 − tanh²(y)

Ups
1) Still analytical derivatives
2) Squashes numbers to the range [−1, 1]
3) Zero-centered!

Downs
1) Saturated neurons still kill the gradients
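A sketch of tanh and its derivative 1 − tanh², showing the zero-centered output and the gradient still vanishing at the saturated ends:

```python
import numpy as np

def tanh_grad(y):
    """Derivative of tanh: 1 - tanh(y)**2."""
    return 1.0 - np.tanh(y) ** 2

y = np.array([-4.0, 0.0, 4.0])
print(np.tanh(y))    # squashed into [-1, 1] and zero-centered
print(tanh_grad(y))  # ~0.0013 at the ends: saturation still kills the gradient
```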
Activation function 2

Rectified Linear Unit (ReLU):

f(y) = max(0, y),  df/dy = 1 for y > 0, 0 for y < 0

Ups
1) Does not saturate (for y > 0)
2) Computationally efficient
3) Converges faster in practice

Downs
1) What happens for y < 0? No gradient flows at all.
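A sketch of ReLU and its gradient, making the y < 0 problem explicit:

```python
import numpy as np

def relu(y):
    return np.maximum(0.0, y)

def relu_grad(y):
    # 1 where y > 0, 0 where y <= 0 (the derivative at 0 is set to 0 by convention here)
    return (y > 0).astype(float)

y = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(y))       # negative inputs are clipped to zero
print(relu_grad(y))  # no gradient at all flows for y < 0: the neuron can die
```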
ReLU neuron killing
Activation function 3

Leaky ReLU:

f(y) = y for y ≥ 0, βy for y < 0 (β a small slope, e.g. 0.01);  df/dy = 1 for y > 0, β for y < 0

Ups
1) Does not saturate
2) Computationally efficient
3) Converges faster in practice
4) Keeps neurons alive!

Downs
1) The slope β is one more hyperparameter to choose
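A sketch, assuming a negative-side slope β = 0.01 (the slide leaves β unspecified):

```python
import numpy as np

BETA = 0.01  # assumed small negative-side slope

def leaky_relu(y, beta=BETA):
    return np.where(y > 0, y, beta * y)

def leaky_relu_grad(y, beta=BETA):
    return np.where(y > 0, 1.0, beta)

y = np.array([-2.0, 0.5])
print(leaky_relu(y))       # a small signal survives for y < 0
print(leaky_relu_grad(y))  # the gradient never fully vanishes
```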
Activation function 4

Maxout:

f(y) = max(w₁ᵀy, w₂ᵀy)

Ups
1) Does not saturate
2) Computationally efficient
3) Linear regime
4) Keeps neurons alive!
5) Generalizes ReLU and leaky ReLU

Downs
1) Is not a dot product
2) Doubles the parameters
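A sketch of a single maxout unit; with w₁ = 0 it reduces to a ReLU on w₂ᵀy, which is the sense in which it generalizes ReLU (all values arbitrary):

```python
import numpy as np

def maxout(y, w1, w2):
    """Maxout unit: the max of two linear responses (hence twice the parameters)."""
    return max(np.dot(w1, y), np.dot(w2, y))

y = np.array([1.0, 2.0])
w1 = np.zeros(2)             # w1 = 0 recovers ReLU(w2 . y)
w2 = np.array([0.5, -0.3])
print(maxout(y, w1, w2))     # max(0, -0.1) = 0.0
```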
Neural Networks: architecture
Neural Networks: architecture

2-layer Neural Network = 1-hidden-layer Neural Network
3-layer Neural Network = 2-hidden-layer Neural Network
(by convention, the input layer is not counted)
Neural Networks: architecture Number of neurons? Number of weights? Number of parameters?
Neural Networks: architecture

First network (3 inputs → 4 hidden → 2 outputs):
Number of neurons: 4 + 2 = 6
Number of weights: 4×3 + 2×4 = 20
Number of parameters (weights + biases): 20 + 6 = 26

Second network (3 inputs → 4 hidden → 4 hidden → 1 output):
Number of neurons: 4 + 4 + 1 = 9
Number of weights: 4×3 + 4×4 + 1×4 = 32
Number of parameters (weights + biases): 32 + 9 = 41
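The counting rule generalizes to any fully connected architecture; a small sketch, with the layer sizes read off the two example networks:

```python
def count_params(layer_sizes):
    """Neurons, weights and total parameters of a fully connected network.

    layer_sizes = [inputs, hidden..., outputs]; inputs are not neurons,
    and every neuron carries one bias parameter.
    """
    neurons = sum(layer_sizes[1:])
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
    return neurons, weights, weights + neurons

print(count_params([3, 4, 2]))     # (6, 20, 26)
print(count_params([3, 4, 4, 1]))  # (9, 32, 41)
```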
Neural Networks: architecture Modern CNNs: ~10 million artificial neurons Human Visual Cortex: ~5 billion neurons
ANN representation

Stack the N samples as rows of a data matrix, with a leading column of ones to absorb the bias:

Y = [ 1  y_11 ⋯ y_13
      1  y_21 ⋯ y_23
      1  y_31 ⋯ y_33
      ⋮   ⋮       ⋮
      1  y_N1 ⋯ y_N3 ]

Collect each layer's weights w_{j,i} into a matrix W_ℓ (one row per neuron). The whole network is then a composition of matrix products and element-wise activations:

z = f₂(W₂ z₁) = f₂(W₂ f₁(W₁ y))

[Figure: a small network with inputs y₁…y₃, hidden weights w_{j,i}, and outputs z₁, z₂.]
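A vectorized sketch of z = f₂(W₂ f₁(W₁ y)) over a whole data matrix. The shapes and activations are illustrative assumptions (tanh hidden layer, sigmoid output, bias folded into a leading column of ones; the output layer's own bias is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(Y, W1, W2):
    """Two-layer forward pass for all N samples at once.

    Y:  (N, d+1) data matrix, first column all ones (bias input)
    W1: (h, d+1) hidden-layer weights, one row per hidden neuron
    W2: (K, h)   output-layer weights
    """
    H = np.tanh(Y @ W1.T)                   # hidden activations, (N, h)
    return 1.0 / (1.0 + np.exp(-H @ W2.T))  # sigmoid outputs, (N, K)

N, d, h = 5, 3, 4
Y = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
W1 = rng.standard_normal((h, d + 1))
W2 = rng.standard_normal((1, h))
Z = forward(Y, W1, W2)
print(Z.shape)  # (5, 1): one output per sample
```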
ANN becoming popular
ANN training: forward flow

Define a loss function E(z(w); t). Each neuron computes its pre-activation from the previous layer's outputs:

a_j = Σ_i w_ji z_i

and passes it through the activation to the following layer:

z_j = f(a_j)
ANN training: backward flow

Need to compute:

∂E/∂w_ji = (∂E/∂a_j)(∂a_j/∂w_ji) = δ_j z_i

Considering that for the output neurons:

δ_k = z_k − t_k

we get, for the hidden neurons:

δ_j = f′(a_j) Σ_k w_kj δ_k
ANN training: backpropagation

Updating all weights by gradient descent:

w_ji^(t+1) = w_ji^(t) − η δ_j z_i
ANN training ex: forward

Take f = tanh and the squared-error loss

E = ½ Σ_{k=1}^{K} (z_k − t_k)²

Hidden layer: a_j = Σ_{i=0}^{d} w_ji^(1) y_i,  z_j = tanh(a_j)

Output layer: z_k = Σ_{j=1}^{M} w_kj^(2) z_j
ANN training ex: backward

Output deltas: δ_k = z_k − t_k

Hidden deltas: δ_j = (1 − z_j²) Σ_{k=1}^{K} w_kj^(2) δ_k

Gradients: ∂E/∂w_ji^(1) = δ_j y_i,  ∂E/∂w_kj^(2) = δ_k z_j
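The whole worked example fits in a few lines; the finite-difference check at the end verifies the backward formulas numerically (all sizes and values are arbitrary, and biases are folded into the weights):

```python
import numpy as np

def forward(y, t, W1, W2):
    """Forward pass of the example: tanh hidden layer, linear output, squared-error loss."""
    a = W1 @ y                    # hidden pre-activations a_j
    zh = np.tanh(a)               # hidden outputs z_j
    zk = W2 @ zh                  # output units z_k
    E = 0.5 * np.sum((zk - t) ** 2)
    return zh, zk, E

def backward(y, t, W1, W2):
    """Backward pass: deltas and weight gradients from the formulas above."""
    zh, zk, _ = forward(y, t, W1, W2)
    delta_k = zk - t                              # output deltas
    delta_j = (1.0 - zh ** 2) * (W2.T @ delta_k)  # hidden deltas
    return np.outer(delta_j, y), np.outer(delta_k, zh)  # dE/dW1, dE/dW2

rng = np.random.default_rng(1)
y, t = rng.standard_normal(3), rng.standard_normal(2)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
gW1, gW2 = backward(y, t, W1, W2)

# Finite-difference check on one entry of W1
eps = 1e-6
W1p, W1m = W1.copy(), W1.copy()
W1p[0, 0] += eps
W1m[0, 0] -= eps
numeric = (forward(y, t, W1p, W2)[2] - forward(y, t, W1m, W2)[2]) / (2 * eps)
print(numeric, gW1[0, 0])  # the two should agree to several decimal places
```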
What can ANN represent?
What can ANN classify?
Regularization