Neural Networks: Old and New

Ju Sun
Computer Science & Engineering, University of Minnesota, Twin Cities

January 29, 2020
Logistics

– Another great reference: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Livebook online: https://d2l.ai/ (comprehensive coverage of recent developments and detailed implementations based on NumPy)
– Homework 0 will be posted tonight
– Waiting list
Outline

– Start from neurons
– Shallow to deep neural networks
– A brief history of AI
– Suggested reading
Model of biological neurons

Credit: Stanford CS231N

Biologically ...
– Each neuron receives signals from its dendrites
– Each neuron outputs signals via its single axon
– The axon branches out and connects via synapses to dendrites of other neurons
Model of biological neurons

Credit: Stanford CS231N

Mathematically ...
– Each neuron receives $x_i$'s from its dendrites
– The $x_i$'s are weighted by $w_i$'s (synaptic strengths) and summed: $\sum_i w_i x_i$
– The neuron fires only when the combined signal is above a certain threshold: $\sum_i w_i x_i + b$
– The fire rate is modeled by an activation function $f$, i.e., outputting $f\left(\sum_i w_i x_i + b\right)$
Artificial neural networks

Brain neural networks vs. artificial neural networks (Credit: Max Pixel)

Why called artificial?
– (Over-)simplification at the neuron level
– (Over-)simplification at the connection level

In this course, neural networks are always artificial.
Outline

– Start from neurons
– Shallow to deep neural networks
– A brief history of AI
– Suggested reading
Artificial neurons

  $$f\left(\sum_i w_i x_i + b\right) = f(w^\intercal x + b)$$

We shall use $\sigma$ instead of $f$ henceforth.

Examples of activation function $\sigma$ (Credit: [Hughes and Correll, 2016])
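To make the neuron model concrete, here is a minimal NumPy sketch (my own illustration, not from the slides) of a single artificial neuron computing $\sigma(w^\intercal x + b)$ with two common activation choices; the names and numbers are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def neuron(x, w, b, activation=sigmoid):
    """One artificial neuron: activation(w^T x + b)."""
    return activation(np.dot(w, x) + b)

# Example: a neuron with 3 inputs
x = np.array([0.5, -1.0, 2.0])   # inputs x_i
w = np.array([0.2, 0.4, -0.1])   # synaptic weights w_i
b = 0.1                          # bias (negative threshold)
print(neuron(x, w, b))           # sigmoid output, a "fire rate" in (0, 1)
print(neuron(x, w, b, relu))     # ReLU output
```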
Neural networks

– One neuron: $\sigma(w^\intercal x + b)$
– Neural networks (NN): structured organization of artificial neurons
– $w$'s and $b$'s are unknown and need to be learned
– Many models in machine learning are neural networks
A typical setup

Supervised Learning
– Gather training data $(x_1, y_1), \ldots, (x_n, y_n)$
– Choose a family of functions, e.g., $\mathcal{H}$, so that there is $f \in \mathcal{H}$ to ensure $y_i \approx f(x_i)$ for all $i$
– Set up a loss function $\ell$ to measure the approximation quality
– Find an $f \in \mathcal{H}$ to minimize the average loss
  $$\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\left(y_i, f(x_i)\right)$$

... known as the empirical risk minimization (ERM) framework in learning theory
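As a small sketch of the ERM objective (not part of the slides), the average loss can be evaluated directly once a candidate $f$ and a loss $\ell$ are fixed; the squared loss and linear candidate below are hypothetical choices for illustration.

```python
import numpy as np

def empirical_risk(f, loss, xs, ys):
    """Average loss (1/n) * sum_i loss(y_i, f(x_i)) over the training set."""
    return np.mean([loss(y, f(x)) for x, y in zip(xs, ys)])

# Hypothetical example: squared loss and a linear candidate f
squared_loss = lambda y, yhat: (y - yhat) ** 2
f = lambda x: 2.0 * x + 1.0

xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.1, 2.9, 5.2])
print(empirical_risk(f, squared_loss, xs, ys))
```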
A typical setup

Supervised Learning from the NN viewpoint
– Gather training data $(x_1, y_1), \ldots, (x_n, y_n)$
– Choose a NN with $k$ neurons, so that there is a group of weights, e.g., $(w_1, \ldots, w_k, b_1, \ldots, b_k)$, to ensure $y_i \approx \{\mathrm{NN}(w_1, \ldots, w_k, b_1, \ldots, b_k)\}(x_i) \;\; \forall i$
– Set up a loss function $\ell$ to measure the approximation quality
– Find weights $(w_1, \ldots, w_k, b_1, \ldots, b_k)$ to minimize the average loss
  $$\min_{w\text{'s},\, b\text{'s}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\left[y_i, \{\mathrm{NN}(w_1, \ldots, w_k, b_1, \ldots, b_k)\}(x_i)\right]$$
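The same objective from the NN viewpoint, as a rough sketch under my own assumptions (a one-hidden-layer network with $k$ sigmoid neurons, a linear output, and a squared loss; all shapes and data are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_forward(x, W, b, v, c):
    """Tiny one-hidden-layer network: k neurons sigma(W x + b), then a linear output."""
    hidden = sigmoid(W @ x + b)      # k hidden activations
    return v @ hidden + c            # scalar output

def nn_risk(W, b, v, c, xs, ys):
    """Average squared loss of the network over the training pairs."""
    preds = np.array([nn_forward(x, W, b, v, c) for x in xs])
    return np.mean((ys - preds) ** 2)

# Hypothetical shapes: d = 3 inputs, k = 4 hidden neurons
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), np.zeros(4)
v, c = rng.normal(size=4), 0.0
xs, ys = rng.normal(size=(5, 3)), rng.normal(size=5)
print(nn_risk(W, b, v, c, xs, ys))
```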
Linear regression

Credit: D2L

– Data: $(x_1, y_1), \ldots, (x_n, y_n)$, $x_i \in \mathbb{R}^d$
– Model: $y_i \approx w^\intercal x_i + b$
– Loss: $\|y - \hat{y}\|_2^2$
– Optimization:
  $$\min_{w, b} \; \frac{1}{n} \sum_{i=1}^{n} \|y_i - (w^\intercal x_i + b)\|_2^2$$

Linear regression as a single neuron: $\sigma$ is the identity function (Credit: D2L)
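A minimal NumPy sketch (mine, not the D2L implementation) that treats linear regression as a single identity-activation neuron and minimizes the average squared loss by plain gradient descent; the data, step size, and iteration count are hypothetical.

```python
import numpy as np

# Toy data (hypothetical): y is roughly w_true^T x + b_true plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
w_true, b_true = np.array([2.0, -3.4]), 4.2
y = X @ w_true + b_true + 0.01 * rng.normal(size=100)

# Single "neuron" with identity activation: yhat = X w + b
w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(200):
    yhat = X @ w + b
    err = yhat - y                      # residuals
    grad_w = 2 * X.T @ err / len(y)     # gradient of the mean squared loss w.r.t. w
    grad_b = 2 * err.mean()             # gradient of the mean squared loss w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # should be close to w_true, b_true
```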
Perceptron

Frank Rosenblatt (1928–1971)

– Data: $(x_1, y_1), \ldots, (x_n, y_n)$, $x_i \in \mathbb{R}^d$, $y_i \in \{+1, -1\}$
– Model: $y_i \approx \sigma(w^\intercal x_i + b)$, where $\sigma$ is the sign function
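The slide only defines the model; as an illustration, here is a sketch of the classic perceptron learning rule (update only on mistakes) with a sign activation, run on hypothetical linearly separable toy data.

```python
import numpy as np

def sign(z):
    return np.where(z >= 0, 1, -1)

def perceptron_train(X, y, epochs=20):
    """Classic perceptron rule: on a mistake, nudge (w, b) toward the misclassified point."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi
                b += yi
    return w, b

# Hypothetical toy data with labels in {+1, -1}
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(sign(X @ w + b))   # should reproduce y
```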