Neural Networks Stefan Edelkamp
1 Overview
- Introduction
- Perceptron
- Hopfield Nets
- Self-Organizing Maps
- Feed-Forward Neural Networks
- Backpropagation
2 Introduction
Idea: mimic the principles of biological neural networks with artificial neural networks
- adopt solutions that nature has settled on
- parallelization ⇒ high performance
- redundancy ⇒ tolerance for failures
- enable learning with little programming effort
Ingredients
What an artificial neural network needs:
• behavior of the artificial neurons
• order of computation
• activation function
• structure of the net (topology)
  • recurrent nets
  • feed-forward nets
• integration into the environment
• learning algorithm
Perceptron Learning
... a very simple network with no hidden neurons
Inputs: $x$, weighted with $w$, the weighted inputs are summed
Activation function: $\Theta$
Output: $z$, determined by computing $\Theta(w^T x)$
Additionally: a weighted input representing the constant 1
Training
Net function $f : M \subset \mathbb{R}^d \to \{0, 1\}$
1. initialize the counter $i$ and the initial weight vector $w_0$ to $0$
2. as long as there is a training vector $x$ with $w_i^T x \le 0$, set $w_{i+1} := w_i + x$ and increase $i$ by $1$
3. return $w_{i+1}$
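A minimal Python sketch of this loop (NumPy and the toy data are my additions; the slide only gives the three steps, and vectors of the negative class are assumed to be pre-multiplied by $-1$ so that the single test $w_i^T x \le 0$ suffices):

```python
import numpy as np

def perceptron_train(X, max_iter=1000):
    """Perceptron learning as on the slide: X holds the (class-normalized)
    training vectors, each extended by a constant-1 component.
    Repeatedly add a misclassified vector until w^T x > 0 for all x."""
    w = np.zeros(X.shape[1])               # step 1: w_0 = 0
    for _ in range(max_iter):              # safeguard; terminates for separable data
        misclassified = [x for x in X if w @ x <= 0]
        if not misclassified:              # step 2 done: no vector with w^T x <= 0
            return w
        w = w + misclassified[0]           # w_{i+1} := w_i + x
    return w                               # step 3: return the final weights

# toy usage: two vectors of the positive class, each with the constant 1 appended
X = np.array([[1.0, 2.0, 1.0],
              [2.0, 0.5, 1.0]])
w_star = perceptron_train(X)
print(w_star, [float(w_star @ x) for x in X])
```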
Termination on Training Data
Assume the training vectors to be class-normalized (negative examples multiplied by $-1$), and let $w^*$ be a final separating weight vector with $\|w^*\| = 1$:
- $f = \Theta((x, 1)^T w^*)$, with constants $\delta$ and $\gamma$ such that $|(x, 1)^T w^*| \ge \delta$ and $\|(x, 1)\| \le \gamma$ for all training vectors
- for the angle $\alpha_i$ between $w_i$ and $w^*$ we have $1 \ge \cos \alpha_i = w_i^T w^* / \|w_i\|$
- $w_{i+1}^T w^* = (w_i + x_i)^T w^* = w_i^T w^* + x_i^T w^* \ge w_i^T w^* + \delta \;\Rightarrow\; w_{i+1}^T w^* \ge \delta (i + 1)$
- $\|w_{i+1}\| = \sqrt{(w_i + x_i)^T (w_i + x_i)} = \sqrt{\|w_i\|^2 + \|x_i\|^2 + 2 w_i^T x_i} \le \sqrt{\|w_i\|^2 + \gamma^2} \le \gamma \sqrt{i + 1}$, since $w_i^T x_i \le 0$ for the misclassified $x_i$ (induction: $\|w_i\| \le \gamma \sqrt{i}$)
$\Rightarrow \cos \alpha_i \ge \delta \sqrt{i + 1} / \gamma$, which would grow beyond $1$ as $i$ increases; hence the number of updates is bounded by $\gamma^2 / \delta^2$ and the algorithm terminates
3 Hopfield Nets
Neurons: $1, 2, \ldots, d$
Activations: $x_1, x_2, \ldots, x_d$ with $x_i \in \{0, 1\}$
Connections: $w_{ij} \in \mathbb{R}$ $(1 \le i, j \le d)$ with $w_{ii} = 0$, $w_{ij} = w_{ji}$ $\Rightarrow W := (w_{ij})_{d \times d}$
Update: asynchronous & stochastic
$$x'_j := \begin{cases} 0 & \text{if } \sum_{i=1}^d x_i w_{ij} < 0 \\ 1 & \text{if } \sum_{i=1}^d x_i w_{ij} > 0 \\ x_j & \text{otherwise} \end{cases}$$
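A sketch of one asynchronous, stochastic update in Python (the function name, the random-neuron choice, and the tiny 2-neuron usage example are illustrative):

```python
import numpy as np

def hopfield_update(W, x, rng):
    """Pick a random neuron j and set it according to the sign of the
    weighted input sum sum_i x_i w_ij (unchanged if the sum is zero)."""
    j = rng.integers(len(x))
    s = x @ W[:, j]
    if s < 0:
        x[j] = 0.0
    elif s > 0:
        x[j] = 1.0
    return x

# usage: repeat until no update changes the state (stochastic stabilization)
rng = np.random.default_rng(0)
W = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([1.0, 0.0])
for _ in range(100):
    x = hopfield_update(W, x, rng)
print(x)
```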
Example
Three neurons $x_1, x_2, x_3$ with weight matrix
$$W = \begin{pmatrix} 0 & 1 & -2 \\ 1 & 0 & 3 \\ -2 & 3 & 0 \end{pmatrix}$$
Uses:
• associative memory
• computing Boolean functions
• combinatorial optimization
Energy of a Hopfield Net
For $x = (x_1, x_2, \ldots, x_d)^T$ let
$$E(x) := -\tfrac{1}{2}\, x^T W x = -\sum_{i<j} x_i w_{ij} x_j$$
be the energy of the Hopfield net.
Theorem: Every update that changes the state of the Hopfield net reduces the energy.
Proof: Assume the update changes $x_k$ into $x'_k$ (all other activations stay the same) $\Rightarrow$
$$E(x) - E(x') = -\sum_{i<j} x_i w_{ij} x_j + \sum_{i<j} x'_i w_{ij} x'_j = -\sum_{j \ne k} x_k w_{kj} x_j + \sum_{j \ne k} x'_k w_{kj} x_j = (x'_k - x_k) \sum_{j \ne k} w_{kj} x_j > 0,$$
since by the update rule $x'_k - x_k$ and $\sum_{j \ne k} w_{kj} x_j$ have the same sign whenever $x_k$ changes. $\square$
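A small numerical check of the theorem over all states of the 3-neuron example above (my sketch; exhaustive enumeration is only feasible for tiny nets):

```python
import numpy as np
from itertools import product

W = np.array([[ 0, 1, -2],
              [ 1, 0,  3],
              [-2, 3,  0]], dtype=float)

def energy(W, x):
    return -0.5 * x @ W @ x            # E(x) = -1/2 x^T W x

# For every state and every neuron, apply the update rule once and
# verify that a state change strictly reduces the energy.
for bits in product([0.0, 1.0], repeat=3):
    x = np.array(bits)
    for j in range(3):
        s = x @ W[:, j]
        x_new = x.copy()
        x_new[j] = 0.0 if s < 0 else (1.0 if s > 0 else x[j])
        if not np.array_equal(x, x_new):
            assert energy(W, x_new) < energy(W, x)
print("every state-changing update lowered the energy")
```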
Solving a COP
Input: combinatorial optimization problem (COP)
Output: solution for the COP
Algorithm:
• select a Hopfield net whose weights encode the parameters of the COP, such that solutions lie at minima of the energy
• start the net with a random activation
• compute a sequence of updates until stabilization
• read off the parameters
• test feasibility and optimality of the solution
Multi-Flop Problem
Problem instance: $k, n \in \mathbb{N}$, $k < n$
Feasible solutions: $\tilde{x} = (x_1, \ldots, x_n) \in \{0, 1\}^n$
Objective function: $P(\tilde{x}) = \sum_{i=1}^n x_i$
Optimal solution: solution $\tilde{x}$ with $P(\tilde{x}) = k$
Minimization problem: $d = n + 1$, $x_d = 1$, $x = (x_1, x_2, \ldots, x_n, x_d)^T \Rightarrow$
$$E(x) = \Big(\sum_{i=1}^{d} x_i - (k+1)\Big)^2 = \sum_{i=1}^{d} \underbrace{x_i^2}_{= x_i} + \sum_{i \ne j} x_i x_j - 2(k+1) \sum_{i=1}^{d} x_i + (k+1)^2$$
$$= \sum_{i \ne j} x_i x_j - (2k+1) \sum_{i=1}^{d-1} x_i x_d + k^2 = -\tfrac{1}{2} \sum_{i<j} x_i\,(-4)\,x_j - \tfrac{1}{2} \sum_{i<d} x_i\,(4k+2)\,x_d + k^2$$
(all sums range over $1, \ldots, d$); collecting the coefficients of each pair $x_i x_j$ gives the weights $w_{ij} = -2$ for $1 \le i < j \le n$ and $w_{id} = 2k - 1$, plus the constant $k^2$, which does not affect minimization.
Example ($n = 3$, $k = 1$): four neurons $x_1, x_2, x_3, x_4$ (with $x_4$ the constant-1 neuron), weight $-2$ between every pair of the neurons $x_1, x_2, x_3$ and weight $1$ between each of them and $x_4$.
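A sketch that builds the multi-flop net for general $n$ and $k$ (weights $-2$ among the first $n$ neurons and $2k-1$ to the constant neuron, as read off above) and runs random asynchronous updates; the step count and seed are arbitrary:

```python
import numpy as np

def multiflop_weights(n, k):
    """Weight matrix of the multi-flop net: d = n + 1 neurons, the last one
    being the constant-1 neuron."""
    d = n + 1
    W = -2.0 * (np.ones((d, d)) - np.eye(d))
    W[:n, d - 1] = W[d - 1, :n] = 2.0 * k - 1.0
    return W

def stabilize(W, n, steps=10_000, seed=0):
    """Random asynchronous updates; the constant neuron is never updated."""
    rng = np.random.default_rng(seed)
    x = np.append(rng.integers(0, 2, n).astype(float), 1.0)
    for _ in range(steps):
        j = rng.integers(n)
        s = x @ W[:, j]
        x[j] = 0.0 if s < 0 else (1.0 if s > 0 else x[j])
    return x[:n]

x = stabilize(multiflop_weights(5, 2), 5)
print(x, "ones:", int(x.sum()))   # a stable state has exactly k = 2 ones
```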
Traveling Salesperson Problem (TSP)
Problem instance: cities $1, 2, \ldots, n$; distances $d_{ij} \in \mathbb{R}^+$ $(1 \le i, j \le n)$ with $d_{ii} = 0$
Feasible solutions: permutations $\pi$ of $(1, 2, \ldots, n)$
Objective function: $P(\pi) = \sum_{i=1}^n d_{\pi(i),\, \pi(i \bmod n + 1)}$
Optimal solutions: feasible solution $\pi$ with minimal $P(\pi)$
Encoding
Idea: Hopfield net with $d = n^2 + 1$ neurons, one neuron per pair of city and tour position $\pi(i)$; the weights between neurons of consecutive positions carry the negative distances $-d_{ij}$ (figure omitted)
Problem: the "size" of the weights must allow both feasible and good solutions
Trick: transition to a continuous Hopfield net with modified weights ⇒ good solutions for the TSP
4 Self-Organizing Maps (SOM)
Neurons:
Input: $1, 2, \ldots, d$ for the components $x_i$
Map: $1, 2, \ldots, m$; a regular (linear, rectangular, or hexagonal) grid of positions $r_i$ that store pattern vectors $\mu_i \in \mathbb{R}^d$
Output: $1, 2, \ldots, d$ for $\mu_c$
Update: $L \subset \mathbb{R}^d$ is the learning set; at time $t \in \mathbb{N}^+$, some $x \in L$ is chosen at random $\Rightarrow$ the winner $c \in \{1, \ldots, m\}$ is determined by $\|x - \mu_c\| \le \|x - \mu_i\|$ $(\forall i \in \{1, \ldots, m\})$ and all patterns are adapted:
$$\mu'_i := \mu_i + h(c, i, t)\,(x - \mu_i) \quad \forall i \in \{1, \ldots, m\}$$
with $h(c, i, t)$ a time-dependent neighborhood relation and $h(c, i, t) \to 0$ for $t \to \infty$, e.g.
$$h(c, i, t) = \alpha(t) \cdot \exp\Big(\!-\frac{\|r_c - r_i\|^2}{2\,\sigma(t)^2}\Big)$$
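A sketch of one SOM update step in Python; the exponential decay schedules for $\alpha(t)$ and $\sigma(t)$ and all constants are illustrative choices (the slide only requires $h \to 0$):

```python
import numpy as np

def som_step(mu, grid, x, t, alpha0=0.5, sigma0=2.0, tau=200.0):
    """One SOM update: find the winner c for pattern x, then move every
    pattern vector mu_i toward x, weighted by the neighborhood h(c, i, t)."""
    c = np.argmin(np.linalg.norm(mu - x, axis=1))        # winner neuron
    alpha = alpha0 * np.exp(-t / tau)                    # learning rate alpha(t)
    sigma = sigma0 * np.exp(-t / tau)                    # neighborhood width sigma(t)
    dist2 = np.sum((grid - grid[c]) ** 2, axis=1)        # ||r_c - r_i||^2
    h = alpha * np.exp(-dist2 / (2.0 * sigma ** 2))      # h(c, i, t)
    return mu + h[:, None] * (x - mu)                    # mu'_i = mu_i + h (x - mu_i)

# usage sketch: a 10x10 map adapting to random points in the unit square
rng = np.random.default_rng(0)
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
mu = rng.random((100, 2))
for t in range(2000):
    mu = som_step(mu, grid, rng.random(2), t)
```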
Applications of SOM include: visualization and interpretation, dimensionality reduction, clustering and classification, COPs, ...
Figures: a size-50 map adapts to a triangle; a 15 × 15 grid adapts to a triangle.
SOM for Combinatorial Optimization (Δ-TSP)
Idea: use a growing ring (elastic band) of neurons
Tests with $n \le 2392$ cities show that the running time scales linearly and the tours deviate from the optimum by less than 9%
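A compact sketch of the elastic-band idea, using a fixed-size ring instead of a growing one; all parameters (ring size $5n$, learning-rate and neighborhood schedules) are illustrative and not taken from the slides:

```python
import numpy as np

def som_tsp(cities, m=None, iters=20_000, seed=0):
    """Ring SOM for the TSP: neurons lie on a ring; the winner and its ring
    neighbors are pulled toward a randomly chosen city. Reading the cities
    off in ring order of their nearest neurons gives a tour."""
    rng = np.random.default_rng(seed)
    n = len(cities)
    m = m or 5 * n                                  # ring size (heuristic)
    ring = cities.mean(0) + 0.1 * rng.standard_normal((m, 2))
    for t in range(iters):
        x = cities[rng.integers(n)]
        c = np.argmin(np.linalg.norm(ring - x, axis=1))
        d = np.minimum(np.abs(np.arange(m) - c), m - np.abs(np.arange(m) - c))
        sigma = max(1.0, m / 8 * (1 - t / iters))   # shrinking neighborhood
        h = 0.8 * (1 - t / iters) * np.exp(-(d / sigma) ** 2)
        ring += h[:, None] * (x - ring)
    order = np.argsort([np.argmin(np.linalg.norm(ring - city, axis=1))
                        for city in cities])
    return order                                    # city indices in tour order

cities = np.random.default_rng(1).random((50, 2))
tour = som_tsp(cities)
```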
SOM for Combinatorial Optimization (figures): the neuron ring with 10, 50, 500, and 2000 neurons; final tour with 2526 neurons.
5 Layered Feed-Forward Nets (MLP)
(figure of a layered feed-forward net omitted)
Formalization
An $L$-layered MLP (multi-layer perceptron)
Layers: $S_0, S_1, \ldots, S_{L-1}, S_L$
Connections: from each neuron $i$ in $S_\ell$ to each $j$ in $S_{\ell+1}$ with weight $w_{ij}$, except for the constant-1 neurons
Update: layer-wise synchronous
$$x'_j := \varphi\Big(\sum_{i \in V(j)} x_i w_{ij}\Big)$$
with $\varphi$ differentiable, e.g. $\varphi(a) = \sigma(a) = \frac{1}{1 + \exp(-a)}$
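A minimal forward pass in Python matching this update rule; appending a constant 1 to each layer plays the role of the 1-neurons (shapes and weights are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, weights):
    """Layer-wise forward pass: each matrix in `weights` maps layer S_l to
    S_{l+1}; the appended constant-1 component models the bias neuron."""
    for W in weights:
        x = sigmoid(np.append(x, 1.0) @ W)   # x'_j = phi(sum_i x_i w_ij)
    return x

# usage: 2 inputs -> 3 hidden -> 1 output; weight shapes include the 1-neuron
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 3)), rng.standard_normal((4, 1))]
print(mlp_forward(np.array([0.5, -1.0]), weights))
```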
Layered Feed-Forward Nets
Applications: function approximation, classification
Theorem: all Boolean functions can be computed by a 2-layered MLP (no proof)
Theorem: continuous real functions and their derivatives can be jointly approximated to arbitrary precision on compact sets (no proof)
Learning Parameters in an MLP
Given: $x^1, \ldots, x^N \in \mathbb{R}^d$ and $t^1, \ldots, t^N \in \mathbb{R}^c$, an MLP with $d$ input and $c$ output neurons, $w = (w_1, \ldots, w_M)$ containing all weights, $f(x, w)$ the net function
Task: find the optimal $w^*$ that minimizes the error
$$E(w) := \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} \big(f_k(x^n, w) - t^n_k\big)^2$$
The partial derivatives of $f$ exist with respect to the inputs and the parameters ⇒ any gradient-based optimization method can be used (conjugate gradient, ...), see the sketch after this slide.
$$\nabla_w E(w) = \sum_{n=1}^{N} \sum_{k=1}^{c} \big(f_k(x^n, w) - t^n_k\big)\, \nabla_w f_k(x^n, w)$$
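A sketch of plain gradient descent on $E$; the gradient is estimated here by central differences purely for illustration, whereas backpropagation (next slides) computes it analytically and efficiently:

```python
import numpy as np

def numerical_grad(E, w, eps=1e-6):
    """Central-difference estimate of grad E(w); a slow stand-in for the
    analytic gradient that backpropagation provides."""
    g = np.zeros_like(w)
    for m in range(len(w)):
        e = np.zeros_like(w); e[m] = eps
        g[m] = (E(w + e) - E(w - e)) / (2 * eps)
    return g

def gradient_descent(E, w0, lr=0.1, steps=500):
    w = w0.copy()
    for _ in range(steps):
        w -= lr * numerical_grad(E, w)     # w <- w - lr * grad E(w)
    return w

# usage on a quadratic toy error (stands in for the squared MLP error E(w))
E = lambda w: 0.5 * np.sum((w - np.array([1.0, -2.0])) ** 2)
print(gradient_descent(E, np.zeros(2)))
```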
Backpropagation
Basic calculus (chain rule):
$$\frac{\partial}{\partial t} f(g(t))\bigg|_{t = t_0} = \frac{\partial}{\partial s} f(s)\bigg|_{s = g(t_0)} \cdot \frac{\partial}{\partial t} g(t)\bigg|_{t = t_0}$$
Example: $\varphi(a) := 9 - a^2$, $x = (1, 2)^T$, $w = (1, 1)^T$, $t = 2$ (computation graph: $x_1 \cdot w_1$ and $x_2 \cdot w_2$ feed into $+$, then $\varphi$, giving $f$; finally $E = \tfrac{1}{2}(f - t)^2$)
Local derivatives of the building blocks, needed to evaluate $\nabla_w E(w)\big|_{w = (1, 1)^T}$:
$h(x, y) = x \cdot y \Rightarrow \partial/\partial x\, h(x, y) = y$
$h(x, y) = x + y \Rightarrow \partial/\partial x\, h(x, y) = 1$
$h(x, y) = x - y \Rightarrow \partial/\partial x\, h(x, y) = 1$
$\varphi(x) = 9 - x^2 \Rightarrow \partial/\partial x\, \varphi(x) = -2x$
$h(x) = x^2 / 2 \Rightarrow \partial/\partial x\, h(x) = x$
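Chaining these local derivatives through the graph gives the gradient for the example (a worked computation added here; the numbers follow directly from the definitions above):
$$a = x_1 w_1 + x_2 w_2 = 3, \quad f = \varphi(a) = 9 - 3^2 = 0, \quad E = \tfrac{1}{2}(f - t)^2 = 2$$
$$\frac{\partial E}{\partial f} = f - t = -2, \quad \frac{\partial E}{\partial a} = \frac{\partial E}{\partial f} \cdot (-2a) = (-2)(-6) = 12, \quad \nabla_w E\big|_{w = (1,1)^T} = \frac{\partial E}{\partial a} \cdot (x_1, x_2)^T = (12, 24)^T$$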
Backpropagation
Theorem: $\nabla_w E(w)$ can be computed in time $O(N \times M)$ if the network is of size $O(M)$
Algorithm: for all $n \in \{1, \ldots, N\}$
• compute the net function $f(x^n, w)$ and the associated error $E$ in forward direction; store the intermediate values in the net
• compute the partial derivatives of $E$ with respect to all intermediate values in backward direction and sum all parts to obtain the total gradient
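A sketch of this forward/backward scheme for a single training pair and a 1-hidden-layer sigmoid MLP without bias units (summing the returned gradients over $n = 1, \ldots, N$ gives $\nabla_w E$); the finite-difference line at the end is only a sanity check:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop(x, t, W1, W2):
    """Forward pass (store intermediates), then backward pass.
    Returns E and the gradients with respect to W1 and W2."""
    # forward: store a1, z1, a2, f in the "net"
    a1 = x @ W1;  z1 = sigmoid(a1)
    a2 = z1 @ W2; f  = sigmoid(a2)
    E = 0.5 * np.sum((f - t) ** 2)
    # backward: propagate dE/d(intermediate) layer by layer
    d2 = (f - t) * f * (1 - f)          # dE/da2, using sigma' = sigma(1 - sigma)
    d1 = (d2 @ W2.T) * z1 * (1 - z1)    # dE/da1
    return E, np.outer(x, d1), np.outer(z1, d2)

# usage / check against a finite-difference estimate for one weight
rng = np.random.default_rng(0)
x, t = rng.random(2), np.array([1.0])
W1, W2 = rng.standard_normal((2, 3)), rng.standard_normal((3, 1))
E, g1, g2 = backprop(x, t, W1, W2)
eps = 1e-6; Wp = W1.copy(); Wp[0, 0] += eps
print(g1[0, 0], (backprop(x, t, Wp, W2)[0] - E) / eps)   # should roughly agree
```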