  1. Utrecht University, INFOB2KI 2019-2020, The Netherlands. ARTIFICIAL INTELLIGENCE: Artificial Neural Networks. Lecturer: Silja Renooij. These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html


  3. Outline
     - Biological neural networks
     - Artificial NN basics and training: perceptrons; multi-layer networks
     - Combination with other ML techniques: NN and Reinforcement Learning (e.g. AlphaGo); NN and Evolutionary Computing

  4. (Artificial) Neural Networks
     - Supervised learning technique: error-driven classification
     - Output is a weighted function of the inputs
     - Training updates the weights
     - Used in games, e.g. to select a weapon, select an item to pick up, steer a car on a circuit, recognize characters, recognize faces, ...

  5. Biological Neural Nets: pigeons as art experts (Watanabe et al. 1995)
     - Experiment: pigeon in a Skinner box; present paintings by two different artists (e.g. Chagall / Van Gogh); reward for pecking when a painting by a particular artist (e.g. Van Gogh) is presented


  7. Results from the experiment: pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy when presented with pictures they had been trained on, and still with 85% success for previously unseen paintings by the same artists.

  8. Praise to neural nets
     - Pigeons have acquired knowledge about art: they do not simply memorise the pictures; they can extract and recognise patterns (the 'style'); they generalise from the already seen to make predictions.
     - In short: pigeons have learned.
     - Can one implement this using an artificial neural network?

  9. Inspiration from biology
     - If a pigeon can do it, how hard can it be?
     - ANNs are biologically inspired.
     - ANNs are not duplicates of brains (and do not try to be)!

  10. (Natural) Neurons
     Natural neurons receive signals through synapses (~ inputs). If the signals are strong enough (~ above some threshold), the neuron is activated and emits a signal through the axon (~ output). An artificial neuron (node) mirrors this structure.

  11. McCulloch & Pitts model (1943): "A logical calculus of the ideas immanent in nervous activity"
     A linear combiner followed by a hard delimiter; aka linear threshold gate or threshold logic unit.
     - n binary inputs x_1, ..., x_n and 1 binary output y
     - n weights w_i ∈ {-1, 1}
     - Linear combiner: z = Σ_{i=1}^{n} w_i x_i
     - Hard delimiter: unit step function at threshold θ, i.e. y = 1 if z ≥ θ, and y = 0 if z < θ
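
A minimal sketch of this unit in Python (the function name and the AND instantiation below are illustrative, not from the slides):

```python
def mcp_neuron(x, w, theta):
    """McCulloch-Pitts unit: binary inputs x, weights w in {-1, 1},
    output 1 iff the weighted sum reaches the threshold theta."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # linear combiner
    return 1 if z >= theta else 0             # hard delimiter

# With weights (1, 1) and threshold 2 the unit computes logical AND:
print([mcp_neuron(x, (1, 1), 2) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
```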

  12. Rosenblatt's Perceptron (1958)
     - enhanced version of the McCulloch-Pitts artificial neuron
     - n+1 real-valued inputs: x_1, ..., x_n and 1 bias b; binary output y
     - weights w_i with real values
     - Linear combiner: z = Σ_{i=1}^{n} w_i x_i + b
     - g(z): (hard delimiter) unit step function at threshold 0, i.e. y = 1 if z ≥ 0, and y = 0 if z < 0

  13. Classification: feedforward
     The algorithm for computing outputs from inputs in perceptron neurons is the feedforward algorithm.
     Example: inputs x_1 = 4 (w_1 = 2) and x_2 = -3 (w_2 = 4) give weighted input z = 2·4 + 4·(-3) = 8 - 12 = -4, so the activation is g(z) = 0.
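
The slide's numbers can be reproduced with a small sketch of the feedforward step (assuming a zero bias, since none appears in the example):

```python
def perceptron_output(x, w, b=0.0):
    """Rosenblatt perceptron: linear combiner plus bias, then unit step at 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted input
    return 1 if z >= 0 else 0                     # hard delimiter g(z)

# Inputs (4, -3) with weights (2, 4): z = 2*4 + 4*(-3) = -4, so the output is 0
print(perceptron_output((4, -3), (2, 4)))  # 0
```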

  14. Bias & threshold implementation
     A bias can be incorporated in three different ways, with the same effect on the output: as an explicit term b in the combiner, as an extra input fixed at 1 with weight w_0 = b, or as a threshold θ = -b on the delimiter. Alternatively, a threshold θ can be incorporated in three analogous ways, again with the same effect on the output.

  15. Single layer perceptron
     A single layer of neurons (e.g. inputs x_1, x_2 connected to output neurons 3 and 4 by weights w_13, w_14, w_23, w_24, producing outputs y_1, y_2):
     - Rosenblatt's perceptron is the building block of the single-layer perceptron, which is the simplest feedforward neural network
     - alternative hard-limiting activation functions g(z) are possible, e.g. the sign function: y = +1 if z ≥ 0, and y = -1 if z < 0
     - it can have multiple independent outputs y_i
     - the adjustable weights can be trained using training data
     - the perceptron learning rule adjusts the weights w_1, ..., w_n such that the inputs x_1, ..., x_n give rise to the desired output(s)

  16. Perceptron learning: idea
     Idea: minimize the error in the output.
     - per output: e = d - y (d = desired output)
     - if e = 1, then z = Σ_{i=1}^{n} w_i x_i should be increased such that it exceeds the threshold
     - if e = -1, then z should be decreased such that it falls below the threshold
     - so change w_i ← w_i +/- a term proportional to the gradient ∂z/∂w_i = x_i
     - the proportional change is governed by a learning rate α > 0
     NB: in the book the learning rate is called Gain, with notation η

  17. Perceptron learning
     Initialize the weights and threshold (or bias) to random numbers; choose a learning rate 0 < α ≤ 1.
     For each training input t = <x_1, ..., x_n> (one pass over all training inputs is called an 'epoch'):
     - calculate the output y(t) and the error e(t) = d(t) - y(t), where d(t) is the desired output
     - adjust all n weights using the perceptron learning rule: w_i ← w_i + Δw_i, where Δw_i = α x_i e(t)
     If the weights are unchanged for all t, the algorithm is ready; if the weights changed for any t, run another epoch (or apply some other stopping rule).
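
The loop above can be sketched as follows, here run on the AND data and initial values from the worked example that follows. Fractions are used because the boundary comparison z ≥ θ matters exactly, and floating point can tip it the wrong way:

```python
from fractions import Fraction as F

def train_perceptron(data, w, theta, alpha, max_epochs=100):
    """Perceptron learning rule: w_i <- w_i + alpha * x_i * e(t),
    with e(t) = d(t) - y(t); repeat epochs until no weight changes."""
    w = list(w)
    for _ in range(max_epochs):
        changed = False
        for x, d in data:
            z = sum(wi * xi for wi, xi in zip(w, x))  # linear combiner
            y = 1 if z >= theta else 0                # step at threshold
            e = d - y
            if e != 0:
                w = [wi + alpha * xi * e for wi, xi in zip(w, x)]
                changed = True
        if not changed:  # stopping rule: a full epoch without updates
            break
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND, [F(3, 10), F(-1, 10)], theta=F(2, 10), alpha=F(1, 10))
print(w)  # [Fraction(1, 10), Fraction(1, 10)], i.e. w1 = w2 = 0.1
```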

  18. Example: AND-learning (1)
     Desired output d of logical AND, given 2 binary inputs:
     x_1 x_2 | d
      0   0  | 0
      0   1  | 0
      1   0  | 0
      1   1  | 1

  19. Example AND (2)
     Init: choose weights w_i and threshold θ randomly in [-0.5, 0.5]: here w_1 = 0.3, w_2 = -0.1, θ = 0.2; set the learning rate α = 0.1. Use the step function: return 0 if z < θ, and 1 if z ≥ θ. (Alternative: use a bias b = -θ with the unit step function at 0.)
     Training inputs: t_1 = (0,0), t_2 = (0,1), t_3 = (1,0), each with d = 0, and t_4 = (1,1) with d = 1.
     Input t_1 = (0,0): z = 0 < 0.2, so y = 0 and e(t_1) = d(t_1) - y = 0 - 0 = 0. Weights unchanged; done with t_1, for now...

  20. Example AND (3)
     Input t_2 = (0,1): z = 0.3·0 + (-0.1)·1 = -0.1 < 0.2, so y = 0 and e(t_2) = 0 - 0 = 0. Weights unchanged; done with t_2, for now...

  21. Example AND (4)
     Input t_3 = (1,0): z = 0.3·1 + (-0.1)·0 = 0.3 ≥ 0.2, so y = 1 and e(t_3) = 0 - 1 = -1. Update: w_1 ← 0.3 + 0.1·1·(-1) = 0.2 (w_2 unchanged since x_2 = 0); done with t_3, for now...

  22. Example AND (5)
     Input t_4 = (1,1): z = 0.2·1 + (-0.1)·1 = 0.1 < 0.2, so y = 0 and e(t_4) = 1 - 0 = 1. Update: w_1 ← 0.2 + 0.1 = 0.3 and w_2 ← -0.1 + 0.1 = 0; done with t_4 and the first epoch...

  23. Example (6): 4 epochs later...
     The weights have converged to w_1 = 0.1, w_2 = 0.1 (with θ = 0.2):
     - the algorithm has converged, i.e. the weights do not change any more
     - the algorithm has correctly learned the AND function

  24. AND example (7): results
     x_1 x_2 | d | y
      0   0  | 0 | 0
      0   1  | 0 | 0
      1   0  | 0 | 0
      1   1  | 1 | 1
     Learned function/decision boundary: 0.1 x_1 + 0.1 x_2 = 0.2, or equivalently x_2 = 2 - x_1: a linear classifier.
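
The learned rule can be checked directly (a quick sketch; with these particular values 0.1 + 0.1 happens to equal 0.2 exactly even in floating point, so the boundary case (1,1) comes out right):

```python
def learned_and(x1, x2):
    """Decision rule from the example: fire iff 0.1*x1 + 0.1*x2 >= 0.2."""
    return 1 if 0.1 * x1 + 0.1 * x2 >= 0.2 else 0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, learned_and(*x))  # reproduces the desired-output column d
```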

  25. Perceptron learning: properties
     All linear functions → a search space without local optima.
     Complete: yes, if
     - α is sufficiently small or the initial weights are sufficiently large, and
     - the examples come from a linearly separable function,
     then perceptron learning converges to a solution.
     Optimal: no (the weights serve to correctly separate 'seen' inputs; there are no guarantees for 'unseen' inputs close to the decision boundaries).

  26. Limitation of perceptron: example
     XOR:
     x_1 x_2 | d
      0   0  | 0
      0   1  | 1
      1   0  | 1
      1   1  | 0
     A single linear function cannot separate the two output types: XOR is not linearly separable.

  27. Solving XOR using 2 single layer perceptrons
     Feed the inputs x_1, x_2 into two hidden threshold units (nodes 3 and 4) with weights from {1, -1} and thresholds θ = 1, and combine their outputs y_3, y_4 in a third threshold unit (node 5, θ = 1): each hidden unit realizes one linear decision boundary, and the output unit combines the two half-planes into the XOR region.
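
One concrete weight assignment for this construction is sketched below (all thresholds 1, weights from {1, -1}; the slide's exact assignment may differ): node 3 fires only for input (1,0), node 4 only for (0,1), and node 5 ORs them.

```python
def step(z, theta):
    """Hard delimiter: 1 iff z reaches the threshold."""
    return 1 if z >= theta else 0

def xor_net(x1, x2):
    h3 = step( 1 * x1 + -1 * x2, 1)  # node 3: x1 AND NOT x2
    h4 = step(-1 * x1 +  1 * x2, 1)  # node 4: x2 AND NOT x1
    return step(1 * h3 + 1 * h4, 1)  # node 5: h3 OR h4

print([xor_net(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```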

  28. Types of decision regions 28

  29. Multi-layer networks
     Input nodes x_1, x_2, x_3 feed a hidden layer of neurons, which feeds an output layer of neurons producing y_1, y_2, y_3.
     - this type of network is also called a feed forward network
     - the hidden layer captures nonlinearities
     - more than 1 hidden layer is possible, but often reducible to 1 hidden layer
     - introduced in the 50s, but not studied until the 80s

  30. Multi-Layer Networks
     In MLNs:
     - outputs are not based on a simple weighted sum of the inputs
     - weights are shared
     - outputs are dependent
     Input signals propagate forward through the network; error signals propagate backward:
     - errors must be distributed over the hidden neurons
     - continuous activation functions are used

  31. Continuous activation functions
     As continuous activation function, we can use:
     - a (piecewise) linear function (e.g. ReLU)
     - a sigmoid (smoothed version of the step function), e.g. the logistic sigmoid g(z) = 1 / (1 + e^{-z})

  32. Continuous artificial neurons
     Same structure as before (linear combiner, then output function), but with a sigmoid instead of a hard delimiter:
     - weighted input: z = Σ_{i=1}^{n} w_i x_i
     - activation (logistic sigmoid): y = g(z) = 1 / (1 + e^{-z})

  33. Example
     Inputs x_1 = 3 (w_1 = 2) and x_2 = -2 (w_2 = 4): weighted input z = 2·3 + 4·(-2) = 6 - 8 = -2; activation g(-2) = 1 / (1 + e^2) ≈ 0.119.
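
The slide's computation in code (a minimal sketch):

```python
import math

def sigmoid_neuron(x, w):
    """Continuous artificial neuron: linear combiner + logistic sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # weighted input
    return 1.0 / (1.0 + math.exp(-z))         # logistic sigmoid g(z)

# Inputs (3, -2) with weights (2, 4): z = 6 - 8 = -2, g(-2) ~ 0.119
print(round(sigmoid_neuron((3, -2), (2, 4)), 3))  # 0.119
```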

  34. Error minimization in MLNs: idea
     Idea: minimize the error in the output through gradient descent.
     - Total error is the sum of squared errors, per output: E = Σ ½ (d - y)^2 (d = desired output)
     - change w_i ← w_i - a term proportional to the gradient ∂E/∂w_i
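
For a single sigmoid output, this gradient can be written out explicitly (a standard derivation, not spelled out on the slide), using g'(z) = g(z)(1 - g(z)) for the logistic sigmoid:

```latex
E = \tfrac{1}{2}(d - y)^2, \qquad y = g(z), \qquad z = \sum_{i=1}^{n} w_i x_i

\frac{\partial E}{\partial w_i}
  = -(d - y)\, g'(z)\, x_i
  = -(d - y)\, y\,(1 - y)\, x_i
\quad\Longrightarrow\quad
w_i \leftarrow w_i + \alpha\,(d - y)\, y\,(1 - y)\, x_i
```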
