
Multi-Layer Networks. M. Soleymani, Deep Learning, Sharif University of Technology, Spring 2019. Most slides have been adapted from: Bhiksha Raj, 11-785, CMU 2019; Fei-Fei Li's lectures, cs231n, Stanford 2017; and some from Hinton, NN for ...


  1. Perceptron Algorithm β€’ $\delta$ is the best-case margin β€’ $R$ is the length of the longest input vector β€’ The perceptron converges after at most $(R/\delta)^2$ mistakes

  2. Adjusting weights β€’ General gradient form: $\mathbf{w}^{t+1} = \mathbf{w}^{t} - \eta \nabla F^{(i)}(\mathbf{w}^{t})$ β€’ Weight update for a training pair $(\mathbf{x}^{(i)}, y^{(i)})$: – Perceptron: if $\mathrm{sign}(\mathbf{w}^\top \mathbf{x}^{(i)}) \neq y^{(i)}$ then $\Delta\mathbf{w} = y^{(i)}\mathbf{x}^{(i)}$, else $\Delta\mathbf{w} = \mathbf{0}$ – ADALINE: $\Delta\mathbf{w} = \eta\,(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)})\,\mathbf{x}^{(i)}$, a gradient step on $F^{(i)}(\mathbf{w}) = (y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)})^2$ (up to a constant factor) β€’ Known as the Widrow-Hoff, LMS, or delta rule
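To make the two update rules concrete, here is a minimal numpy sketch (the function names, learning rates, and toy data are illustrative, not from the slides):

```python
import numpy as np

def perceptron_update(w, x, y, lr=1.0):
    """Perceptron rule: update only on a misclassified example (labels in {-1, +1})."""
    if np.sign(w @ x) != y:
        w = w + lr * y * x          # pull the weights toward the misclassified input
    return w

def adaline_update(w, x, y, lr=0.1):
    """ADALINE / Widrow-Hoff / LMS / delta rule: gradient step on the squared error."""
    error = y - w @ x               # residual of the linear output (no thresholding)
    return w + lr * error * x

# Toy usage on a 2-D example with a bias feature appended as x[0] = 1
w = np.zeros(3)
x = np.array([1.0, 0.5, -1.2])
y = 1
w = perceptron_update(w, x, y)
w = adaline_update(w, x, y)
print(w)
```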

  3. How to learn the weights: multi-class example

  4. How to learn the weights: multi-class example β€’ If correct: no change β€’ If wrong: – lower the score of the wrong answer (by subtracting the input from the weight vector of the wrongly predicted class) – raise the score of the target (by adding the input to the weight vector of the target class)
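A minimal sketch of this multi-class update in numpy (the class count, feature size, and toy data below are illustrative):

```python
import numpy as np

def multiclass_perceptron_update(W, x, target):
    """W has one weight row per class; the class scores are W @ x."""
    predicted = int(np.argmax(W @ x))
    if predicted != target:        # if wrong:
        W[predicted] -= x          #   lower the score of the wrong answer
        W[target] += x             #   raise the score of the target class
    return W                       # if correct: no change

# Toy usage: 3 classes, 4 features (x[0] = 1 can serve as a bias feature)
W = np.zeros((3, 4))
x = np.array([1.0, 0.2, -0.5, 0.7])
W = multiclass_perceptron_update(W, x, target=2)
print(W)
```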

  10. Single-layer networks as template matching β€’ The weights for each class act as a template (sometimes called a prototype) for that class – The winner is the class whose template is most similar to the input β€’ The ways in which hand-written digits vary are much too complicated to be captured by simple template matches of whole shapes β€’ To capture all the allowable variations of a digit we need to learn the features it is composed of

  11. The history of perceptrons β€’ They were popularised by Frank Rosenblatt in the early 1960s – They appeared to have a very powerful learning algorithm – Lots of grand claims were made for what they could learn to do β€’ In 1969, Minsky and Papert published a book called β€œPerceptrons” that analyzed what they could do and showed their limitations – Many people thought these limitations applied to all neural network models

  12. What binary threshold neurons cannot do β€’ A binary threshold output unit cannot even tell if two single-bit features are the same! β€’ A geometric view of what binary threshold neurons cannot do: the positive and negative cases cannot be separated by a plane

  13. What binary threshold neurons cannot do β€’ Positive cases (same): (1,1) β†’ 1; (0,0) β†’ 1 β€’ Negative cases (different): (1,0) β†’ 0; (0,1) β†’ 0 β€’ The four input-output pairs give four inequalities that are impossible to satisfy: – $w_1 + w_2 \ge \theta$ – $0 \ge \theta$ – $w_1 < \theta$ – $w_2 < \theta$
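A short derivation, not on the slide, of why the four constraints are jointly infeasible:

```latex
\begin{align*}
w_1 < \theta,\ w_2 < \theta \;&\Rightarrow\; w_1 + w_2 < 2\theta, \\
0 \ge \theta \;&\Rightarrow\; 2\theta \le \theta, \\
w_1 + w_2 \ge \theta \;&\Rightarrow\; w_1 + w_2 \ge \theta \ge 2\theta,
\end{align*}
% which contradicts the first line, so no single threshold unit can compute "same".
```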

  14. Discriminating simple patterns under translation with wrap-around β€’ Suppose we just use pixels as the features β€’ A binary decision unit cannot discriminate patterns with the same number of on pixels – if the patterns can translate with wrap-around!

  15. Sketch of a proof β€’ For pattern A, use training cases in all possible translations – Each pixel will be activated by 4 different translations of pattern A (both example patterns have four pixels on) – So the total input received by the decision unit over all these patterns will be four times the sum of all the weights β€’ For pattern B, use training cases in all possible translations – Each pixel will be activated by 4 different translations of pattern B – So the total input received by the decision unit over all these patterns will be four times the sum of all the weights β€’ But to discriminate correctly, every single case of pattern A must provide more input to the decision unit than every single case of pattern B β€’ This is impossible if the sums over all cases are the same

  16. Networks with hidden units β€’ Networks without hidden units are very limited in the input-output mappings they can learn to model – More layers of linear units do not help: it's still linear – Fixed output non-linearities are not enough β€’ We need multiple layers of adaptive, non-linear hidden units. But how can we train such nets?

  17. The multi-layer perceptron β€’ A network of perceptrons – Generally β€œlayered”

  18. Feed-forward neural networks β€’ Also called Multi-Layer Perceptron (MLP)

  19. MLP with a single hidden layer β€’ Two-layer MLP (the number of layers of adaptive weights is counted) β€’ Hidden units: $z_j = \phi\!\left(\sum_{i=0}^{d} w_{ji}^{[1]} x_i\right)$ for $j = 1,\dots,M$, with $x_0 = 1$ β€’ Outputs: $p_k(\mathbf{x}) = \omega\!\left(\sum_{j=0}^{M} w_{kj}^{[2]} z_j\right) = \omega\!\left(\sum_{j=0}^{M} w_{kj}^{[2]}\, \phi\!\left(\sum_{i=0}^{d} w_{ji}^{[1]} x_i\right)\right)$ for $k = 1,\dots,K$, with $z_0 = 1$
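A minimal numpy sketch of this forward computation (the sigmoid non-linearities and the layer sizes are illustrative choices, not fixed by the slide):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer MLP in the sense of two layers of adaptive weights."""
    z = sigmoid(W1 @ x + b1)      # hidden activations z_j = phi(sum_i w_ji x_i)
    p = sigmoid(W2 @ z + b2)      # outputs p_k = omega(sum_j w_kj z_j)
    return p

# Toy usage: d=3 inputs, M=4 hidden units, K=2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
print(mlp_forward(np.array([0.5, -1.0, 2.0]), W1, b1, W2, b2))
```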

  20. Beyond linear models β€’ Linear model: $f = W\mathbf{x}$ β€’ Two-layer model: $f = W_2\,\phi(W_1\mathbf{x})$

  21. Beyond linear models β€’ Linear model: $f = W\mathbf{x}$ β€’ Two-layer model: $f = W_2\,\phi(W_1\mathbf{x})$ β€’ Three-layer model: $f = W_3\,\phi(W_2\,\phi(W_1\mathbf{x}))$

  22. Defining β€œdepth” β€’ What is a β€œdeep” network?

  23. Deep structures β€’ In any directed network of computational elements with input source nodes and output sink nodes, β€œdepth” is the length of the longest path from a source to a sink β€’ Left: depth = 2. Right: depth = 3 β€’ β€œDeep” β‡’ depth > 2

  24. The multi-layer perceptron β€’ Inputs are real or Boolean stimuli β€’ Outputs are real or Boolean values – Can have multiple outputs for a single input β€’ What can this network compute? – What kinds of input/output relationships can it model?

  25. MLPs approximate functions β€’ (Figure: example networks over inputs X, Y, Z, A omitted) β€’ MLPs can compose Boolean functions β€’ MLPs can compose real-valued functions β€’ What are the limitations?

  26. Multi-layer perceptrons as universal Boolean functions

  27. The perceptron as a Boolean gate β€’ A perceptron can model any simple binary Boolean gate – e.g., AND of X and Y: weights 1, 1 and threshold 2; OR of X and Y: weights 1, 1 and threshold 1; NOT of X: weight βˆ’1 and threshold 0

  28. Perceptron as a Boolean gate β€’ The universal AND gate: weights +1 on $X_1 \dots X_L$, weights βˆ’1 on $X_{L+1} \dots X_N$, threshold $L$ – Will fire only if $X_1 \dots X_L$ are all 1 and $X_{L+1} \dots X_N$ are all 0 – ANDs any number of inputs, any subset of which may be negated

  29. Perceptron as a Boolean gate β€’ The universal OR gate: weights +1 on $X_1 \dots X_L$, weights βˆ’1 on $X_{L+1} \dots X_N$, threshold $L - N + 1$ – Will fire if any of $X_1 \dots X_L$ are 1 or any of $X_{L+1} \dots X_N$ are 0 – ORs any number of inputs, any subset of which may be negated

  30. Perceptron as a Boolean gate β€’ Generalized majority gate: all weights +1, threshold $K$ – Will fire only if at least $K$ inputs are 1

  31. Perceptron as a Boolean gate β€’ Generalized majority gate with negations: weights +1 on $X_1 \dots X_L$, weights βˆ’1 on $X_{L+1} \dots X_N$, threshold $L - N + K$ – Will fire only if the number of $X_1 \dots X_L$ that are 1 plus the number of $X_{L+1} \dots X_N$ that are 0 is at least $K$ – Fires if at least $K$ inputs are of the desired polarity
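A small sketch of these threshold-gate constructions (the helper names are mine; the weights and thresholds follow the slides above):

```python
def threshold_gate(inputs, weights, threshold):
    """Perceptron as a Boolean gate: fire iff the weighted sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def generalized_gate(xs, n_positive, k):
    """Fire iff at least k 'desired polarity' conditions hold:
    the first n_positive inputs should be 1, the remaining inputs should be 0."""
    n = len(xs)
    weights = [1] * n_positive + [-1] * (n - n_positive)
    threshold = n_positive - n + k          # L - N + K from the slide
    return threshold_gate(xs, weights, threshold)

# k = n recovers the universal AND gate, k = 1 recovers the universal OR gate
print(generalized_gate([1, 1, 0], n_positive=2, k=3))  # AND(X1, X2, not X3) -> 1
print(generalized_gate([0, 0, 1], n_positive=2, k=1))  # OR(X1, X2, not X3)  -> 0
```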

  32. The perceptron is not enough β€’ A single perceptron cannot compute an XOR of X and Y

  33. Multi-layer perceptron: XOR β€’ An XOR takes three perceptrons – e.g., one hidden unit computing X OR Y, another computing NOT(X AND Y), and an output unit that ANDs the two hidden outputs

  34. Multi-layer perceptron: XOR β€’ With 2 neurons – 5 weights and two thresholds – e.g., hidden unit $h = \mathbb{1}[X + Y \ge 1.5]$ and output $\mathbb{1}[X + Y - 2h \ge 0.5]$
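A sketch of the two-neuron XOR using the weights and thresholds suggested above (treat the exact construction as one standard choice):

```python
def step(a, threshold):
    return int(a >= threshold)

def xor_two_neurons(x, y):
    h = step(x + y, 1.5)                 # hidden unit: fires only for (1, 1), i.e. an AND
    return step(x + y - 2 * h, 0.5)      # output: OR minus twice the AND gives XOR

for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_two_neurons(x, y))   # prints the XOR truth table
```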

  35. Multi-layer perceptron β€’ (Figure: example network over inputs X, Y, Z, A omitted) β€’ MLPs can compute more complex Boolean functions β€’ MLPs can compute any Boolean function – Since they can emulate individual gates β€’ MLPs are universal Boolean functions

  36. MLP as Boolean functions β€’ (Figure: example network over inputs X, Y, Z, A omitted) β€’ MLPs are universal Boolean functions – Any function over any number of inputs and any number of outputs β€’ But how many β€œlayers” will they need?

  37. How many layers for a Boolean MLP? β€’ A Boolean function is just a truth table β€’ The truth table lists all input combinations for which the output is 1:

  X1 X2 X3 X4 X5 | Y
   0  0  1  1  0 | 1
   0  1  0  1  1 | 1
   0  1  1  0  0 | 1
   1  0  0  0  1 | 1
   1  0  1  1  1 | 1
   1  1  0  0  1 | 1

  38. How many layers for a Boolean MLP? β€’ The same function expressed in disjunctive normal form (one AND term per row of the truth table with output 1): $y = \bar{X}_1\bar{X}_2 X_3 X_4 \bar{X}_5 + \bar{X}_1 X_2 \bar{X}_3 X_4 X_5 + \bar{X}_1 X_2 X_3 \bar{X}_4 \bar{X}_5 + X_1 \bar{X}_2 \bar{X}_3 \bar{X}_4 X_5 + X_1 \bar{X}_2 X_3 X_4 X_5 + X_1 X_2 \bar{X}_3 \bar{X}_4 X_5$

  46. How many layers for a Boolean MLP? β€’ Any truth table can be expressed in this manner: one hidden AND unit per row with output 1, and an output unit that ORs the hidden units β€’ A one-hidden-layer MLP is a universal Boolean function β€’ But what is the largest number of perceptrons required in the single hidden layer for an N-input-variable function?
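A sketch of this DNF-to-MLP construction, with one hidden AND unit per truth-table row whose output is 1 and an OR output unit (the helper names are mine; the example rows echo the table above):

```python
def step(a, threshold):
    return int(a >= threshold)

def dnf_mlp(truth_rows, x):
    """truth_rows: list of input tuples whose output is 1.  x: the query input."""
    hidden = []
    for row in truth_rows:
        # AND unit for this row: +1 weight where the row has 1, -1 where it has 0,
        # threshold = number of 1s, so it fires only on an exact match
        weights = [1 if r == 1 else -1 for r in row]
        hidden.append(step(sum(w * xi for w, xi in zip(weights, x)), sum(row)))
    # OR unit over all hidden AND units
    return step(sum(hidden), 1)

rows = [(0,0,1,1,0), (0,1,0,1,1), (0,1,1,0,0), (1,0,0,0,1), (1,0,1,1,1), (1,1,0,0,1)]
print(dnf_mlp(rows, (0, 0, 1, 1, 0)))   # 1: matches a row of the table
print(dnf_mlp(rows, (1, 1, 1, 1, 1)))   # 0: not in the table
```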

  47. Worst case β€’ Which truth tables cannot be reduced further? β€’ Largest width needed for a single-hidden-layer Boolean network on N inputs – Worst case: $2^{N-1}$ hidden units β€’ Example: the parity function $X_1 \oplus X_2 \oplus \dots \oplus X_N$; its Karnaugh map is a checkerboard of alternating 0s and 1s, so no terms can be merged

  48. Boolean functions β€’ Input: N Boolean variables β€’ How many neurons are required in a one-hidden-layer MLP? β€’ A more compact representation of a Boolean function: the β€œKarnaugh map” – Represents the truth table as a grid – Grouping adjacent boxes reduces the complexity of the Disjunctive Normal Form (DNF) formula β€’ (Figure: example Karnaugh map omitted)

  49. How many neurons in the hidden layer? β€’ (Figure: a Karnaugh-map example in which adjacent 1s are grouped, reducing the DNF to a few larger product terms and hence to fewer hidden units; the detailed expressions are omitted)

  50. Width of a deep MLP β€’ (Figure: Karnaugh-map illustrations omitted)

  51. Using a deep network: parity function on N inputs β€’ Simple MLP with one hidden layer: $2^{N-1}$ hidden units, $(N+2)\,2^{N-1} + 1$ weights and biases

  52. Using a deep network: parity function on N inputs β€’ Deep alternative: compute $f = X_1 \oplus X_2 \oplus \dots \oplus X_N$ as a chain of pairwise XORs, each built from 3 perceptrons: $3(N-1)$ nodes and $9(N-1)$ weights and biases β€’ The actual number of parameters in a network is the number that really matters in software or hardware implementations

  53. A better architecture β€’ Only requires $2\log_2 N$ layers β€’ $f = X_1 \oplus X_2 \oplus X_3 \oplus X_4 \oplus X_5 \oplus X_6 \oplus X_7 \oplus X_8$, computed as a balanced binary tree of pairwise XORs
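A sketch of the tree idea: an XOR built from three threshold units, composed in a balanced binary tree so the parity of $N$ inputs takes about $2\log_2 N$ layers (the gate decomposition and names are mine):

```python
def step(a, threshold):
    return int(a >= threshold)

def xor_gate(x, y):
    """XOR from three threshold units (OR, NAND, then AND): a 2-layer subnetwork."""
    h_or = step(x + y, 1)
    h_nand = step(-x - y, -1)
    return step(h_or + h_nand, 2)

def parity_tree(bits):
    """Reduce the inputs pairwise until one value remains: log2(N) XOR levels.
    Assumes len(bits) is a power of two."""
    while len(bits) > 1:
        bits = [xor_gate(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]
    return bits[0]

x = [1, 0, 1, 1, 0, 0, 1, 0]
print(parity_tree(x), sum(x) % 2)   # the two values agree
```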

  54. The challenge of depth β€’ Using only $K$ hidden layers will require $O(2^{CN})$ neurons in the $K$-th layer, for a constant $C > 0$ that depends on $K$ – Because the output can be shown to be the XOR of all the outputs of the $(K-1)$-th hidden layer – i.e. reducing the number of layers below the minimum will result in an exponentially sized network to express the function fully – A network with fewer than the minimum required number of neurons cannot model the function

  55. Caveat 1: Not all Boolean functions… β€’ Not all Boolean circuits have such a clear depth-vs-size tradeoff β€’ Shannon's theorem: for $N > 2$, there is a Boolean function of $N$ variables that requires at least $2^N / N$ gates – More correctly, for large $N$, almost all $N$-input Boolean functions need more than $2^N / N$ gates β€’ Regardless of depth β€’ Note: if all Boolean functions over $N$ inputs could be computed using a circuit of size polynomial in $N$, then P = NP!

  56. Caveat 2 β€’ We used a simple β€œBoolean circuit” analogy for the explanation β€’ We actually have a threshold circuit (TC), not just a Boolean circuit (AC) – Specifically, composed of threshold gates β€’ More versatile than Boolean gates (e.g., they can compute the majority function) β€’ E.g. β€œat least K inputs are 1” is a single TC gate, but needs an exponential-size AC β€’ For fixed depth, Boolean circuits βŠ‚ threshold circuits (strict subset) – A depth-2 TC parity circuit can be composed with $O(n^2)$ weights β€’ But a network of depth $\log n$ requires only $O(n)$ weights β€’ Other formal analyses typically view neural networks as arithmetic circuits – Circuits which compute polynomials over any field β€’ So let's consider functions over the field of reals

  57. Summary: Wide vs. deep networks β€’ An MLP with a single hidden layer is a universal Boolean function β€’ However, a single-hidden-layer network might need an exponential number of hidden units w.r.t. the number of inputs β€’ Deeper networks may require far fewer neurons than shallower networks to express the same function – Could be exponentially smaller β€’ Optimal width and depth depend on the number of variables and the complexity of the Boolean function – Complexity: the minimal number of terms in a DNF formula representing it

  58. MLPs as universal classifiers

  59. The MLP as a classifier β€’ (Figure: an MLP taking a 784-dimensional MNIST image as input) β€’ MLP as a function over real inputs β€’ MLP as a function that finds a complex β€œdecision boundary” over a space of reals

  60. A perceptron on reals β€’ A perceptron operates on real-valued vectors: it fires when $\sum_i w_i x_i \ge T$ – In two dimensions the decision boundary is the line $w_1 x_1 + w_2 x_2 = T$ – This is a linear classifier

  61. Boolean functions with a real perceptron β€’ (Figure: decision regions over the unit square with corners (0,0), (0,1), (1,0), (1,1) for several Boolean gates) β€’ Boolean perceptrons are also linear classifiers – Purple regions are 1

  62. Composing complicated β€œdecision” boundaries β€’ Perceptrons can now be composed into β€œnetworks” to compute arbitrary classification β€œboundaries” β€’ Build a network of units with a single output that fires if the input is in the coloured area

  63. Booleans over the reals β€’ The network must fire if the input (x1, x2) is in the coloured area shown in the figure
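A sketch of such a region detector for a convex region; since the coloured area in the original figure is not reproduced here, a triangle stands in as the example region:

```python
import numpy as np

def fires(x, w, b):
    """One linear unit: fires iff w . x >= b (one side of a line)."""
    return int(np.dot(w, x) >= b)

def in_region(x, half_planes):
    """Output unit ANDs the side-of-line units: fires only inside the convex region."""
    votes = sum(fires(x, w, b) for w, b in half_planes)
    return int(votes >= len(half_planes))

# Example convex region: the triangle with vertices (0,0), (1,0), (0,1),
# described as the intersection of three half-planes.
triangle = [
    (np.array([1.0, 0.0]), 0.0),     # x1 >= 0
    (np.array([0.0, 1.0]), 0.0),     # x2 >= 0
    (np.array([-1.0, -1.0]), -1.0),  # x1 + x2 <= 1
]
print(in_region(np.array([0.2, 0.3]), triangle))  # 1: inside
print(in_region(np.array([0.9, 0.9]), triangle))  # 0: outside
```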

