Neural Networks: What can a network represent


Neural Networks: What can a network represent. Deep Learning, Spring 2018. Recap: Neural networks have taken over AI; tasks that are made possible by NNs, aka deep learning. Recap: NNets and the brain; in their basic form, NNets…


  1. How many layers for a Boolean MLP? A truth table shows all input combinations for which the output is 1:

     X1 X2 X3 X4 X5 | Y
      0  0  1  1  0 | 1
      0  1  0  1  1 | 1
      0  1  1  0  0 | 1
      1  0  0  0  1 | 1
      1  0  1  1  1 | 1
      1  1  0  0  1 | 1

     • Any truth table can be expressed in this manner! • A one-hidden-layer MLP is a universal Boolean function • But what is the largest number of perceptrons required in the single hidden layer for an N-input-variable function?
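A minimal sketch of this construction in code (not the lecture's own code; the table above is used as the example). Each hidden threshold unit fires only on one row that outputs 1: weight +1 where that row has a 1, weight -1 where it has a 0, and threshold equal to the row's number of 1s. The output unit is an OR.

```python
import numpy as np
from itertools import product

# Rows of the truth table above for which Y = 1.
rows_with_output_1 = [
    (0, 0, 1, 1, 0),
    (0, 1, 0, 1, 1),
    (0, 1, 1, 0, 0),
    (1, 0, 0, 0, 1),
    (1, 0, 1, 1, 1),
    (1, 1, 0, 0, 1),
]

def hidden_unit(pattern):
    # Fires only when the input equals `pattern`: +1 weights on the
    # pattern's 1s, -1 on its 0s, threshold = number of 1s.
    w = np.array([1 if b else -1 for b in pattern])
    T = sum(pattern)
    return lambda x: int(np.dot(w, x) >= T)

hidden = [hidden_unit(p) for p in rows_with_output_1]

def mlp(x):
    # Output unit is an OR: fires if any hidden unit fires.
    return int(sum(h(x) for h in hidden) >= 1)

# The one-hidden-layer MLP reproduces the truth table exactly.
ones = set(rows_with_output_1)
for x in product([0, 1], repeat=5):
    assert mlp(np.array(x)) == (x in ones)
```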

  2. Reducing a Boolean Function. This is a "Karnaugh Map": it represents a truth table as a grid over W, X (rows) and Y, Z (columns). Filled boxes represent input combinations for which the output is 1; blank boxes have output 0. Adjacent boxes can be "grouped" to reduce the complexity of the DNF formula for the table. • DNF form: – Find groups – Express as reduced DNF

  3. Reducing a Boolean Function. [Karnaugh map figure.] The basic DNF formula will require 7 terms.

  4. Reducing a Boolean Function. [Karnaugh map figure.] • Reduced DNF form: – Find groups – Express as reduced DNF

  5. Reducing a Boolean Function. [Karnaugh map and the resulting network over inputs W, X, Y, Z.] • Reduced DNF form: – Find groups – Express as reduced DNF

  6. Largest irreducible DNF? [Karnaugh map figure.] • What arrangement of ones and zeros simply cannot be reduced further?

  7. Largest irreducible DNF? [Karnaugh map figure.] • What arrangement of ones and zeros simply cannot be reduced further?

  8. Largest irreducible DNF? [Karnaugh map figure: a checkerboard of alternating ones and zeros, i.e. the XOR of the inputs, has no adjacent ones to group and cannot be reduced.] • What arrangement of ones and zeros simply cannot be reduced further? • How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function?

  9. Width of a single-layer Boolean network. [Karnaugh maps for a Boolean function of 6 variables U, V, W, X, Y, Z.] • How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function of 6 variables?

  10. The actual number of parameters in a network. [Network figure over inputs X1 … X5.] • The actual number of parameters in a network is the number of connections – In this example there are 30 • This is the number that really matters in software or hardware implementations
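As a quick sanity check of that count (a sketch; the layer sizes are assumed from the figure to be 5 inputs, 5 hidden units, and 1 output, and thresholds are not counted):

```python
# Connections between consecutive layers: 5 inputs x 5 hidden units,
# plus 5 hidden units x 1 output (assumed layer sizes; thresholds
# and biases not counted).
layers = [5, 5, 1]
n_connections = sum(a * b for a, b in zip(layers, layers[1:]))
print(n_connections)  # 5*5 + 5*1 = 30
```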

  11. Width of a single-layer Boolean network. [Karnaugh maps for the 6-variable function.] • How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function of 6 variables? – How many weights will this network require?

  12. Width of a single-layer Boolean network. • How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function? Can be generalized: it will require 2^(N-1) perceptrons in the hidden layer, exponential in N, and O(N·2^(N-1)) weights, superexponential in N.
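The 2^(N-1) figure is easy to verify for a small N. The sketch below assumes the irreducible function is parity (the checkerboard map above): every input with an odd number of 1s is an isolated 1 in the map, so the one-hidden-layer MLP needs one perceptron for each.

```python
from itertools import product

# Count the isolated 1s of the N-variable parity function: one DNF
# term, hence one hidden perceptron, per input with an odd number of 1s.
N = 6
odd_inputs = [x for x in product([0, 1], repeat=N) if sum(x) % 2 == 1]
print(len(odd_inputs), 2 ** (N - 1))  # 32 32
```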

  13. Width of a single-layer Boolean network. • How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function? Can be generalized: it will require 2^(N-1) perceptrons in the hidden layer, exponential in N. How many units do we need if we use multiple layers? How many weights?

  14. Width of a deep network. [Figure: the 4-variable and 6-variable Karnaugh maps, to be built with a deep network.]

  15. Multi-layer perceptron XOR. [Network figure: inputs X and Y, a hidden layer of two units, and one output unit, with weights and thresholds of ±1 and 2.] • An XOR takes three perceptrons – 6 weights and three threshold values • 9 total parameters
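Here is one concrete choice of weights realizing the 3-perceptron XOR (a sketch; the slide's exact values may differ, but the parameter count is identical at 6 weights plus 3 thresholds): an OR unit and a NAND unit feeding an AND.

```python
import numpy as np

def perceptron(w, T):
    # Threshold unit: fires (outputs 1) iff w.x >= T.
    return lambda x: int(np.dot(w, x) >= T)

h_or   = perceptron(np.array([1, 1]), 1)     # fires unless both inputs are 0
h_nand = perceptron(np.array([-1, -1]), -1)  # fires unless both inputs are 1
out    = perceptron(np.array([1, 1]), 2)     # AND of the two hidden units

def xor(x, y):
    return out(np.array([h_or((x, y)), h_nand((x, y))]))

for x in (0, 1):
    for y in (0, 1):
        assert xor(x, y) == (x ^ y)
```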

  16. Width of a deep network. [Karnaugh map; network over W, X, Y, Z built from XOR gadgets: 9 perceptrons.] • An XOR needs 3 perceptrons • This network will require 3×3 = 9 perceptrons – 27 parameters

  17. Width of a deep network. [Karnaugh maps; network over U, V, W, X, Y, Z: 15 perceptrons.] • An XOR needs 3 perceptrons • This network will require 3×5 = 15 perceptrons – 45 parameters

  18. Width of a deep network. [Network over U, V, W, X, Y, Z.] More generally, the XOR of N variables will require 3(N-1) perceptrons (and 9(N-1) parameters); see the sketch below. • An XOR needs 3 perceptrons • This network will require 3×5 = 15 perceptrons – 45 parameters
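The general construction simply chains the gadget; a sketch reusing xor() from above, with N-1 gadgets and hence 3(N-1) perceptrons:

```python
def parity(bits):
    # XOR the bits left to right: N-1 XOR gadgets, 3 perceptrons each.
    acc = bits[0]
    for b in bits[1:]:
        acc = xor(acc, b)
    return acc

assert parity([1, 0, 1, 1]) == 1  # odd number of 1s
```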

  19. Width of a single-layer Boolean network. Single hidden layer: will require 2^(N-1)+1 perceptrons in all (including the output unit), exponential in N. A deep network will require only 3(N-1) perceptrons (with 9(N-1) parameters), linear in N!!! And it can be arranged in only 2·log2(N) layers.

  20. A better representation. [Figure: a tree of XORs over inputs Y1 … YN.] • Only 2·log2(N) layers – By pairing terms – 2 layers per XOR; see the sketch below
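The pairing idea, sketched with the same xor() gadget: combining the inputs in a balanced binary tree needs only log2(N) XOR levels, i.e. 2·log2(N) perceptron layers, instead of the 2(N-1) layers of a left-to-right chain.

```python
def parity_tree(bits):
    # Pair up the inputs and XOR recursively: log2(N) XOR levels.
    if len(bits) == 1:
        return bits[0]
    mid = len(bits) // 2
    return xor(parity_tree(bits[:mid]), parity_tree(bits[mid:]))

assert parity_tree([1, 0, 1, 1, 0, 1, 0, 0]) == 0  # even number of 1s
```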

  21. The challenge of depth. [Figure: deep parity network over Y1 … YN.] • Using only K hidden layers will require O(2^(CN)) neurons in the Kth layer, for some constant C that depends on K – Because the output can be shown to be the XOR of all the outputs of the (K-1)th hidden layer – I.e. reducing the number of layers below the minimum will result in an exponentially sized network to express the function fully – A network with fewer than the minimum required number of neurons cannot model the function

  22. Recap: The need for depth • Deep Boolean MLPs that scale linearly with the number of inputs… • …can become exponentially large if recast using only one hidden layer • It gets worse…

  23. The need for depth. [Network figure over inputs X1 … X5 with intermediate units a … f.] • The "wide" function can happen at any layer • Having a few extra layers can greatly reduce network size

  24. Depth vs Size in Boolean Circuits • The XOR is really a parity problem • Any constant-depth Boolean circuit using AND, OR and NOT gates with unbounded fan-in must have super-polynomial size to compute the parity of N inputs – Parity, Circuits, and the Polynomial-Time Hierarchy, M. Furst, J. B. Saxe, and M. Sipser, Mathematical Systems Theory 1984 – Alternately stated: • The set of constant-depth, polynomial-size circuits of unbounded fan-in elements does not include parity

  25. Caveat 1: Not all Boolean functions… • Not all Boolean functions have such a clear depth-vs-size tradeoff • Shannon's theorem: for n > 2, there is a Boolean function of n variables that requires at least 2^n/n gates – More correctly, for large n, almost all n-input Boolean functions need more than 2^n/n gates • Note: if all Boolean functions over n inputs could be computed using a circuit of size polynomial in n, P = NP!

  26. Network size: summary • An MLP is a universal Boolean function • But it can represent a given function only if – It is sufficiently wide – It is sufficiently deep – Depth can be traded off for a (sometimes) exponential growth in the width of the network • Optimal width and depth depend on the number of variables and the complexity of the Boolean function – Complexity: the minimal number of terms in the DNF formula that represents it

  27. Story so far • Multi-layer perceptrons are universal Boolean machines • Even a network with a single hidden layer is a universal Boolean machine – But a single-hidden-layer network may require an exponentially large number of perceptrons • Deeper networks may require far fewer neurons than shallower networks to express the same function – Could be exponentially smaller

  28. Caveat 2 • We used a simple "Boolean circuit" analogy for explanation • We actually have a threshold circuit (TC), not just a Boolean circuit (AC) – Specifically composed of threshold gates • More versatile than Boolean gates – E.g. "at least K inputs are 1" is a single TC gate, but requires an exponential-size AC – For fixed depth, Boolean circuits ⊂ threshold circuits (strict subset) – A depth-2 TC parity circuit can be composed with O(n^2) weights • But a network of depth log(n) requires only O(n) weights – But more generally, for large n, for most Boolean functions, a threshold circuit that is polynomial in n at optimal depth becomes exponentially large at smaller depths • Other formal analyses typically view neural networks as arithmetic circuits – Circuits which compute polynomials over any field • So let's consider functions over the field of reals

  29. Today • Multi-layer perceptrons as universal Boolean functions – The need for depth • MLPs as universal classifiers – The need for depth • MLPs as universal approximators • A discussion of optimal depth and width • Brief segue: RBF networks

  30. The MLP as a classifier. [Figure: an MNIST digit "2", a 784-dimensional input, fed to an MLP.] • MLP as a function over real inputs • MLP as a function that finds a complex "decision boundary" over a space of reals

  31. A Perceptron on Reals. [Figure: a perceptron over inputs x1 … xN, and its decision boundary w1·x1 + w2·x2 = T in the (x1, x2) plane.] • A perceptron operates on real-valued vectors: it outputs 1 if Σi wi·xi ≥ T and 0 otherwise – This is a linear classifier
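A minimal sketch of such a unit (the weights and threshold below are illustrative assumptions): it fires on one side of the line w1·x1 + w2·x2 = T.

```python
import numpy as np

w, T = np.array([1.0, 2.0]), 1.0  # assumed weights and threshold

def fires(x):
    # Linear classifier: fires iff the point lies on the w.x >= T
    # side of the hyperplane w.x = T.
    return int(np.dot(w, x) >= T)

print(fires(np.array([1.0, 1.0])))   # 1: on the positive side
print(fires(np.array([-1.0, 0.5])))  # 0: on the negative side
```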

  32. Boolean functions with a real perceptron. [Figures: perceptrons over the unit square with corners (0,0), (0,1), (1,0), (1,1), each shading the half-plane where it fires.] • Boolean perceptrons are also linear classifiers – Purple regions are 1

  33. Composing complicated "decision" boundaries. [Figure: a region in the (x1, x2) plane.] Perceptrons can now be composed into "networks" to compute arbitrary classification "boundaries". • Build a network of units with a single output that fires if the input is in the coloured area

  34. Booleans over the reals. [Figure: a pentagonal region in the (x1, x2) plane.] • The network must fire if the input is in the coloured area

  35. Booleans over the reals. [Figure: a pentagonal region in the (x1, x2) plane.] • The network must fire if the input is in the coloured area

  36. Booleans over the reals. [Figure: a pentagonal region in the (x1, x2) plane.] • The network must fire if the input is in the coloured area

  37. Booleans over the reals. [Figure: a pentagonal region in the (x1, x2) plane.] • The network must fire if the input is in the coloured area

  38. Booleans over the reals. [Figure: a pentagonal region in the (x1, x2) plane.] • The network must fire if the input is in the coloured area

  39. Booleans over the reals. [Figure: the pentagon as five half-plane perceptrons y1 … y5; the summed output is 5 inside the pentagon and 4 or 3 in the regions outside.] • The five perceptrons feed an AND: the output fires if Σi yi ≥ 5 • The network must fire if the input is in the coloured area

  40. More complex decision boundaries. [Figure: two polygons, each the AND of its half-planes, combined by an OR.] • Network to fire if the input is in the yellow area – "OR" two polygons – A third layer is required

  41. Complex decision boundaries • Can compose arbitrarily complex decision boundaries

  42. Complex decision boundaries. [Figure: network over x1, x2 with an AND layer feeding an OR unit.] • Can compose arbitrarily complex decision boundaries

  43. Complex decision boundaries. [Same figure.] • Can compose arbitrarily complex decision boundaries – With only one hidden layer! – How?

  44. Exercise: compose this with one hidden layer. [Figure: a decision boundary in the (x1, x2) plane.] • How would you compose the decision boundary to the left with only one hidden layer?

  45. Composing a Square decision boundary. [Figure: the square as four half-plane perceptrons y1 … y4; the sum Σi yi is 4 inside the square and smaller (3 or 2) outside.] • The polygon net: the output fires if Σi yi ≥ 4; see the sketch below
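A sketch of the polygon net for a square (the unit square below is an assumed example): one perceptron per side, each firing on the inner side of its edge, and an output that fires only when all four do.

```python
import numpy as np

# One perceptron per side of the unit square, as (w, T) pairs that
# fire when w.x >= T (assumed example region).
sides = [
    (np.array([ 1.0,  0.0]),  0.0),  # x1 >= 0
    (np.array([-1.0,  0.0]), -1.0),  # x1 <= 1
    (np.array([ 0.0,  1.0]),  0.0),  # x2 >= 0
    (np.array([ 0.0, -1.0]), -1.0),  # x2 <= 1
]

def in_square(x):
    y = [int(np.dot(w, x) >= T) for w, T in sides]
    return int(sum(y) >= 4)  # AND: all four sides must fire

print(in_square(np.array([0.5, 0.5])))  # 1: inside
print(in_square(np.array([1.5, 0.5])))  # 0: outside
```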

  46. Composing a pentagon. [Figure: the pentagon as five half-plane perceptrons y1 … y5; the sum is 5 inside and 4, 3, or 2 in successive regions outside.] • The polygon net: the output fires if Σi yi ≥ 5

  47. Composing a hexagon. [Figure: the hexagon as six half-plane perceptrons y1 … y6; the sum is 6 inside and 5, 4, or 3 in successive regions outside.] • The polygon net: the output fires if Σi yi ≥ 6

  48. How about a heptagon? • What are the sums in the different regions? – A pattern emerges as we consider N > 6…

  49. 16 sides • What are the sums in the different regions? – A pattern emerges as we consider N > 6…

  50. 64 sides • What are the sums in the different regions? – A pattern emerges as we consider N > 6…

  51. 1000 sides • What are the sums in the different regions? – A pattern emerges as we consider N > 6…

  52. Polygon net. [Figure: an N-sided polygon built from perceptrons y1 … yN; the output fires if Σi yi ≥ N.] • Increasing the number of sides reduces the area outside the polygon that has N/2 < Σi yi < N

  53. In the limit. [Figure: as N → ∞ the polygon becomes a circle; the summed response is N inside and falls toward N/2 outside.] • The output fires if Σi yi ≥ N • For small radius, it's a near-perfect cylinder – N in the cylinder, N/2 outside

  54. Composing a circle. • The circle net: the output fires if Σi yi ≥ N – Very large number of neurons – Sum is N inside the circle, N/2 outside almost everywhere – Circle can be at any location

  55. Composing a circle. • The circle net, with N/2 subtracted from the sum: the subnet now outputs N/2 inside the circle and approximately 0 almost everywhere outside – Very large number of neurons – Circle can be at any location
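A sketch of the circle net (center, radius, and N below are assumed values): N perceptrons whose decision lines are tangent to the circle. All N fire inside, while far outside only about half do.

```python
import numpy as np

N, c, r = 1000, np.array([0.0, 0.0]), 1.0   # assumed circle and width
thetas = 2 * np.pi * np.arange(N) / N
dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)  # unit normals

def circle_sum(x):
    # Perceptron k fires when dirs[k].(x - c) <= r, i.e. when x lies
    # on the circle's side of the k-th tangent line.
    return int(np.sum(dirs @ (x - c) <= r))

print(circle_sum(np.array([0.2, 0.3])))    # N: all units fire inside
print(circle_sum(np.array([100.0, 0.0])))  # ~N/2 far outside
```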

  56. Adding circles. • The "sum" of two circle subnets is exactly N/2 inside either circle, and approximately 0 almost everywhere outside

  57. Composing an arbitrary figure. • Just fit in an arbitrary number of circles – More accurate approximation with a greater number of smaller circles – Can achieve arbitrary precision; see the sketch below
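A sketch of the arbitrary-figure net, reusing N and dirs from the circle sketch above (the circle centers, radii, and output threshold are illustrative assumptions): each circle subnet contributes about N/2 inside its circle and much less outside, and the output thresholds the total.

```python
# Two assumed circles; add more, smaller circles for a finer fit.
circles = [(np.array([0.0, 0.0]), 1.0), (np.array([1.5, 0.0]), 1.0)]

def figure(x):
    total = 0.0
    for c, r in circles:
        s = np.sum(dirs @ (x - c) <= r)  # this circle's subnet sum
        total += s - N / 2               # ~N/2 inside, near 0 far outside
    # Fire when the total clears a threshold between the inside (~N/2)
    # and far-outside (~0) levels; N/4 is an illustrative choice.
    return int(total > N / 4)

print(figure(np.array([0.0, 0.0])))  # 1: inside the first circle
print(figure(np.array([5.0, 5.0])))  # 0: outside both
```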

  58. MLP: Universal classifier. • MLPs can capture any classification boundary • A one-hidden-layer MLP can model any classification boundary • MLPs are universal classifiers

  59. Depth and the universal classifier. [Figure: the same boundary built by a shallow and by a deeper network.] • Deeper networks can require far fewer neurons

  60. Optimal depth… • Formal analyses typically view these as a category of arithmetic circuits – Circuits that compute polynomials over any field • Valiant et al.: a polynomial of degree n requires a network of a certain minimum depth, growing with n – It cannot be computed with shallower networks • Bengio et al. show a similar result for sum-product networks – But they only consider two-input units – Generalized by Mhaskar et al. to all functions that can be expressed as a binary tree – Depth/size analyses of arithmetic circuits are still a research problem

  61. Special case: Sum-product nets • "Shallow vs deep sum-product networks," Olivier Delalleau and Yoshua Bengio – For networks where layers alternately perform either sums or products, a shallow network may require exponentially more neurons than a deep one to express the same function

  62. Depth in sum-product networks

  63. Optimal depth in generic nets • We look at a different pattern: – "Worst case" decision boundaries • For threshold-activation networks – Generalizes to other nets

  64. Optimal depth. • A one-hidden-layer neural network would require infinitely many hidden neurons to capture such a boundary
