Neural Networks: What can a network represent


Deep Learning, Fall 2020

Recap: Neural networks have taken over AI
• Tasks that are made possible by NNs, a.k.a. deep learning
• Tasks that were once assumed to be purely in the human domain


  1. How many layers for a Boolean MLP? The truth table shows all input combinations for which the output is 1:

     X1 X2 X3 X4 X5 | Y
      0  0  1  1  0 | 1
      0  1  0  1  1 | 1
      0  1  1  0  0 | 1
      1  0  0  0  1 | 1
      1  0  1  1  1 | 1
      1  1  0  0  1 | 1

     • Expressed in disjunctive normal form:
       $Y = \bar{X}_1\bar{X}_2 X_3 X_4 \bar{X}_5 + \bar{X}_1 X_2 \bar{X}_3 X_4 X_5 + \bar{X}_1 X_2 X_3 \bar{X}_4 \bar{X}_5 + X_1 \bar{X}_2 \bar{X}_3 \bar{X}_4 X_5 + X_1 \bar{X}_2 X_3 X_4 X_5 + X_1 X_2 \bar{X}_3 \bar{X}_4 X_5$

  2. How many layers for a Boolean MLP? [same truth table and DNF as above, together with the corresponding one-hidden-layer network over $X_1 \ldots X_5$] • Expressed in disjunctive normal form

  3. How many layers for a Boolean MLP? [same truth table as above] • Any truth table can be expressed in this manner! • A one-hidden-layer MLP is a Universal Boolean Function

  4. How many layers for a Boolean MLP? [same truth table as above] • Any truth table can be expressed in this manner! • A one-hidden-layer MLP is a Universal Boolean Function • But what is the largest number of perceptrons required in the single hidden layer for an N-input-variable function?
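
The DNF construction above maps directly to a one-hidden-layer threshold network: one hidden unit per row of the truth table with Y = 1, and an output unit that ORs them. Below is a minimal NumPy sketch of that construction (not from the slides; the scheme of one +/-1-weighted unit per minterm is one standard way to realize it), checked against the slide's truth table.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 if z >= 0, else 0."""
    return (z >= 0).astype(int)

# Rows of the truth table for which Y = 1 (from the slide).
minterms = np.array([
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
])

# One hidden unit per minterm: weight +1 where the literal is X_i,
# -1 where it is X_i-bar; the unit fires only on its exact input pattern.
W_hidden = np.where(minterms == 1, 1, -1)        # shape (6, 5)
b_hidden = -(minterms.sum(axis=1))               # fires iff all literals match

# Output unit is an OR over the hidden units.
w_out = np.ones(len(minterms))
b_out = -1                                       # fires if any hidden unit fires

def mlp(x):
    h = step(W_hidden @ x + b_hidden)
    return step(w_out @ h + b_out)

# Check the network against the full truth table.
for x in np.ndindex(*(2,) * 5):
    x = np.array(x)
    target = int(any((x == m).all() for m in minterms))
    assert mlp(x) == target
print("one-hidden-layer DNF network matches the truth table")
```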

  5. Reducing a Boolean Function [Karnaugh map with rows indexed by WX and columns by YZ] • This is a "Karnaugh Map": it represents a truth table as a grid; filled boxes represent input combinations for which the output is 1, blank boxes have output 0 • Adjacent boxes can be "grouped" to reduce the complexity of the DNF formula for the table • DNF form: – Find groups – Express as reduced DNF

  6. Reducing a Boolean Function [Karnaugh map, ungrouped] • The basic DNF formula will require 7 terms

  7. Reducing a Boolean Function [Karnaugh map with groups marked] • Reduced DNF form: – Find groups – Express as reduced DNF

  8. Reducing a Boolean Function [Karnaugh map with groups, and the corresponding network over W, X, Y, Z] • Reduced DNF form: – Find groups – Express as reduced DNF – The Boolean network for this function needs only 3 hidden units • Reducing the DNF reduces the size of the one-hidden-layer network
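
The slide's own Karnaugh map is not reproduced in this transcript, but the same grouping can be done programmatically with a Quine-McCluskey minimizer, which computes exactly the reduced DNF a Karnaugh map gives. A small sketch using SymPy on a hypothetical 4-variable function (the minterms below are illustrative, not the slide's): each product term in the minimal sum-of-products corresponds to one hidden unit.

```python
from sympy import symbols
from sympy.logic import SOPform

W, X, Y, Z = symbols("W X Y Z")

# Illustrative minterms (input combinations with output 1); these are made up,
# since the slide's Karnaugh map does not survive in the transcript.
minterms = [
    [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1],
    [1, 0, 0, 1], [1, 0, 1, 1], [1, 1, 1, 1],
]

reduced = SOPform([W, X, Y, Z], minterms)
print(reduced)            # e.g. (Y & Z) | (Z & ~X): two product terms instead of six
print(len(reduced.args))  # number of product terms = hidden units needed (2 here, down from 6)
```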

  9. Largest irreducible DNF? [Karnaugh map] • What arrangement of ones and zeros simply cannot be reduced further?

  10. Largest irreducible DNF? [Karnaugh map filled in a checkerboard pattern; red = 0, white = 1] • What arrangement of ones and zeros simply cannot be reduced further?

  11. Largest irreducible DNF? [checkerboard Karnaugh map] • How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function? • What arrangement of ones and zeros simply cannot be reduced further?

  12. Width of a one-hidden-layer Boolean MLP [Karnaugh map over the 6 variables U, V, W, X, Y, Z; red = 0, white = 1] • How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function of 6 variables?

  13. Width of a one-hidden-layer Boolean MLP [6-variable Karnaugh map] • How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function? • Can be generalized: it will require $2^{N-1}$ perceptrons in the hidden layer – exponential in N

  14. Width of a one-hidden-layer Boolean MLP [6-variable Karnaugh map] • A one-hidden-layer MLP will require $2^{N-1}$ perceptrons in the hidden layer – exponential in N • How many units if we use multiple hidden layers?

  15. Size of a deep MLP [the 4-variable and 6-variable Karnaugh maps from the previous slides]

  16. Multi-layer perceptron XOR [figure: inputs X and Y feed a hidden layer of two threshold units with weights of 1 and -1, which feed a single output unit] • An XOR takes three perceptrons
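
A sketch of this XOR in code, assuming the usual reading of the slide's weights: an OR unit and a NAND unit in the hidden layer, ANDed by the output unit. The exact weights on the slide may differ; this is one standard 3-perceptron realization.

```python
import numpy as np

def unit(w, b):
    """A single threshold perceptron: fires iff w . x + b >= 0."""
    return lambda x: int(np.dot(w, x) + b >= 0)

# Hidden layer: an OR gate and a NAND gate; output layer: an AND gate.
g_or   = unit(np.array([ 1,  1]), -1)   # fires iff X + Y >= 1
g_nand = unit(np.array([-1, -1]),  1)   # fires iff X + Y <= 1
g_and  = unit(np.array([ 1,  1]), -2)   # fires iff both hidden units fire

def xor(x, y):
    h = np.array([g_or((x, y)), g_nand((x, y))])
    return g_and(h)

print([xor(x, y) for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```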

  17. Size of a deep MLP [4-variable checkerboard Karnaugh map; deep network over W, X, Y, Z] • An XOR needs 3 perceptrons • This network (the XOR of 4 variables) will require 3 × 3 = 9 perceptrons

  18. Size of a deep MLP [6-variable checkerboard Karnaugh map; deep network over U, V, W, X, Y, Z] • An XOR needs 3 perceptrons • This network (the XOR of 6 variables) will require 3 × 5 = 15 perceptrons

  19. Size of a deep MLP [6-variable Karnaugh map; deep network over U, V, W, X, Y, Z] • An XOR needs 3 perceptrons • This network will require 3 × 5 = 15 perceptrons • More generally, the XOR of N variables will require 3(N-1) perceptrons!

  20. One-hidden-layer vs deep Boolean MLP [6-variable Karnaugh map] • How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function? • Single hidden layer: requires $2^{N-1} + 1$ perceptrons in all (including the output unit) – exponential in N • A deep network requires only 3(N-1) perceptrons – linear in N! • These can be arranged in only $2\log_2(N)$ layers

  21. A better representation [figure: a tree of XORs over inputs $Y_1 \ldots Y_N$] • Only $2\log_2(N)$ layers – By pairing terms – 2 layers per XOR

  22. A better representation [figure: XOR sub-networks paired into a tree over $Y_1 \ldots Y_N$] • Only $2\log_2(N)$ layers – By pairing terms – 2 layers per XOR (sketched in code below)
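
A sketch of the pairing construction: XOR the inputs in pairs, then XOR the results in pairs, and so on. Each level of the tree costs 2 layers and each XOR costs 3 perceptrons (the `xor` function from the earlier sketch is assumed), giving 3(N-1) perceptrons in roughly 2 log2(N) layers.

```python
def parity_tree(bits):
    """Compute the parity of `bits` by pairing terms, as on the slide."""
    level, layers, perceptrons = list(bits), 0, 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(xor(level[i], level[i + 1]))  # one 2-layer, 3-perceptron XOR
            perceptrons += 3
        if len(level) % 2:                           # odd leftover input passes through
            nxt.append(level[-1])
        level, layers = nxt, layers + 2
    return level[0], perceptrons, layers

bits = [1, 0, 1, 1, 0, 1, 0, 1]                      # N = 8 inputs
out, units, depth = parity_tree(bits)
assert out == sum(bits) % 2
print(units, depth)   # 21 = 3*(8-1) perceptrons, 6 = 2*log2(8) layers
```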

  23. The challenge of depth [figure: a network over inputs $Y_1 \ldots Y_N$ with hidden-layer outputs $Z_1, Z_2, \ldots$] • Using only K hidden layers will require $O(2^{CN})$ neurons in the K-th layer, where C is a constant that depends on K – Because the output can be shown to be the XOR of all the outputs of the (K-1)-th hidden layer – I.e., reducing the number of layers below the minimum will result in an exponentially sized network to express the function fully – A network with fewer than the minimum required number of neurons cannot model the function

  24. The actual number of parameters in a network [figure: the one-hidden-layer network over $X_1 \ldots X_5$] • The actual number of parameters in a network is the number of connections – In this example there are 30 • This is the number that really matters in software or hardware implementations • Networks that require an exponential number of neurons will also require an exponential number of weights
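
A quick way to check the count of 30: if the slide's figure is a fully connected network with 5 inputs, 5 hidden units and 1 output (an assumption about the figure), the number of connections is 5·5 + 5·1 = 30. A two-line sketch:

```python
def count_connections(widths):
    """Number of weights (connections) in a fully connected MLP, biases not counted."""
    return sum(fan_in * fan_out for fan_in, fan_out in zip(widths, widths[1:]))

# 5 inputs -> 5 hidden units -> 1 output (assumed reading of the slide's figure).
print(count_connections([5, 5, 1]))   # 30
```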

  25. Recap: The need for depth • Deep Boolean MLPs that scale linearly with the number of inputs… • …can become exponentially large if recast using only one hidden layer • It gets worse…

  26. The need for depth [figure: a deep network over $X_1 \ldots X_5$ with intermediate nodes a, b, c, d, e, f] • The wide function can happen at any layer • Having a few extra layers can greatly reduce network size

  27. Depth vs Size in Boolean Circuits • The XOR is really a parity problem • Any Boolean parity circuit of constant depth using AND, OR and NOT gates with unbounded fan-in must have super-polynomial size – Parity, Circuits, and the Polynomial-Time Hierarchy, M. Furst, J. B. Saxe, and M. Sipser, Mathematical Systems Theory, 1984 – Alternately stated: parity is not computable by the set of constant-depth, polynomial-size circuits of unbounded fan-in elements (AC0)

  28. Caveat 1: Not all Boolean functions.. • Not all Boolean circuits have such a clear depth-vs-size tradeoff • Shannon's theorem: there is a Boolean function of n variables that requires on the order of $2^n / n$ Boolean gates – More correctly, for large n, almost all n-input Boolean functions need more than $2^n / n$ Boolean gates • Regardless of depth • Note: if all Boolean functions over n inputs could be computed using a circuit of size that is polynomial in n, then P = NP!
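
A rough version of the counting argument behind Shannon's bound (a sketch, not the slide's own derivation): there are $2^{2^n}$ Boolean functions of $n$ variables, but the number of distinct circuits built from $s$ gates is only about $s^{O(s)}$, so

```latex
% Counting sketch: circuits with s gates can cover all 2^{2^n} functions only if
s^{O(s)} \;\ge\; 2^{2^{n}}
\quad\Longleftrightarrow\quad
O(s \log s) \;\ge\; 2^{n}
\quad\Longrightarrow\quad
s \;=\; \Omega\!\left(\frac{2^{n}}{n}\right)
% so almost every n-input function needs on the order of 2^n / n gates, at any depth.
```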

  29. Network size: summary • An MLP is a universal Boolean function • But it can represent a given function only if – It is sufficiently wide – It is sufficiently deep – Depth can be traded off for (sometimes) exponential growth of the width of the network • Optimal width and depth depend on the number of variables and the complexity of the Boolean function – Complexity: the minimal number of terms in the DNF formula needed to represent it

  30. Story so far • Multi-layer perceptrons are Universal Boolean Machines • Even a network with a single hidden layer is a universal Boolean machine – But a single-hidden-layer network may require an exponentially large number of perceptrons • Deeper networks may require far fewer neurons than shallower networks to express the same function – Could be exponentially smaller

  31. Caveat 2 • We used a simple "Boolean circuit" analogy for explanation • We actually have a threshold circuit (TC), not just a Boolean circuit (AC) – Specifically composed of threshold gates, which are more versatile than Boolean gates (e.g. they can compute the majority function) – E.g. "at least K inputs are 1" is a single TC gate, but needs an exponential-size AC – For fixed depth, Boolean circuits ⊂ threshold circuits (strict subset) – A depth-2 TC parity circuit can be composed with $O(n^2)$ weights, but a network of depth $\log(n)$ requires only $O(n)$ weights – More generally, for large n, for most Boolean functions a threshold circuit that is polynomial in n at the optimal depth may become exponentially large at a smaller depth • Other formal analyses typically view neural networks as arithmetic circuits – Circuits which compute polynomials over any field • So let's consider functions over the field of reals

  32. Today • Multi-layer Perceptrons as universal Boolean functions – The need for depth • MLPs as universal classifiers – The need for depth • MLPs as universal approximators • A discussion of optimal depth and width • Brief segue: RBF networks

  33. Recap: The MLP as a classifier [figure: a network mapping 784-dimensional (MNIST) inputs to an output class] • MLP as a function over real inputs • MLP as a function that finds a complex "decision boundary" over a space of reals

  34. A Perceptron on Reals [figure: a perceptron over inputs $x_1 \ldots x_N$; in two dimensions its decision boundary is the line $w_1 x_1 + w_2 x_2 = T$] • A perceptron operates on real-valued vectors – This is a linear classifier
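
A minimal sketch of such a real-valued perceptron (the weight vector and threshold below are illustrative, not the slide's):

```python
import numpy as np

def perceptron(w, T):
    """Linear classifier: fires iff w . x >= T, i.e. on one side of the hyperplane w . x = T."""
    return lambda x: int(np.dot(w, x) >= T)

f = perceptron(w=np.array([1.0, 2.0]), T=1.0)   # decision boundary: x1 + 2*x2 = 1
print(f([2.0, 0.5]), f([0.0, 0.0]))             # 1 0
```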

  35. Boolean functions with a real perceptron [figure: the unit square with corners (0,0), (0,1), (1,0), (1,1) and the linear decision boundaries realizing Boolean gates over X and Y] • Boolean perceptrons are also linear classifiers – Purple regions are 1

  36. Composing complicated "decision" boundaries [figure: a region in the $(x_1, x_2)$ plane] • Perceptrons can now be composed into "networks" to compute arbitrary classification "boundaries" • Build a network of units with a single output that fires if the input is in the coloured area

  37. Booleans over the reals [figure: a region in the $(x_1, x_2)$ plane bounded by straight lines] • The network must fire if the input is in the coloured area

  38. Booleans over the reals [same figure, next build] • The network must fire if the input is in the coloured area

  39. Booleans over the reals [same figure, next build] • The network must fire if the input is in the coloured area

  40. Booleans over the reals [same figure, next build] • The network must fire if the input is in the coloured area

  41. Booleans over the reals [same figure, next build] • The network must fire if the input is in the coloured area

  42. Booleans over the reals [figure: a pentagonal region; five linear units $y_1 \ldots y_5$ over $(x_1, x_2)$ feed an AND output; the numbers 3, 4 and 5 mark the value of $\sum_i y_i$ in each region] • The network must fire if the input is in the coloured area, i.e. where $\sum_{i=1}^{5} y_i \ge 5$
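
A sketch of the pentagon net just described: five linear units, one per edge, each firing on the inner side of its edge, and an output that fires only when all five fire, i.e. when the sum reaches 5. The particular pentagon below (a regular one whose edges are tangent to the unit circle, centred at the origin) is an illustrative choice, not the slide's figure.

```python
import numpy as np

N = 5                                              # pentagon: one unit per side
angles = 2 * np.pi * np.arange(N) / N
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # outward edge normals

def hidden(x):
    """y_i = 1 iff x is on the inner side of edge i (edges tangent to the unit circle)."""
    return (normals @ x <= 1.0).astype(int)

def inside(x):
    """Output unit: fire iff all N edge units fire, i.e. sum(y) >= N."""
    return int(hidden(x).sum() >= N)

print(inside(np.array([0.0, 0.0])))         # 1: centre of the pentagon
print(inside(np.array([2.0, 2.0])))         # 0: outside the pentagon
print(hidden(np.array([3.0, 0.0])).sum())   # 4: in outer regions the sum falls below N
```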

  43. More complex decision boundaries [figure: two polygons in the $(x_1, x_2)$ plane; their AND sub-networks feed an OR] • Network to fire if the input is in the yellow area – "OR" two polygons – A third layer is required

  44. Complex decision boundaries • Can compose arbitrarily complex decision boundaries

  45. Complex decision boundaries [figure: many AND sub-networks feeding an OR, over inputs $x_1, x_2$] • Can compose arbitrarily complex decision boundaries

  46. Complex decision boundaries [same figure] • Can compose arbitrarily complex decision boundaries – With only one hidden layer! – How?

  47. Exercise: compose this with one hidden layer [figure: a decision boundary in the $(x_1, x_2)$ plane] • How would you compose the decision boundary to the left with only one hidden layer?

  48. Composing a square decision boundary [figure: a square region; $\sum_i y_i$ is 4 inside the square and smaller (3 or 2) in the outer regions] • The polygon net: four linear units $y_1 \ldots y_4$ over $(x_1, x_2)$, with output test $\sum_{i=1}^{4} y_i \ge 4$?

  49. Composing a pentagon [figure: a pentagonal region; the sum is 5 inside and drops to 4, 3 or 2 in the outer regions] • The polygon net: five units $y_1 \ldots y_5$, with output test $\sum_{i=1}^{5} y_i \ge 5$?

  50. Composing a hexagon [figure: a hexagonal region; the sum is 6 inside and drops to 5, 4 or 3 in the outer regions] • The polygon net: six units $y_1 \ldots y_6$, with output test $\sum_{i=1}^{6} y_i \ge 6$?

  51. How about a heptagon? • What are the sums in the different regions? – A pattern emerges as we consider N > 6 (N is the number of sides of the polygon)

  52. 16 sides • What are the sums in the different regions? – A pattern emerges as we consider N > 6

  53. 64 sides • What are the sums in the different regions? – A pattern emerges as we consider N > 6

  54. 1000 sides • What are the sums in the different regions? – A pattern emerges as we consider N > 6

  55. Polygon net [figure: an N-sided polygon; units $y_1 \ldots y_N$ over $(x_1, x_2)$; output test $\sum_{i=1}^{N} y_i \ge N$?] • Increasing the number of sides reduces the area outside the polygon where the sum stays well above N/2

  56. In the limit [figure: output test $\sum_{i=1}^{N} y_i \ge N$?; plot of the value of the sum as a function of distance from the centre] • The value of the sum at the output unit, as a function of distance from the centre, as N increases • For a small radius it is a near-perfect cylinder: N inside the cylinder, N/2 outside

  57. Composing a circle [figure: a circular region; output test $\sum_{i=1}^{N} y_i \ge N$?] • The circle net – Very large number of neurons – The sum is N inside the circle and N/2 almost everywhere outside – The circle can be at any location
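
A numerical check of the N-inside / N/2-outside claim (an illustrative sketch, not from the slides): approximate the unit circle with N threshold units whose boundaries are tangent to it, and evaluate the sum at a point inside and at a point far outside.

```python
import numpy as np

def circle_net_sum(x, N=1000, radius=1.0):
    """Sum of N threshold units whose decision boundaries are tangent to a circle."""
    angles = 2 * np.pi * np.arange(N) / N
    normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    y = (normals @ x <= radius).astype(int)   # y_i = 1 on the inner side of tangent line i
    return y.sum()

print(circle_net_sum(np.array([0.0, 0.0])))     # 1000 = N: inside the circle
print(circle_net_sum(np.array([100.0, 0.0])))   # ~500 = N/2: far outside the circle
```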

  58. Composing a circle [figure: the circle net with N/2 subtracted, so the output is 1 inside and 0 outside] • The circle net – Very large number of neurons – Subtracting N/2 makes the sum N/2 inside the circle and 0 almost everywhere outside – The circle can be at any location

  59. Adding circles [figure: two circle sub-nets added together] • The "sum" of two circle sub-nets is exactly N/2 inside either circle, and 0 almost everywhere outside

  60. Composing an arbitrary figure [figure: an arbitrary region covered by K circles] • Just fit in an arbitrary number of circles – More accurate approximation with a greater number of smaller circles – Can achieve arbitrary precision

  61. MLP: Universal classifier • MLPs can capture any classification boundary • A one-hidden-layer MLP can model any classification boundary • MLPs are universal classifiers

  62. Depth and the universal classifier [figure: two decision-boundary examples over $(x_1, x_2)$] • Deeper networks can require far fewer neurons

  63. Optimal depth.. • Formal analyses typically view these as a category of arithmetic circuits – Circuits that compute polynomials over any field • Valiant et al.: a polynomial of degree n requires a network of depth on the order of $\log(n)$ – It cannot be computed with shallower networks – The majority of functions are very high (possibly infinite) order polynomials • Bengio et al.: show a similar result for sum-product networks – But they only consider two-input units – Generalized by Mhaskar et al. to all functions that can be expressed as a binary tree – Depth/size analyses of arithmetic circuits are still a research problem
