How many layers for a Boolean MLP? Truth table shows all input combinations for which output is 1:
Truth Table
X1 X2 X3 X4 X5 | Y
 0  0  1  1  0 | 1
 0  1  0  1  1 | 1
 0  1  1  0  0 | 1
 1  0  0  0  1 | 1
 1  0  1  1  1 | 1
 1  1  0  0  1 | 1
[Figure: one-hidden-layer MLP over inputs X1 X2 X3 X4 X5]
• Any truth table can be expressed in this manner!
• A one-hidden-layer MLP is a Universal Boolean Function
But what is the largest number of perceptrons required in the single hidden layer for an N-input-variable function? 37
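To make the DNF construction concrete, here is a minimal sketch (not from the slides): one hidden threshold unit per truth-table row whose output is 1, plus an output unit that ORs them. The particular weights and thresholds below are one standard choice, assumed for illustration.

```python
import itertools

# Rows of the truth table (from the slide) for which Y = 1.
rows_with_output_1 = [
    (0, 0, 1, 1, 0),
    (0, 1, 0, 1, 1),
    (0, 1, 1, 0, 0),
    (1, 0, 0, 0, 1),
    (1, 0, 1, 1, 1),
    (1, 1, 0, 0, 1),
]

def hidden_unit(row, x):
    # Fires (outputs 1) only for the exact input pattern `row`:
    # weight +1 where the row has a 1, -1 where it has a 0,
    # threshold = number of 1s in the row.
    weights = [1 if r == 1 else -1 for r in row]
    threshold = sum(row)
    return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)

def mlp(x):
    # Output unit is an OR: fires if any hidden unit fires (threshold 1).
    return int(sum(hidden_unit(row, x) for row in rows_with_output_1) >= 1)

# The network reproduces the truth table exactly.
for x in itertools.product([0, 1], repeat=5):
    assert mlp(x) == (1 if x in rows_with_output_1 else 0)
```

The same recipe works for any truth table, which is the sense in which a one-hidden-layer MLP is a universal Boolean function.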
Reducing a Boolean Function
[Karnaugh map: rows WX ∈ {00, 01, 11, 10}, columns YZ ∈ {00, 01, 11, 10}]
This is a “Karnaugh Map”. It represents a truth table as a grid. Filled boxes represent input combinations for which the output is 1; blank boxes have output 0. Adjacent boxes can be “grouped” to reduce the complexity of the DNF formula for the table.
• DNF form:
– Find groups
– Express as reduced DNF 38
Reducing a Boolean Function
[Karnaugh map: rows WX, columns YZ]
The basic DNF formula will require 7 terms. 39
Reducing a Boolean Function
[Karnaugh map: rows WX, columns YZ, with groups marked]
• Reduced DNF form:
– Find groups
– Express as reduced DNF 40
Reducing a Boolean Function
[Karnaugh map: rows WX, columns YZ; corresponding one-hidden-layer network over inputs W, X, Y, Z]
• Reduced DNF form:
– Find groups
– Express as reduced DNF 41
Largest irreducible DNF?
[Karnaugh map: rows WX, columns YZ]
• What arrangement of ones and zeros simply cannot be reduced further? 42
Largest irreducible DNF?
[Karnaugh map: rows WX, columns YZ]
How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function?
• What arrangement of ones and zeros simply cannot be reduced further? 44
Width of a single-layer Boolean network
[Karnaugh map over 6 variables: UV × WX × YZ grid]
• How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function of 6 variables? 45
The actual number of parameters in a network
[Figure: one-hidden-layer MLP over inputs X1 X2 X3 X4 X5]
• The actual number of parameters in a network is the number of connections
– In this example there are 30
• This is the number that really matters in software or hardware implementations 46
Width of a single-layer Boolean network
[Karnaugh map over 6 variables: UV × WX × YZ grid]
• How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function of 6 variables?
– How many weights will this network require? 47
Width of a single-layer Boolean network
[Karnaugh map over 6 variables: UV × WX × YZ grid]
Can be generalized: will require 2^(N-1) perceptrons in the hidden layer (exponential in N), and O(N·2^(N-1)) weights (superexponential in N).
• How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function? 48
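Assuming the worst-case (irreducible) function here is the N-variable parity/XOR suggested by the checkerboard Karnaugh map, a quick sketch can confirm the 2^(N-1) count by enumerating the minterms that a DNF (one-hidden-layer) MLP would need:

```python
import itertools

def parity(bits):
    return sum(bits) % 2

# Count minterms (input patterns with output 1) of the N-variable parity
# function: each needs its own hidden unit in the DNF (one-hidden-layer) MLP,
# since the checkerboard Karnaugh map cannot be reduced.
for n in range(2, 11):
    minterms = sum(parity(x) for x in itertools.product([0, 1], repeat=n))
    assert minterms == 2 ** (n - 1)
    # Each hidden unit connects to all n inputs, so roughly n * 2^(n-1) weights.
    print(f"N={n}: {minterms} hidden units, ~{n * minterms} weights into the hidden layer")
```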
Width of a single-layer Boolean network
[Karnaugh map over 6 variables: UV × WX × YZ grid]
Can be generalized: will require 2^(N-1) perceptrons in the hidden layer, exponential in N.
How many units if we use multiple layers? How many weights?
• How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function? 49
Width of a deep network
[Figure: two Karnaugh maps, one over 4 variables (WX × YZ) and one over 6 variables (UV × WX × YZ)] 50
Multi-layer perceptron XOR
[Figure: XOR network over inputs X and Y: a hidden layer of two units feeding one output unit, with weights of 1, -1 and 2 as shown]
• An XOR takes three perceptrons
– 6 weights and three threshold values
• 9 total parameters 51
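As a concrete sketch (one possible set of weights and thresholds, not necessarily the one drawn on the slide), the three threshold units below realize XOR with 6 weights and 3 thresholds, i.e. 9 parameters:

```python
def unit(x, weights, threshold):
    # A single perceptron: fires if the weighted sum reaches the threshold.
    return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)

def xor(x, y):
    h1 = unit((x, y), (1, 1), 1)       # OR
    h2 = unit((x, y), (-1, -1), -1)    # NAND
    return unit((h1, h2), (1, 1), 2)   # AND of the two hidden units

assert [xor(0, 0), xor(0, 1), xor(1, 0), xor(1, 1)] == [0, 1, 1, 0]
# 3 perceptrons, each with 2 weights and 1 threshold: 9 parameters in total.
```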
Width of a deep network
[Karnaugh map: rows WX, columns YZ] 9 perceptrons
[Figure: deep network over inputs W, X, Y, Z]
• An XOR needs 3 perceptrons
• This network will require 3x3 = 9 perceptrons
– 27 parameters 52
Width of a deep network
[Karnaugh map over 6 variables: UV × WX × YZ grid] 15 perceptrons
[Figure: deep network over inputs U, V, W, X, Y, Z]
• An XOR needs 3 perceptrons
• This network will require 3x5 = 15 perceptrons
– 45 parameters 53
Width of a deep network
[Karnaugh map over 6 variables: UV × WX × YZ grid]
More generally, the XOR of N variables will require 3(N-1) perceptrons (and 9(N-1) parameters).
[Figure: deep network over inputs U, V, W, X, Y, Z]
• An XOR needs 3 perceptrons
• This network will require 3x5 = 15 perceptrons
– 45 parameters 54
Width of a single-layer Boolean network
[Karnaugh map over 6 variables: UV × WX × YZ grid]
Single hidden layer: will require 2^(N-1)+1 perceptrons in all (including the output unit), exponential in N.
A deep network will require only 3(N-1) perceptrons (with 9(N-1) parameters): linear in N!!!
These can be arranged in only 2·log2(N) layers.
• How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function? 55
A better representation
[Figure: tree of pairwise XORs over inputs X1 … XN]
• Only 2·log2(N) layers
– By pairing terms
– 2 layers per XOR … 56
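A sketch of this pairing construction (assumed details: the same 3-perceptron XOR as in the earlier sketch, applied in a balanced tree). It uses N-1 XOR gates, i.e. 3(N-1) perceptrons, arranged in roughly 2·log2(N) layers:

```python
import itertools

def unit(x, w, t):
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= t)

def xor(a, b):
    # 3 threshold units per XOR, as on the earlier slide.
    return unit((unit((a, b), (1, 1), 1), unit((a, b), (-1, -1), -1)), (1, 1), 2)

def xor_tree(bits):
    # Balanced tree of pairwise XORs: N-1 XOR gates = 3(N-1) perceptrons,
    # ~log2(N) XOR stages = ~2*log2(N) layers of perceptrons.
    values = list(bits)
    while len(values) > 1:
        paired = [xor(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:          # an odd element passes through to the next stage
            paired.append(values[-1])
        values = paired
    return values[0]

for x in itertools.product([0, 1], repeat=8):
    assert xor_tree(x) == sum(x) % 2
```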
The challenge of depth
[Figure: networks computing Z = X1 ⊕ X2 ⊕ … ⊕ XN with different numbers of layers]
• Using only K hidden layers will require O(2^(CN)) neurons in the Kth layer, where the constant C depends on K
– Because the output can be shown to be the XOR of all the outputs of the K-1th hidden layer
– I.e. reducing the number of layers below the minimum will result in an exponentially sized network to express the function fully
– A network with fewer than the minimum required number of neurons cannot model the function 57
Recap: The need for depth • Deep Boolean MLPs that scale linearly with the number of inputs … • … can become exponentially large if recast using only one layer • It gets worse.. 58
The need for depth
[Figure: network with intermediate outputs a, b, c, d, e, f over inputs X1 X2 X3 X4 X5]
• The wide function can happen at any layer
• Having a few extra layers can greatly reduce network size 59
Depth vs Size in Boolean Circuits
• The XOR is really a parity problem
• Any constant-depth Boolean circuit using AND, OR and NOT gates with unbounded fan-in must have superpolynomial size to compute parity
– Parity, Circuits, and the Polynomial-Time Hierarchy, M. Furst, J. B. Saxe, and M. Sipser, Mathematical Systems Theory 1984
– Alternately stated:
• The set of constant-depth, polynomial-size circuits of unbounded fan-in elements does not include parity 60
Caveat 1: Not all Boolean functions..
• Not all Boolean circuits have such a clear depth-vs-size tradeoff
• Shannon’s theorem: For n > 2, there is a Boolean function of n variables that requires at least 2^n/n gates
– More correctly, for large n, almost all n-input Boolean functions need more than 2^n/n gates
• Note: If all Boolean functions over n inputs could be computed using a circuit of size that is polynomial in n, P = NP! 61
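The counting argument behind Shannon's bound, sketched here for completeness (standard reasoning, not spelled out on the slide; c is an unspecified constant):

```latex
% There are 2^{2^n} distinct Boolean functions of n inputs, but only about
% (c (n+s)^2)^s distinct circuits with s binary gates (each gate chooses an
% operation and two inputs from among the n inputs and the other gates).
% If every function had a circuit of size s, we would need
(c\,(n+s)^2)^{s} \;\ge\; 2^{2^{n}}
\quad\Longrightarrow\quad
s \,\log_2\!\bigl(c\,(n+s)^2\bigr) \;\ge\; 2^{n}
\quad\Longrightarrow\quad
s \;=\; \Omega\!\left(\frac{2^{n}}{n}\right).
```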
Network size: summary • An MLP is a universal Boolean function • But can represent a given function only if – It is sufficiently wide – It is sufficiently deep – Depth can be traded off for (sometimes) exponential growth of the width of the network • Optimal width and depth depend on the number of variables and the complexity of the Boolean function – Complexity: minimal number of terms in DNF formula to represent it 62
Story so far • Multi-layer perceptrons are Universal Boolean Machines • Even a network with a single hidden layer is a universal Boolean machine – But a single-layer network may require an exponentially large number of perceptrons • Deeper networks may require far fewer neurons than shallower networks to express the same function – Could be exponentially smaller 63
Caveat 2
• Used a simple “Boolean circuit” analogy for explanation
• We actually have a threshold circuit (TC), not just a Boolean circuit (AC)
– Specifically composed of threshold gates
• More versatile than Boolean gates
– E.g. “at least K inputs are 1” is a single TC gate, but an exponential-size AC
– For fixed depth, Boolean circuits ⊂ threshold circuits (strict subset)
– A depth-2 TC parity circuit can be composed with O(n^2) weights
• But a network of depth log(n) requires only O(n) weights
– But more generally, for large n, for most Boolean functions, a threshold circuit that is polynomial in n at optimal depth becomes exponentially large at lesser depths
• Other formal analyses typically view neural networks as arithmetic circuits
– Circuits which compute polynomials over any field
• So let’s consider functions over the field of reals 64
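A small sketch of the versatility claim: “at least K of the N inputs are 1” is a single threshold gate, while an AND/OR (DNF) realization needs one term per K-subset of inputs, C(N, K) of them, which is exponential for K near N/2:

```python
from itertools import combinations, product

def at_least_k_threshold(x, k):
    # A single threshold gate: all weights 1, threshold k.
    return int(sum(x) >= k)

def at_least_k_dnf(x, k):
    # AND/OR realization: one AND term per k-subset of inputs,
    # i.e. C(n, k) terms in total.
    n = len(x)
    return int(any(all(x[i] for i in subset) for subset in combinations(range(n), k)))

n, k = 10, 5
for x in product([0, 1], repeat=n):
    assert at_least_k_threshold(x, k) == at_least_k_dnf(x, k)
```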
Today • Multi-layer Perceptrons as universal Boolean functions – The need for depth • MLPs as universal classifiers – The need for depth • MLPs as universal approximators • A discussion of optimal depth and width • Brief segue: RBF networks 65
The MLP as a classifier
[Figure: MLP over a 784-dimensional input (MNIST)]
• MLP as a function over real inputs
• MLP as a function that finds a complex “decision boundary” over a space of reals 66
A Perceptron on Reals
[Figure: perceptron over real inputs x1 … xN; the boundary w1·x1 + w2·x2 = T splits the (x1, x2) plane into a region labeled 1 and a region labeled 0]
• A perceptron operates on real-valued vectors
– It fires (outputs 1) if Σi wi·xi ≥ T, and outputs 0 otherwise
– This is a linear classifier 67
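A minimal sketch of such a unit (the weights and threshold below are made-up values for illustration):

```python
import numpy as np

def perceptron(x, w, T):
    # Linear classifier: fires when the weighted sum reaches the threshold T,
    # i.e. for points on one side of the hyperplane w . x = T.
    return int(np.dot(w, x) >= T)

# Hypothetical boundary w1*x1 + w2*x2 = T, as in the slide's figure.
w, T = np.array([1.0, 2.0]), 1.5
print(perceptron(np.array([2.0, 1.0]), w, T))   # 1: on the firing side of the line
print(perceptron(np.array([-1.0, 0.5]), w, T))  # 0: on the other side
```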
Boolean functions with a real perceptron
[Figure: three unit squares over (X, Y) with corners (0,0), (0,1), (1,0), (1,1); shaded half-planes show the inputs classified as 1]
• Boolean perceptrons are also linear classifiers
– Purple regions are 1 68
Composing complicated “decision” boundaries
Can now be composed into “networks” to compute arbitrary classification “boundaries”
[Figure: coloured region in the (x1, x2) plane]
• Build a network of units with a single output that fires if the input is in the coloured area 69
Booleans over the reals
[Figure: coloured region in the (x1, x2) plane]
• The network must fire if the input is in the coloured area 70
Booleans over the reals
[Figure: pentagon bounded by five half-planes; units y1 … y5 fire on the pentagon's side of each line, their outputs are summed, and the output unit thresholds the sum (AND). Region labels show the sum: 5 inside, 3 and 4 outside]
• The network must fire if the input is in the coloured area 75
More complex decision boundaries
[Figure: two polygons in the (x1, x2) plane; an AND unit for each polygon and an OR unit combining them]
• Network to fire if the input is in the yellow area
– “OR” two polygons
– A third layer is required 76
Complex decision boundaries • Can compose arbitrarily complex decision boundaries 77
Complex decision boundaries
[Figure: network of AND units over half-plane units, combined by an OR, over inputs x1, x2]
• Can compose arbitrarily complex decision boundaries 78
Complex decision boundaries
[Figure: network of AND units over half-plane units, combined by an OR, over inputs x1, x2]
• Can compose arbitrarily complex decision boundaries
– With only one hidden layer!
– How? 79
Exercise: compose this with one hidden layer
[Figure: decision boundary in the (x1, x2) plane]
• How would you compose the decision boundary to the left with only one hidden layer? 80
Composing a Square decision boundary
[Figure: square bounded by four half-plane units y1 … y4; region sums are 4 inside the square and smaller (2 or 3) outside]
The output unit checks Σ_{i=1}^{4} y_i ≥ 4?
• The polygon net 81
Composing a pentagon
[Figure: pentagon bounded by five half-plane units y1 … y5; region sums are 5 inside and 2, 3 or 4 outside]
The output unit checks Σ_{i=1}^{5} y_i ≥ 5?
• The polygon net 82
Composing a hexagon
[Figure: hexagon bounded by six half-plane units y1 … y6; region sums are 6 inside and 3, 4 or 5 outside]
The output unit checks Σ_{i=1}^{6} y_i ≥ 6?
• The polygon net 83
How about a heptagon • What are the sums in the different regions? – A pattern emerges as we consider N > 6.. 84
16 sides • What are the sums in the different regions? – A pattern emerges as we consider N > 6.. 85
64 sides • What are the sums in the different regions? – A pattern emerges as we consider N > 6.. 86
1000 sides • What are the sums in the different regions? – A pattern emerges as we consider N > 6.. 87
Polygon net
The output unit checks Σ_{i=1}^{N} y_i ≥ N?
[Figure: N half-plane units y1 … yN over inputs x1, x2]
• Increasing the number of sides reduces the area outside the polygon that has N/2 < Sum < N 88
In the limit
The output unit checks Σ_{i=1}^{N} y_i ≥ N?
[Figure: plot of the sum as a function of position; it equals N inside the circle and falls toward N/2 outside]
• For small radius, it’s a near perfect cylinder
– N in the cylinder, N/2 outside 89
Composing a circle
The output unit checks Σ_{i=1}^{N} y_i ≥ N?
[Figure: the sum is N inside the circle and approaches N/2 outside]
• The circle net
– Very large number of neurons
– Sum is N inside the circle, N/2 outside almost everywhere
– Circle can be at any location 90
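A numerical sketch of this claim (assumed construction: N half-plane units whose boundary lines are tangent to the circle, evaluated at random points):

```python
import numpy as np

rng = np.random.default_rng(0)

def circle_net_sum(points, center, radius, N=1000):
    # N half-plane units whose boundary lines are tangent to the circle:
    # unit i fires when normal_i . (p - center) <= radius,
    # i.e. for points on the circle's side of the i-th tangent line.
    angles = 2 * np.pi * np.arange(N) / N
    normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return ((points - center) @ normals.T <= radius).sum(axis=1)

center, radius = np.array([0.0, 0.0]), 1.0
inside = rng.uniform(-0.5, 0.5, size=(1000, 2))    # points well inside the circle
far_out = rng.uniform(20.0, 30.0, size=(1000, 2))  # points far outside the circle
print(circle_net_sum(inside, center, radius).min())    # N: every unit fires inside
print(circle_net_sum(far_out, center, radius).mean())  # close to N/2 far outside
```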
Composing a circle
Subtracting the bias N/2, the output unit checks Σ_{i=1}^{N} y_i − N/2 > 0?
[Figure: the biased sum is N/2 inside the circle and 0 outside]
• The circle net
– Very large number of neurons
– Sum is N/2 inside the circle, 0 outside almost everywhere
– Circle can be at any location 91
Adding circles
The output unit checks Σ_{i=1}^{2N} y_i − N > 0?
• The “sum” of two circle subnets is exactly N/2 inside either circle, and 0 almost everywhere outside 92
Composing an arbitrary figure
The output unit checks whether the total over all K circle subnets, Σ_{i=1}^{KN} y_i − K·N/2, is > 0
• Just fit in an arbitrary number K of circles
– More accurate approximation with greater number of smaller circles
– Can achieve arbitrary precision 93
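A sketch of adding circle subnets (assumed construction as above; the circles and points are made-up illustrative values, and the final margin of 50 ≈ N/4 is a practical tolerance, since points outside give values only close to 0 rather than the idealized exact 0):

```python
import numpy as np

def circle_subnet(points, center, radius, N=200):
    # N half-plane units tangent to one circle, minus the N/2 bias:
    # roughly N/2 for points inside the circle, near 0 for points far outside.
    angles = 2 * np.pi * np.arange(N) / N
    normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    fired = ((points - np.asarray(center)) @ normals.T <= radius).sum(axis=1)
    return fired - N / 2

# Hypothetical target figure: the union of three circles.
circles = [((0.0, 0.0), 1.0), ((1.5, 0.0), 1.0), ((0.7, 1.2), 0.8)]
pts = np.array([[0.0, 0.0],     # inside the figure
                [20.0, 20.0]])  # far outside the figure
total = sum(circle_subnet(pts, c, r) for c, r in circles)
print(total)                     # large for the first point, near 0 for the second
print((total > 50).astype(int))  # [1 0]; a small margin replaces the idealized "> 0"
```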
MLP: Universal classifier
The output unit checks whether the total over all K circle subnets, Σ_{i=1}^{KN} y_i − K·N/2, is > 0
• MLPs can capture any classification boundary
• A one-hidden-layer MLP can model any classification boundary
• MLPs are universal classifiers 94
Depth and the universal classifier
[Figure: decision boundary over inputs x1, x2 and the corresponding networks]
• Deeper networks can require far fewer neurons 95
Optimal depth..
• Formal analyses typically view these as a category of arithmetic circuits
– Compute polynomials over any field
• Valiant et al.: a polynomial of degree n requires a network of depth log(n)
– Cannot be computed with shallower networks
• Bengio et al.: show a similar result for sum-product networks
– But only considers two-input units
– Generalized by Mhaskar et al. to all functions that can be expressed as a binary tree
– Depth/Size analyses of arithmetic circuits still a research problem 96
Special case: Sum-product nets
• “Shallow vs deep sum-product networks,” Olivier Delalleau and Yoshua Bengio
– For networks where layers alternately perform either sums or products, a deep network may require exponentially fewer units than a shallow one 97
Depth in sum-product networks 98
Optimal depth in generic nets • We look at a different pattern: – “worst case” decision boundaries • For threshold-activation networks – Generalizes to other nets 99
Optimal depth
The output unit checks whether the total over all K circle subnets, Σ_{i=1}^{KN} y_i − K·N/2, is > 0
• A one-hidden-layer neural network will require infinitely many hidden neurons 100