SLIDE 1: Artificial Neural Networks

[Read Ch. 4] [Recommended exercises 4.1, 4.2, 4.5, 4.9, 4.11]

  • Threshold units
  • Gradient descent
  • Multilayer networks
  • Backpropagation
  • Hidden layer representations
  • Example: Face Recognition
  • Advanced topics

Lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill, 1997.
slide-2
SLIDE 2 Connectionist Mo dels Consider h umans:
  • Neuron
switc hing time ~ :001 second
  • Num
b er
  • f
neurons ~ 10 10
  • Connections
p er neuron ~ 10 45
  • Scene
recognition time ~ :1 second
  • 100
inference steps do esn't seem lik e enough ! m uc h parallel computation Prop erties
  • f
articial neural nets (ANN's):
  • Man
y neuron-lik e threshold switc hing units
  • Man
y w eigh ted in terconnections among units
  • Highly
parallel, distributed pro cess
  • Emphasis
  • n
tuning w eigh ts automatically 75 lecture slides for textb
  • k
Machine L e arning, T. Mitc hell, McGra w Hill, 1997
SLIDE 3: When to Consider Neural Networks

  • Input is high-dimensional discrete or real-valued (e.g., raw sensor input)
  • Output is discrete or real-valued
  • Output is a vector of values
  • Possibly noisy data
  • Form of target function is unknown
  • Human readability of result is unimportant

Examples:
  • Speech phoneme recognition [Waibel]
  • Image classification [Kanade, Baluja, Rowley]
  • Financial prediction
SLIDE 4: ALVINN

ALVINN drives 70 mph on highways.

[Figure: the ALVINN network. A 30x32 sensor input retina feeds 4 hidden units, which feed 30 output units covering steering directions from Sharp Left through Straight Ahead to Sharp Right.]
SLIDE 5: Perceptron

[Figure: a perceptron. Inputs x_1, ..., x_n with weights w_1, ..., w_n, plus a bias input x_0 = 1 with weight w_0, feed a summation unit computing net = Σ_{i=0}^{n} w_i x_i; a threshold unit then outputs 1 if net > 0 and -1 otherwise.]

$$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\ -1 & \text{otherwise.} \end{cases}$$

Sometimes we'll use simpler vector notation:

$$o(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} > 0 \\ -1 & \text{otherwise.} \end{cases}$$
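The threshold computation above translates directly into a few lines of Python. This is a sketch of ours (the function and variable names are not from the slides), with the usual convention that x[0] = 1 carries the bias weight w_0:

```python
def perceptron_output(w, x):
    """Threshold unit: returns 1 if w . x > 0, else -1.

    w and x are equal-length lists of numbers; by convention x[0] == 1,
    so that w[0] acts as the threshold weight w0.
    """
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > 0 else -1
```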
SLIDE 6: Decision Surface of a Perceptron

[Figure: (a) a linearly separable arrangement of + and − points in the (x1, x2) plane, split by a single line; (b) an arrangement (like XOR) that no single line can separate.]

Represents some useful functions
  • What weights represent g(x_1, x_2) = AND(x_1, x_2)? (One answer is sketched below.)

But some functions are not representable
  • e.g., not linearly separable
  • Therefore, we'll want networks of these...
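As a hedged answer to the slide's question about AND: one weight setting that works (assuming inputs in {0, 1} and the perceptron_output sketch above) is w_0 = -0.8, w_1 = w_2 = 0.5, since only x_1 = x_2 = 1 pushes the weighted sum above zero:

```python
# One weight choice of ours (not from the slides) realizing
# g(x1, x2) = AND(x1, x2) with inputs in {0, 1}, outputs in {1, -1}.
w_and = [-0.8, 0.5, 0.5]            # [w0, w1, w2]

for x1 in (0, 1):
    for x2 in (0, 1):
        x = [1, x1, x2]             # x[0] = 1 supplies the bias term
        print(x1, x2, perceptron_output(w_and, x))
# Prints 1 only for (1, 1), and -1 for the other three input pairs.
```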
SLIDE 7: Perceptron Training Rule

$$w_i \leftarrow w_i + \Delta w_i \quad \text{where} \quad \Delta w_i = \eta\,(t - o)\,x_i$$

Where:
  • $t = c(\vec{x})$ is the target value
  • $o$ is the perceptron output
  • $\eta$ is a small constant (e.g., 0.1) called the learning rate
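A minimal sketch of the rule as code, reusing the perceptron_output helper above (the pass structure and names are ours):

```python
def perceptron_train_step(w, examples, eta=0.1):
    """Apply the perceptron training rule once to each example.

    examples is a list of (x, t) pairs with x[0] == 1 and t in {1, -1};
    eta is the learning rate. Mutates and returns the weight list w.
    """
    for x, t in examples:
        o = perceptron_output(w, x)
        for i in range(len(w)):
            w[i] += eta * (t - o) * x[i]   # delta_w_i = eta * (t - o) * x_i
    return w
```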
SLIDE 8: Perceptron Training Rule (continued)

Can prove it will converge
  • If the training data is linearly separable
  • and $\eta$ is sufficiently small
SLIDE 9: Gradient Descent

To understand, consider a simpler linear unit, where
$$o = w_0 + w_1 x_1 + \cdots + w_n x_n$$

Let's learn $w_i$'s that minimize the squared error
$$E[\vec{w}] \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

where $D$ is the set of training examples.
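A direct transcription of the linear unit and its squared error into Python (helper names are ours, not Mitchell's):

```python
def linear_output(w, x):
    # o = w0 + w1*x1 + ... + wn*xn, with x[0] == 1 supplying the w0 term
    return sum(wi * xi for wi, xi in zip(w, x))

def squared_error(w, examples):
    # E[w] = 1/2 * sum over training examples d of (t_d - o_d)^2
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in examples)
```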
SLIDE 10: Gradient Descent

[Figure: the error surface E[w] plotted as a bowl over the (w0, w1) plane; gradient descent steps move downhill toward the minimum.]

Gradient:
$$\nabla E[\vec{w}] \equiv \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right]$$

Training rule:
$$\Delta \vec{w} = -\eta \nabla E[\vec{w}]$$
i.e.,
$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$$
SLIDE 11: Gradient Descent (derivation)

$$\begin{aligned}
\frac{\partial E}{\partial w_i} &= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_d (t_d - o_d)^2 \\
&= \frac{1}{2} \sum_d \frac{\partial}{\partial w_i} (t_d - o_d)^2 \\
&= \frac{1}{2} \sum_d 2\,(t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d) \\
&= \sum_d (t_d - o_d) \frac{\partial}{\partial w_i} \left( t_d - \vec{w} \cdot \vec{x}_d \right) \\
\frac{\partial E}{\partial w_i} &= \sum_d (t_d - o_d)(-x_{i,d})
\end{aligned}$$
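One way to convince yourself of the final line is to compare it against a finite-difference estimate of E. This check is our addition, built on the linear_output and squared_error helpers defined after Slide 9:

```python
def analytic_gradient(w, examples):
    # dE/dw_i = sum_d (t_d - o_d) * (-x_{i,d}), from the derivation above
    grad = [0.0] * len(w)
    for x, t in examples:
        err = t - linear_output(w, x)
        for i in range(len(w)):
            grad[i] += err * (-x[i])
    return grad

def numeric_gradient(w, examples, h=1e-6):
    # Central differences: (E(w + h*e_i) - E(w - h*e_i)) / (2h)
    grad = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((squared_error(w_plus, examples) -
                     squared_error(w_minus, examples)) / (2 * h))
    return grad
```

For any small test set, the two routines should agree to several decimal places.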
SLIDE 12: Gradient Descent (algorithm)

Gradient-Descent(training_examples, η)

Each training example is a pair of the form ⟨x, t⟩, where x is the vector of input values and t is the target output value. η is the learning rate (e.g., 0.05).

  • Initialize each w_i to some small random value
  • Until the termination condition is met, Do
    - Initialize each Δw_i to zero.
    - For each ⟨x, t⟩ in training_examples, Do
      * Input the instance x to the unit and compute the output o
      * For each linear unit weight w_i, Do
          Δw_i ← Δw_i + η(t − o)x_i
    - For each linear unit weight w_i, Do
          w_i ← w_i + Δw_i
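A runnable Python sketch of the procedure above; the termination condition, here a fixed epoch count, is our simplification:

```python
import random

def gradient_descent(examples, eta=0.05, epochs=100):
    """Batch gradient descent for a linear unit.

    examples: list of (x, t) pairs with x[0] == 1.
    Returns the learned weight vector.
    """
    n = len(examples[0][0])
    # Initialize each w_i to some small random value.
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):                  # "until termination condition"
        delta_w = [0.0] * n                  # initialize each delta_w_i to zero
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n):
                delta_w[i] += eta * (t - o) * x[i]
        for i in range(n):                   # apply the accumulated update
            w[i] += delta_w[i]
    return w
```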
SLIDE 13: Summary

Perceptron training rule guaranteed to succeed if
  • Training examples are linearly separable
  • Sufficiently small learning rate $\eta$

Linear unit training rule uses gradient descent
  • Guaranteed to converge to the hypothesis with minimum squared error
  • Given sufficiently small learning rate $\eta$
  • Even when training data contains noise
  • Even when training data not separable by $H$
SLIDE 14: Incremental (Stochastic) Gradient Descent

Batch mode Gradient Descent:
Do until satisfied
  1. Compute the gradient $\nabla E_D[\vec{w}]$
  2. $\vec{w} \leftarrow \vec{w} - \eta \nabla E_D[\vec{w}]$

Incremental mode Gradient Descent:
Do until satisfied
  • For each training example $d$ in $D$
    1. Compute the gradient $\nabla E_d[\vec{w}]$
    2. $\vec{w} \leftarrow \vec{w} - \eta \nabla E_d[\vec{w}]$

$$E_D[\vec{w}] \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 \qquad\qquad E_d[\vec{w}] \equiv \frac{1}{2} (t_d - o_d)^2$$

Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if $\eta$ is made small enough.
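In code, the incremental variant differs from the batch sketch after Slide 12 only in where the update is applied: inside the per-example loop instead of after it (same conventions and random import as before):

```python
def incremental_gradient_descent(examples, eta=0.05, epochs=100):
    """Stochastic gradient descent for a linear unit: update after
    every example, descending the single-example error E_d."""
    n = len(examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n):
                w[i] += eta * (t - o) * x[i]   # immediate update, no batching
    return w
```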
SLIDE 15: Multilayer Networks of Sigmoid Units

[Figure: a multilayer network mapping two input features F1 and F2 to the vowel sounds in "head", "hid", "who'd", "hood", together with the highly nonlinear decision regions it induces.]
SLIDE 16: Sigmoid Unit

[Figure: a sigmoid unit. Inputs x_1, ..., x_n with weights w_1, ..., w_n, plus a bias input x_0 = 1 with weight w_0, feed a summation unit computing net = Σ_{i=0}^{n} w_i x_i; the output is o = σ(net) = 1 / (1 + e^{−net}).]

$\sigma(x)$ is the sigmoid function:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Nice property:
$$\frac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x))$$

We can derive gradient descent rules to train
  • One sigmoid unit
  • Multilayer networks of sigmoid units → Backpropagation
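The sigmoid and its "nice property" in Python; note that the derivative needs only the unit's output, which is exactly what backpropagation exploits:

```python
import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime_from_output(o):
    # d sigma(x)/dx = sigma(x) * (1 - sigma(x)); takes o = sigma(x) directly,
    # so no second call to exp() is needed.
    return o * (1.0 - o)
```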
SLIDE 17: Error Gradient for a Sigmoid Unit

$$\begin{aligned}
\frac{\partial E}{\partial w_i} &= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 \\
&= \frac{1}{2} \sum_d \frac{\partial}{\partial w_i} (t_d - o_d)^2 \\
&= \frac{1}{2} \sum_d 2\,(t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d) \\
&= \sum_d (t_d - o_d) \left( -\frac{\partial o_d}{\partial w_i} \right) \\
&= -\sum_d (t_d - o_d) \frac{\partial o_d}{\partial net_d} \frac{\partial net_d}{\partial w_i}
\end{aligned}$$

But we know:
$$\frac{\partial o_d}{\partial net_d} = \frac{\partial \sigma(net_d)}{\partial net_d} = o_d (1 - o_d)
\qquad
\frac{\partial net_d}{\partial w_i} = \frac{\partial (\vec{w} \cdot \vec{x}_d)}{\partial w_i} = x_{i,d}$$

So:
$$\frac{\partial E}{\partial w_i} = -\sum_{d \in D} (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}$$
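The closing formula, transcribed as a gradient routine for a single sigmoid unit (reusing the sigmoid helper above; names are ours):

```python
def sigmoid_unit_gradient(w, examples):
    """dE/dw_i = -sum_d (t_d - o_d) * o_d * (1 - o_d) * x_{i,d}."""
    grad = [0.0] * len(w)
    for x, t in examples:
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(len(w)):
            grad[i] += -(t - o) * o * (1.0 - o) * x[i]
    return grad
```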
SLIDE 18: Backpropagation Algorithm

Initialize all weights to small random numbers.
Until satisfied, Do
  • For each training example, Do
    1. Input the training example to the network and compute the network outputs
    2. For each output unit k:
         δ_k ← o_k (1 − o_k)(t_k − o_k)
    3. For each hidden unit h:
         δ_h ← o_h (1 − o_h) Σ_{k ∈ outputs} w_{h,k} δ_k
    4. Update each network weight w_{i,j}:
         w_{i,j} ← w_{i,j} + Δw_{i,j}, where Δw_{i,j} = η δ_j x_{i,j}
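A compact Python sketch of the algorithm for one hidden layer of sigmoid units. The fixed epoch count, list-of-lists weight matrices, and all names are our simplifications, and it assumes the sigmoid helper defined after Slide 16:

```python
import random

def train_backprop(examples, n_in, n_hidden, n_out, eta=0.3, epochs=5000):
    """examples: list of (x, t) pairs of plain lists, len(x) == n_in,
    len(t) == n_out, targets in (0, 1). Returns (w_ih, w_ho): weight
    matrices whose index 0 in each row is the bias weight."""
    rnd = lambda: random.uniform(-0.05, 0.05)
    w_ih = [[rnd() for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_ho = [[rnd() for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):                       # "until satisfied"
        for x, t in examples:
            # Step 1: forward pass, with bias input x0 = 1 at each layer
            xb = [1.0] + list(x)
            h = [sigmoid(sum(w * xi for w, xi in zip(row, xb)))
                 for row in w_ih]
            hb = [1.0] + h
            o = [sigmoid(sum(w * hi for w, hi in zip(row, hb)))
                 for row in w_ho]
            # Step 2: output-unit errors  delta_k = o_k(1-o_k)(t_k - o_k)
            d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            # Step 3: hidden-unit errors  delta_h = o_h(1-o_h) sum_k w_hk delta_k
            d_hid = [h[j] * (1 - h[j]) *
                     sum(w_ho[k][j + 1] * d_out[k] for k in range(n_out))
                     for j in range(n_hidden)]
            # Step 4: weight updates  w_ij += eta * delta_j * x_ij
            for k in range(n_out):
                for i in range(n_hidden + 1):
                    w_ho[k][i] += eta * d_out[k] * hb[i]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_ih[j][i] += eta * d_hid[j] * xb[i]
    return w_ih, w_ho
```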
SLIDE 19: More on Backpropagation

  • Gradient descent over the entire network weight vector
  • Easily generalized to arbitrary directed graphs
  • Will find a local, not necessarily global, error minimum
    - In practice, often works well (can run multiple times)
  • Often include weight momentum α:
    $$\Delta w_{i,j}(n) = \eta \delta_j x_{i,j} + \alpha \Delta w_{i,j}(n - 1)$$
  • Minimizes error over training examples
    - Will it generalize well to subsequent examples?
  • Training can take thousands of iterations → slow!
  • Using the network after training is very fast
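The momentum term changes only step 4 of the algorithm: each weight update mixes in an α-weighted copy of the previous update. A sketch of the modified update for one weight row (α = 0.9 is a common choice, not one the slides prescribe):

```python
def momentum_update(w_row, prev_row, grads, eta=0.3, alpha=0.9):
    """Update one weight row in place with momentum.

    grads[i] holds delta_j * x_ij from ordinary backpropagation;
    prev_row[i] holds the previous update Delta_w(n - 1), initially 0.0.
    """
    for i, g in enumerate(grads):
        delta = eta * g + alpha * prev_row[i]   # eta*delta_j*x_ij + alpha*prev
        w_row[i] += delta
        prev_row[i] = delta                     # remembered for iteration n + 1
```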
SLIDE 20: Learning Hidden Layer Representations

[Figure: an 8 x 3 x 8 network: eight inputs, three hidden units, eight outputs.]

A target function:

  Input      →  Output
  10000000   →  10000000
  01000000   →  01000000
  00100000   →  00100000
  00010000   →  00010000
  00001000   →  00001000
  00000100   →  00000100
  00000010   →  00000010
  00000001   →  00000001

Can this be learned??
SLIDE 21: Learning Hidden Layer Representations

A network:

[Figure: the same 8 x 3 x 8 network.]

Learned hidden layer representation:

  Input      →  Hidden Values    →  Output
  10000000   →  .89  .04  .08    →  10000000
  01000000   →  .01  .11  .88    →  01000000
  00100000   →  .01  .97  .27    →  00100000
  00010000   →  .99  .97  .71    →  00010000
  00001000   →  .03  .05  .02    →  00001000
  00000100   →  .22  .99  .99    →  00000100
  00000010   →  .80  .01  .98    →  00000010
  00000001   →  .60  .94  .01    →  00000001
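The 8 x 3 x 8 experiment can be reproduced with the train_backprop sketch given after Slide 18; the learning rate and epoch count below are our guesses, not the settings behind the table:

```python
# Eight one-hot vectors; the target equals the input (identity function).
patterns = [[1.0 if i == j else 0.0 for i in range(8)] for j in range(8)]
examples = [(p, p) for p in patterns]

w_ih, w_ho = train_backprop(examples, n_in=8, n_hidden=3, n_out=8,
                            eta=0.3, epochs=5000)
# After training, the three hidden-unit activations for each input
# approximate a distinct 3-valued code, as in the table above.
```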
SLIDE 22: Training

[Figure: sum of squared errors for each output unit, plotted over the first 2500 training epochs.]
SLIDE 23: Training

[Figure: hidden unit encoding for input 01000000, plotted over the first 2500 training epochs.]
SLIDE 24: Training

[Figure: weights from the inputs to one hidden unit, plotted over the first 2500 training epochs.]
SLIDE 25: Convergence of Backpropagation

Gradient descent to some local minimum
  • Perhaps not global minimum...
  • Add momentum
  • Stochastic gradient descent
  • Train multiple nets with different initial weights

Nature of convergence
  • Initialize weights near zero
  • Therefore, initial networks near-linear
  • Increasingly non-linear functions possible as training progresses
SLIDE 26: Expressive Capabilities of ANNs

Boolean functions:
  • Every boolean function can be represented by a network with a single hidden layer
  • but might require exponential (in number of inputs) hidden units

Continuous functions:
  • Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989; Hornik et al. 1989]
  • Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988]
SLIDE 27: Overfitting in ANNs

[Figure: two plots of error versus number of weight updates (examples 1 and 2); in each, training set error decreases steadily while validation set error first falls and then rises as the network overfits.]
SLIDE 28: Neural Nets for Face Recognition

[Figure: a network with 30x32 image inputs and four outputs (left, strt, rght, up), shown alongside typical input images.]

90% accurate learning head pose, and recognizing 1-of-20 faces
SLIDE 29: Learned Hidden Unit Weights

[Figure: the same 30x32-input, four-output network with its learned weights visualized, alongside typical input images.]

http://www.cs.cmu.edu/~tom/faces.html
SLIDE 30: Alternative Error Functions

Penalize large weights:
$$E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 + \gamma \sum_{i,j} w_{ji}^2$$

Train on target slopes as well as values:
$$E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} \left[ (t_{kd} - o_{kd})^2 + \mu \sum_{j \in inputs} \left( \frac{\partial t_{kd}}{\partial x_d^j} - \frac{\partial o_{kd}}{\partial x_d^j} \right)^2 \right]$$

Tie together weights:
  • e.g., in phoneme recognition network
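Since the derivative of the penalty term with respect to a weight is 2γw, gradient descent on the weight-penalty error just adds a shrinkage term to every weight update. A sketch of the modified update (the γ value is illustrative, not from the slides):

```python
def update_with_weight_decay(w_row, grads, eta=0.3, gamma=1e-4):
    """Weight update for E = squared error + gamma * sum of squared weights.

    grads[i] holds delta_j * x_ij from ordinary backpropagation; the
    extra -2*gamma*w term shrinks each weight toward zero on every step.
    """
    for i, g in enumerate(grads):
        w_row[i] += eta * (g - 2.0 * gamma * w_row[i])
```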
SLIDE 31: Recurrent Networks

[Figure: (a) a feedforward network mapping input x(t) to output y(t + 1); (b) a recurrent network in which context units c(t) feed hidden-layer activations back as additional inputs; (c) the same recurrent network unfolded in time through x(t - 1), c(t - 1) and x(t - 2), c(t - 2), producing y(t - 1), y(t), and y(t + 1).]