XOR with intermediate ("hidden") units

- Intermediate units can re-represent input patterns as new patterns with altered similarities.
- Targets which are not linearly separable in the input space can be linearly separable in the intermediate representational space.
- Intermediate units are called "hidden" because their activations are not determined directly by the training environment (inputs and targets).

Delta rule as gradient descent in error (sigmoid units)

- Net input and activation: $n_j = \sum_i a_i w_{ij}$, $\quad a_j = \frac{1}{1 + \exp(-n_j)}$
- Chain of influence: $a_i \rightarrow n_j \rightarrow a_j \rightarrow E$
- Error: $E = \frac{1}{2} \sum_j (t_j - a_j)^2$
- Gradient descent: $\Delta w_{ij} = -\epsilon \, \frac{\partial E}{\partial w_{ij}}$
- $\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial a_j} \frac{d a_j}{d n_j} \frac{\partial n_j}{\partial w_{ij}} = -(t_j - a_j) \, a_j (1 - a_j) \, a_i$
- $\Delta w_{ij} = -\epsilon \, \frac{\partial E}{\partial w_{ij}} = \epsilon \, (t_j - a_j) \, a_j (1 - a_j) \, a_i$

Generalized Delta rule ("back-propagation")

[Figure: feed-forward network with hidden and output layers; weights $w_{ij}$, targets $t_j$; chain of influence $n_i \rightarrow a_i \rightarrow n_j \rightarrow a_j \rightarrow E$]

- Net input, activation, and error as before: $n_j = \sum_i a_i w_{ij}$, $\quad a_j = \frac{1}{1 + \exp(-n_j)}$, $\quad E = \frac{1}{2} \sum_j (t_j - a_j)^2$
- Gradient descent: $\Delta w_{ij} = -\epsilon \, \frac{\partial E}{\partial w_{ij}}$
- Hidden-to-output weights can be trained with the Delta rule.
- How can we train input-to-hidden weights? Hidden units do not have targets (for determining error).
- Trick: We don't need targets, we just need to know how hidden activations affect error (i.e., error derivatives).
- Intermediate notation ("input derivatives" in Lens): $\frac{\partial E}{\partial n_j} = \frac{\partial E}{\partial a_j} \frac{d a_j}{d n_j} = -(t_j - a_j) \, a_j (1 - a_j)$
- $\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial n_j} \frac{\partial n_j}{\partial w_{ij}} = \frac{\partial E}{\partial n_j} \, a_i$
- $\frac{\partial E}{\partial a_i} = \sum_j \frac{\partial E}{\partial n_j} \frac{\partial n_j}{\partial a_i} = \sum_j \frac{\partial E}{\partial n_j} \, w_{ij}$
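The derivation above is compact enough to check numerically. Below is a minimal NumPy sketch (not from the slides; the function name and the default learning rate are illustrative) of one Delta-rule step for a layer of sigmoid units, including the $\partial E / \partial a_i$ term that the generalized Delta rule passes back to the sending units.

```python
import numpy as np

def delta_rule_step(a_i, w, t, eps=0.5):
    """One Delta-rule step for a layer of sigmoid units.

    a_i : activations of the sending units, shape (I,)
    w   : weights w[i, j] from sending unit i to receiving unit j, shape (I, J)
    t   : targets for the receiving units, shape (J,)
    """
    n_j = a_i @ w                        # net inputs: n_j = sum_i a_i w_ij
    a_j = 1.0 / (1.0 + np.exp(-n_j))     # sigmoid activations
    E = 0.5 * np.sum((t - a_j) ** 2)     # summed squared error

    dE_dn_j = -(t - a_j) * a_j * (1.0 - a_j)   # "input derivatives" dE/dn_j
    dE_dw = np.outer(a_i, dE_dn_j)             # dE/dw_ij = dE/dn_j * a_i
    dE_da_i = w @ dE_dn_j                      # dE/da_i = sum_j dE/dn_j * w_ij

    # gradient-descent step, error, and derivatives passed back to sending units
    return w - eps * dE_dw, E, dE_da_i
```

For hidden-to-output weights this is exactly the Delta rule; the returned dE_da_i is what a preceding layer uses in place of a target-based error.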
Back-propagation

Forward pass (⇑):
- $a_i = \frac{1}{1 + \exp(-n_i)}$ (hidden activations)
- $n_j = \sum_i a_i w_{ij}$
- $a_j = \frac{1}{1 + \exp(-n_j)}$ (output activations)

Backward pass (⇓):
- $\frac{\partial E}{\partial a_j} = -(t_j - a_j)$
- $\frac{\partial E}{\partial n_j} = a_j (1 - a_j) \, \frac{\partial E}{\partial a_j}$
- $\frac{\partial E}{\partial w_{ij}} = a_i \, \frac{\partial E}{\partial n_j}$
- $\frac{\partial E}{\partial a_i} = \sum_j w_{ij} \, \frac{\partial E}{\partial n_j}$

What do hidden representations learn?

- Plaut and Shallice (1993): mapped orthography to semantics (unrelated similarities).
- Compared similarities among hidden representations to those among orthographic and semantic representations.
- Hidden representations "split the difference" between input and output similarity (over settling).

Accelerating learning: Momentum descent

$\Delta w_{ij}[t] = -\epsilon \, \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}[t-1]$

"Auto-encoder" network (4–2–4)

[Figure: 4–2–4 auto-encoder network]
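Putting the forward pass, the backward pass, and the momentum update together, here is a sketch of training the 4–2–4 auto-encoder with back-propagation in NumPy. This is not the course's Lens setup: the learning rate, momentum, epoch count, and the explicit bias weights (not shown in the slide equations) are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
patterns = np.eye(4)                 # 4-2-4 auto-encoder: targets equal the one-hot inputs
eps, alpha = 1.0, 0.9                # learning rate and momentum (illustrative values)

# input-to-hidden and hidden-to-output weights, plus bias weights (assumed here)
w1, b1 = rng.uniform(-1, 1, (4, 2)), np.zeros(2)
w2, b2 = rng.uniform(-1, 1, (2, 4)), np.zeros(4)
dw1 = db1 = dw2 = db2 = 0.0          # previous weight steps, for momentum

sigmoid = lambda n: 1.0 / (1.0 + np.exp(-n))

for epoch in range(5000):
    # forward pass
    a_h = sigmoid(patterns @ w1 + b1)              # hidden activations
    a_o = sigmoid(a_h @ w2 + b2)                   # output activations

    # backward pass: "input derivatives" dE/dn at each layer
    dE_dn_o = -(patterns - a_o) * a_o * (1 - a_o)
    dE_dn_h = (dE_dn_o @ w2.T) * a_h * (1 - a_h)

    # momentum descent: step[t] = -eps * dE/dw + alpha * step[t-1]
    dw2 = -eps * (a_h.T @ dE_dn_o) + alpha * dw2
    db2 = -eps * dE_dn_o.sum(axis=0) + alpha * db2
    dw1 = -eps * (patterns.T @ dE_dn_h) + alpha * dw1
    db1 = -eps * dE_dn_h.sum(axis=0) + alpha * db1
    w2, b2, w1, b1 = w2 + dw2, b2 + db2, w1 + dw1, b1 + db1

# trained outputs; on a successful run these approach the identity patterns
print(np.round(sigmoid(sigmoid(patterns @ w1 + b1) @ w2 + b2), 2))
```

Setting alpha to 0 recovers plain gradient descent; the later slides examine what happens to the weight trajectory when the momentum or the learning rate is made large.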
Projections of error surface in weight space

- Asterisk: error of the current set of weights
- Tick mark: error of the next set of weights
- Solid curve (0): gradient direction
- Solid curve (21): integrated gradient direction (including momentum); this is the actual direction of the weight step (the tick mark lies on this curve), and the number is its angle with the gradient direction
- Dotted curves: random directions (each labeled by its angle with the gradient direction)

[Figure slides: Epochs 1-2, Epochs 3-4, Epochs 5-6]
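Not part of the original materials: a sketch of how such projections could be computed, assuming a function E that maps a flat weight vector to the error. The error is evaluated at points along a chosen line in weight space (the gradient direction, the integrated momentum direction, or a random direction), and each direction can be labeled by its angle with the gradient, as on the plots.

```python
import numpy as np

def project_error(E, w, direction, span=1.0, n_points=41):
    """Error values along a line in weight space through the current weights w.

    E         : function mapping a flat weight vector to a scalar error
    w         : current weight vector
    direction : direction in weight space (normalized before use)
    """
    d = direction / np.linalg.norm(direction)
    steps = np.linspace(-span, span, n_points)
    return steps, np.array([E(w + s * d) for s in steps])

def angle_with_gradient(gradient, direction):
    """Angle (in degrees) between a direction and the gradient, used to label the curves."""
    cos = (gradient @ direction) / (np.linalg.norm(gradient) * np.linalg.norm(direction))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```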
[Figure slides: Epochs 7-8, Epochs 9-10, Epochs 25-50, Epochs 75-end]
[Figure slides: High momentum (epochs 1-2), High momentum (epochs 3-4), High learning rate (epochs 1-2), High learning rate (epochs 3-4)]