Recurrent Neural Networks CSCI 447/547 MACHINE LEARNING
Outline Introduction Sequence Data Sequential Memory Recurrent Neural Networks Vanishing Gradient LSTMs and GRUs
Introduction Uses: speech recognition, language translation, stock prediction, video, weather Incorporates internal memory Used when "temporal dynamics that connects the data is more important than the spatial context of an individual frame" (Lex Fridman, MIT)
Sequence Data Snapshot of a ball moving in time: You want to predict the direction it is moving With the data you have, it would be a random guess
Sequence Data Snapshots of a ball moving in time: You want to predict the direction it is moving Now with the data you have about previous positions, you can predict more accurately
Sequence Data Other examples: audio (a waveform sampled over time) and text messaging (a predictive keyboard suggesting the next word after "I want to")
Sequential Memory Try saying the alphabet forward Now try saying it backwards Now say it forward, but start at the letter F Sequential memory makes it easier for your brain to recognize sequence patterns
Recurrent Neural Networks Feed forward neural network: input information never touches a node twice Recurrent neural network: input information cycles through a loop
Recurrent Neural Networks Hidden state is retained and used as input in subsequent iterations
Recurrent Neural Networks Another view
Language Models Word ordering: "the cat is small" vs. "small is the cat" Word choice: "walking home after school" vs. "walking house after school" An incorrect but necessary Markov assumption: P(w_1, ..., w_m) ≈ ∏_i P(w_i | w_{i-n}, ..., w_{i-1})
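To make the Markov assumption concrete, here is a minimal sketch (with a hypothetical toy corpus) of how an n-gram model estimates these conditional probabilities from bigram counts:

```python
# Minimal sketch: the Markov-approximated probability P(w_i | w_{i-1})
# estimated from bigram counts, as a bigram (n-gram) model would do.
# The toy corpus and sentence are illustrative assumptions.
from collections import Counter

corpus = "the cat is small the cat is sleepy the dog is small".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev_word, word):
    """P(word | prev_word) under a first-order Markov (bigram) model."""
    return bigrams[(prev_word, word)] / unigrams[prev_word]

# Sentence probability under the Markov assumption:
# P(w_1, ..., w_m) ~= prod_i P(w_i | w_{i-1})
sentence = ["the", "cat", "is", "small"]
p = 1.0
for prev, cur in zip(sentence, sentence[1:]):
    p *= bigram_prob(prev, cur)
print(p)
```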
Recurrent Neural Networks
Recurrent Neural Networks Forward propagation: h_t = tanh(W_xh x_t + W_hh h_{t-1}), ŷ_t = softmax(W_hy h_t)
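A minimal NumPy sketch of this forward pass; the sizes and weight names (W_xh, W_hh, W_hy) are illustrative assumptions rather than anything fixed by the slides:

```python
import numpy as np

# Vanilla-RNN forward pass over a short sequence (sizes are illustrative).
V, H = 10, 8                      # vocabulary size, hidden size
rng = np.random.default_rng(0)

W_xh = rng.normal(scale=0.1, size=(H, V))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(H, H))   # hidden-to-hidden (recurrent) weights
W_hy = rng.normal(scale=0.1, size=(V, H))   # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(inputs, h):
    """inputs: list of one-hot vectors x_t; h: initial hidden state."""
    outputs = []
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h)     # same weights reused at every time step
        y = softmax(W_hy @ h)                # distribution over the next word
        outputs.append(y)
    return outputs, h

x_seq = [np.eye(V)[i] for i in [0, 3, 5]]    # a toy 3-word input sequence
probs, h_final = rnn_forward(x_seq, np.zeros(H))
print(probs[-1].shape)                       # (V,) next-word distribution
```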
Recurrent Neural Networks Use the same weights at each time step Condition the network on all previous inputs Memory requirement scales with the number of words, not with the number of word combinations (as in n-gram models)
Recurrent Neural Networks
Back Propagation Through Time (BPTT) Back propagation on an unrolled recurrent neural network Unrolling is a conceptual tool View the RNN as a sequence of ANNs that you train one after the other
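A hedged sketch of what this looks like in practice, using PyTorch's built-in RNN and a truncated unrolling; the chunk length, sizes, and toy data are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Sketch of (truncated) backpropagation through time with a built-in RNN.
V, H, T, chunk = 10, 16, 40, 8
rnn = nn.RNN(input_size=V, hidden_size=H, batch_first=True)
head = nn.Linear(H, V)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, V, (1, T + 1))           # toy token sequence
x = torch.nn.functional.one_hot(tokens[:, :-1], V).float()
y = tokens[:, 1:]                                  # next-token targets

h = torch.zeros(1, 1, H)
for t in range(0, T, chunk):
    h = h.detach()                                 # truncate the graph: gradients
    out, h = rnn(x[:, t:t+chunk], h)               # only flow through this chunk
    loss = loss_fn(head(out).reshape(-1, V), y[:, t:t+chunk].reshape(-1))
    opt.zero_grad()
    loss.backward()                                # backprop through the unrolled chunk
    opt.step()
```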
Vanishing Gradient AKA short-term memory Due to the nature of back propagation: if the gradient reaching one layer is small, the gradient passed back to the layer before it is even smaller The gradient shrinks exponentially In back propagation through time (BPTT), the gradient shrinks exponentially through each time step, so early time steps contribute little to learning
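A toy illustration of the effect: repeatedly multiplying a gradient by the recurrent Jacobian of a tanh RNN shrinks its norm roughly exponentially with the number of time steps. The matrix and sizes below are illustrative assumptions:

```python
import numpy as np

# Push a gradient backward through many time steps of a tanh RNN
# and watch its norm decay (vanish).
rng = np.random.default_rng(0)
H = 16
W_hh = rng.normal(scale=0.2, size=(H, H))       # small recurrent weights

grad = np.ones(H)                                # gradient at the last time step
for t in range(1, 21):
    a = rng.normal(size=H)                       # stand-in pre-activation
    jac = np.diag(1 - np.tanh(a) ** 2) @ W_hh    # d h_t / d h_{t-1} for a tanh RNN
    grad = jac.T @ grad                          # push the gradient one step back
    if t % 5 == 0:
        print(f"{t} steps back: |grad| = {np.linalg.norm(grad):.2e}")
```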
LSTMs and GRUs LSTM (Long Short-Term Memory): information is retained in an explicit memory cell, which the LSTM can read, write, and delete GRU (Gated Recurrent Unit): gates decide whether to store or delete information, based on the importance the network learns to assign through its weights Both can learn what information to add to or remove from the hidden state
LSTMs and GRUs Three gates: input (lets new information in), forget (deletes information that isn't important), output (lets information impact the current output)
LSTMs and GRUs Gates are analog, typically sigmoid activations ranging from 0 to 1, so back propagation works through them
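A minimal sketch of one LSTM cell step showing the three sigmoid gates; the weight shapes and names are illustrative assumptions, not the slides' notation:

```python
import numpy as np

# One LSTM cell step: sigmoid gates (values in (0, 1)) control the memory cell.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """W maps the concatenated [h_prev, x] to the 4 gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.size
    i = sigmoid(z[0:H])            # input gate: let new information in
    f = sigmoid(z[H:2*H])          # forget gate: drop unimportant memory
    o = sigmoid(z[2*H:3*H])        # output gate: expose memory to the output
    g = np.tanh(z[3*H:4*H])        # candidate memory content
    c = f * c_prev + i * g         # updated cell memory
    h = o * np.tanh(c)             # new hidden state
    return h, c

H, D = 8, 5                        # hidden size, input size (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```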
Bidirectional RNNs
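As a usage sketch (sizes are illustrative assumptions): a bidirectional RNN runs one pass left-to-right and one right-to-left and concatenates the two hidden states at each time step, as PyTorch's bidirectional flag does:

```python
import torch
import torch.nn as nn

# Bidirectional GRU: output at each step concatenates forward and backward states.
birnn = nn.GRU(input_size=10, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(1, 7, 10)          # (batch, time, features) toy input
out, h_n = birnn(x)
print(out.shape)                   # torch.Size([1, 7, 32]) -> forward + backward states
```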
Deep Bidirectional RNNs
Summary Introduction Sequence Data Sequential Memory Recurrent Neural Networks Vanishing Gradient LSTMs and GRUs