

  1. Finding Structure in Time Jeffrey L. Elman In Cognitive Science 14, 179 – 211 (1990) presented by Dominic Seyler (dseyler2@illinois.edu)

  2. Outline • Motivation • Method • Experiments • Exclusive-Or • Structure in Letter Sequences • Discovering the Notion “Word” • Discovering Lexical Classes • Conclusions

  3. Motivation: The Problem with Time • Previous methods of representing time • Associate the serial order of a temporal pattern with the dimensionality of the pattern vector • [ 0 1 0 0 1 ] <- first, second, third... event in temporal order • There are several downsides to representing time this way • An input buffer is required to present all events at once • All input vectors must be the same length and provide for the longest possible temporal pattern • Most importantly: it cannot distinguish relative from absolute temporal position, e.g. [ 0 1 1 1 0 0 0 0 0 ] and [ 0 0 0 1 1 1 0 0 0 ] encode the same relative pattern at different absolute positions, yet as vectors they look very different (see the comparison below)
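A quick illustration of that last point, using the two vectors from the slide (the comparison itself is just one assumed way of measuring similarity, not something from the paper):

```python
import numpy as np

# The same three-event pattern at two different absolute positions:
p1 = np.array([0, 1, 1, 1, 0, 0, 0, 0, 0])
p2 = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])

# Element-wise the two vectors barely overlap, even though their relative
# temporal structure (three consecutive events) is identical.
print(np.array_equal(p1, p2))  # False
print(int(p1 @ p2))            # 1 -- only one position in common
```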

  4. An Alternative Way of Treating Time • Don't model time as an explicit part of the input • Allow time to be represented by the effect it has on processing • The network lets hidden units see previous output • Recurrent connections are what give the network memory

  5. Approach: Recurrent Neural Network • Augment the input with additional units (context units) • When input is processed sequentially, the context units contain the exact values of the hidden units from the previous time step • The hidden units map the external input and the previous internal state to the desired output (a minimal sketch of the forward pass follows below)
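A minimal sketch of such a forward pass in NumPy, assuming sigmoid units and arbitrary layer sizes; the weight initialization and the backpropagation training step are omitted and are not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ElmanNet:
    """Forward pass of a simple recurrent network: context units hold a copy
    of the previous hidden state and feed back into the hidden layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(scale=0.1, size=(n_in, n_hidden))      # input -> hidden
        self.W_ch = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # context -> hidden
        self.W_hy = rng.normal(scale=0.1, size=(n_hidden, n_out))     # hidden -> output
        self.context = np.zeros(n_hidden)                             # previous hidden state

    def step(self, x):
        # Hidden units combine the current external input with the prior internal state.
        h = sigmoid(x @ self.W_xh + self.context @ self.W_ch)
        y = sigmoid(h @ self.W_hy)
        self.context = h.copy()   # copy-back: context units mirror the hidden units
        return y

net = ElmanNet(n_in=1, n_hidden=2, n_out=1)
for bit in [1, 0, 1, 0]:
    print(net.step(np.array([float(bit)])))
```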

  6. Exclusive-OR • The XOR function cannot be learned by a simple two-layer network • Temporal XOR: one input bit is presented at a time, and the task is to predict the next bit • Input: 1 0 1 0 0 0 • Output: 0 1 0 0 0 ? • Training: run 600 passes through a 3,000-bit XOR sequence (one way to construct such a sequence is sketched below)
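A sketch of how such a training stream could be generated, assuming it is built from pairs of random bits each followed by their XOR (which is what the 50/50-then-determined pattern on the next slide implies):

```python
import random

def xor_stream(n_triples, seed=0):
    """Bit stream in which every third bit is the XOR of the two random bits
    preceding it; 1,000 triples give a 3,000-bit training sequence."""
    rng = random.Random(seed)
    bits = []
    for _ in range(n_triples):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        bits.extend([a, b, a ^ b])
    return bits

stream = xor_stream(1000)                  # 3,000 bits in total
inputs, targets = stream[:-1], stream[1:]  # task: predict the next bit
print(stream[:9])
```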

  7. Exclusive-OR (cont.) • It is only sometimes possible to predict the next bit correctly • After the first bit of a pair, there is only a 50/50 chance of guessing the next one • After two bits, the third bit is fully determined: it is the XOR of the first two • Accordingly, prediction error dips on every third bit

  8. Structure in Letter Sequences • Idea: extend the prediction task from one-bit inputs to more complex, multi-bit vectors • Method: • Map six letters (b, d, g, a, i, u) to a binary feature representation • Use the three consonants to create a random 1,000-letter sequence • Replace each consonant with a syllable by adding vowels: b -> ba; d -> dii; g -> guuu • Example input: dbgbddg -> diibaguuubadiidiiguuu • Prediction task: given the bit representations of the characters in sequence, predict the next character (the data construction is sketched below)
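A sketch of the data construction under the rules on the slide; the 6-bit feature codes the paper assigns to each letter are omitted here, so the sequence is kept as plain characters:

```python
import random

SYLLABLES = {'b': 'ba', 'd': 'dii', 'g': 'guuu'}   # replacement rules from the slide

def letter_sequence(n_consonants, seed=0):
    """Random consonant string, with each consonant then expanded by its vowels."""
    rng = random.Random(seed)
    consonants = [rng.choice('bdg') for _ in range(n_consonants)]
    return ''.join(SYLLABLES[c] for c in consonants)

seq = letter_sequence(1000)              # random 1,000-consonant base sequence, expanded
pairs = list(zip(seq[:-1], seq[1:]))     # (current letter, next letter) prediction targets
print(seq[:21])
print(pairs[:5])
```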

  9. Structure in Letter Sequences (cont.) • Since the consonants were ordered randomly, prediction error on them is high • The vowels are not random, so the network can make use of previous information; thus, error on them is low • Takeaway: because the input is structured, the network can make partial predictions even where a complete prediction is not possible

  10. Discovering the Notion “Word” • Learning a language involves learning words • Can the network automatically learn “words” when given a sequential stream of concatenated characters? • Words are represented as concatenated bit vectors of their characters • These word vectors are concatenated to form sentences • Each character is then input sequentially and the network has to predict the following letter • Input: manyyearsago • Output: anyyearsago?
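A minimal sketch of how the input/target pairs line up, assuming the letters are kept as plain characters rather than the bit vectors used in the experiment (the continuation of the text stream is omitted):

```python
# Word boundaries are removed before training; the target stream is simply the
# input stream shifted by one character.
sentence = "many years ago"            # the rest of the text stream is omitted here
stream = sentence.replace(" ", "")     # "manyyearsago"

inputs  = stream[:-1]                  # m a n y y e a r s a g
targets = stream[1:]                   # a n y y e a r s a g o
print(list(zip(inputs, targets)))
```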

  11. Discovering the Notion “Word” (cont.) • At the onset of each word, error is high • As more of the word is received, error declines • The error signal thus provides a good clue to where the recurring sequences in the input are, and it correlates strongly with word boundaries • The network can learn the boundaries of linguistic units from the input signal alone
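The paper only reports this correlation; as a purely illustrative heuristic (not the author's method), one could flag sharp rises in per-letter prediction error as candidate word onsets:

```python
import numpy as np

def boundary_candidates(errors, jump=0.3):
    """Flag positions where the per-letter prediction error rises sharply --
    a rough proxy for the onset of a new word."""
    e = np.asarray(errors)
    return [i for i in range(1, len(e)) if e[i] - e[i - 1] > jump]

# Toy per-letter error trace (values are made up for illustration).
trace = [0.9, 0.4, 0.2, 0.1, 0.85, 0.5, 0.2, 0.8, 0.3]
print(boundary_candidates(trace))   # -> [4, 7]: high-error onsets suggest word starts
```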

  12. Discovering Lexical Classes from Word Order • Can the network learn the abstract structure that underlies sentences when only the surface forms (i.e., words) are presented to it? • Method: • Define a set of category-to-word mappings (e.g., NOUN-HUMAN -> man, woman; VERB-PERCEPTION -> smell, see) • Use templates to create sentences (e.g., NOUN-HUMAN VERB-EAT NOUN-FOOD) • The words in a sentence (e.g., “woman eat bread”) are mapped to one-hot vectors (e.g., 00010 00100 10000) • Task: given a word vector (“woman”), predict the next word (“eat”); a small data-generation sketch follows below
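A sketch of the sentence-generation and encoding step with a toy lexicon and a single template; the paper's lexicon, template set, and vector width are larger, and any words below beyond those on the slide are placeholders:

```python
import random
import numpy as np

# Toy category-to-word mappings and one template, in the spirit of the slide.
LEXICON = {
    'NOUN-HUM':  ['man', 'woman'],
    'VERB-EAT':  ['eat'],
    'NOUN-FOOD': ['bread', 'cookie'],
}
TEMPLATES = [('NOUN-HUM', 'VERB-EAT', 'NOUN-FOOD')]
VOCAB = sorted({w for words in LEXICON.values() for w in words})

def one_hot(word):
    """Each word gets a vector with a single 1 at its vocabulary index."""
    v = np.zeros(len(VOCAB))
    v[VOCAB.index(word)] = 1.0
    return v

rng = random.Random(0)
template = rng.choice(TEMPLATES)
sentence = [rng.choice(LEXICON[cat]) for cat in template]   # e.g. ['woman', 'eat', 'bread']
vectors = [one_hot(w) for w in sentence]
pairs = list(zip(vectors[:-1], vectors[1:]))                # (current word, next word) targets
print(sentence)
```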

  13. Discovering Lexical Classes (cont.) • Since the prediction task is nondeterministic, RMS error is not a fitting measure • Instead, save the hidden-unit vectors produced by each word in all possible contexts and average over them • Perform hierarchical clustering on the averaged vectors (sketched below) • The similarity structure of the internal representations is shown as a tree
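A sketch of that analysis using SciPy's hierarchical clustering; the hidden vectors, the hidden-layer size, and the choice of average linkage with Euclidean distance are placeholders rather than the paper's exact settings:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# hidden_states[word] holds the hidden-unit vectors that word produced in every
# context it occurred in (random placeholders here; the size is arbitrary).
rng = np.random.default_rng(0)
words = ['man', 'woman', 'eat', 'bread', 'cookie']
hidden_states = {w: rng.normal(size=(40, 150)) for w in words}

# Average each word's hidden vectors over all of its contexts ...
means = np.stack([hidden_states[w].mean(axis=0) for w in words])

# ... then cluster the averages hierarchically; the leaf order reflects the tree.
tree = linkage(means, method='average', metric='euclidean')
info = dendrogram(tree, labels=words, no_plot=True)
print(info['ivl'])
```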

  14. Discovering Lexical Classes (cont.) • The network has developed internal representations of the input vectors that reflect facts about the possible sequential ordering of the inputs • The hidden-unit patterns are not word representations in the conventional sense, since the patterns also reflect the prior context • Error in predicting the actual next word in a given context is high, but the network is able to predict the approximate likelihood of occurrence of classes and of individual words • A given node in the hidden layer participates in multiple concepts; only the activation pattern in its entirety is meaningful

  15. Conclusions • Networks can learn temporal structure implicitly • Problems change their nature when expressed as temporal events (XOR could previously not be learned by a single-layer network) • The error signal is a good indicator of where structure exists (error was high at the beginnings of words in a sentence) • Increasing complexity does not necessarily result in worse performance (increasing the number of bits per input did not hurt performance) • Internal representations can be hierarchical in nature (similarity was high among words within one class)

  16. Finding Structure in Time Jeffrey L. Elman In Cognitive Science 14, 179 – 211 (1990) presented by Dominic Seyler (dseyler2@illinois.edu)
