Understanding Machine Learning with Language and Tensors

Jon Rawski, Linguistics Department, Institute for Advanced Computational Science, Stony Brook University


1. Title slide: speaker and affiliation as above. Outline: Machine Learning, Language, Tensors, Reduplication.

2. Thinking Like a Linguist. (1) Language, like physics, is not just data you throw at a machine. (2) Language is a fundamentally computational process, uniquely learned by humans from small data. (3) We can use core properties of language to understand how other systems generalize, learn, and perform inference.

3. (Figure-only slide.)

4. The Zipf Problem (Yang 2013). (Figure-only slide.)

5. A Recipe for Machine Learning. (1) Given training data $\{x_i, y_i\}_{i=1}^N$. (2) Choose each of these: a decision function $\hat{y} = f_\theta(x_i)$ and a loss function $\ell(\hat{y}, y_i) \in \mathbb{R}$. (3) Define the goal: $\theta^* = \operatorname{argmin}_\theta \sum_{i=1}^N \ell(f_\theta(x_i), y_i)$. (4) Train (take small steps opposite the gradient): $\theta^{(t+1)} = \theta^{(t)} - \eta_t \nabla \ell(f_\theta(x_i), y_i)$.
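A minimal Python sketch of this recipe, using linear regression with squared loss as an illustrative (assumed) decision and loss function; the data and step size below are also made up, not from the slides:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                                      # training inputs x_i
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)    # targets y_i

theta = np.zeros(3)                           # parameters of f_theta
eta = 0.05                                    # step size eta_t (kept constant here)

for t in range(200):
    i = rng.integers(len(X))                  # pick one training example
    y_hat = X[i] @ theta                      # decision function: y_hat = f_theta(x_i)
    loss = (y_hat - y[i]) ** 2                # loss: l(y_hat, y_i)
    grad = 2 * (y_hat - y[i]) * X[i]          # gradient of the loss w.r.t. theta
    theta = theta - eta * grad                # step opposite the gradient

print(theta)                                  # should approach [2.0, -1.0, 0.5]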

6. "Neural" Networks and Automatic Differentiation. (Figure, p.c. Matt Gormley.)

7. Recurrent Neural Networks (RNN). Acceptor: read in a sequence; predict from the end state; backprop the error all the way back. (Figure, p.c. Yoav Goldberg; repeated on slide 8.)
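A minimal numpy sketch of the acceptor setup, with untrained random weights standing in for learned parameters (in practice the end-of-sequence error would be backpropagated through every time step):

import numpy as np

rng = np.random.default_rng(1)
alphabet = {"a": 0, "b": 1}
d, n = len(alphabet), 4                       # input size, hidden size

W_xh = rng.normal(scale=0.5, size=(n, d))     # input-to-hidden weights
W_hh = rng.normal(scale=0.5, size=(n, n))     # hidden-to-hidden weights
w_out = rng.normal(scale=0.5, size=n)         # hidden-to-output weights

def accept_probability(string):
    h = np.zeros(n)                           # initial hidden state
    for ch in string:
        x = np.zeros(d); x[alphabet[ch]] = 1  # one-hot encode the symbol
        h = np.tanh(W_xh @ x + W_hh @ h)      # recurrent update
    return 1 / (1 + np.exp(-(w_out @ h)))     # sigmoid on the final state

print(accept_probability("baba"))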

9. What is a function for language? Alphabet: $\Sigma = \{a, b, c, \ldots\}$; examples: letters, DNA, peptides, words, map directions, etc. $\Sigma^*$: all possible sequences (strings) over the alphabet; examples: aaaaaaaaa, baba, bcabaca, ... Languages: subsets of $\Sigma^*$ following some pattern; examples: {ba, baba, bababa, bababababa, ...} (one or more "ba"); {ab, aabb, aaabbb, aaaaaabbbbbb, ...} ($a^n b^n$); {aa, aab, aba, aabbaabbaa, ...} (even number of a's).
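As a sketch, each of these example languages can be written directly as a membership function over Σ = {a, b}, i.e. the f : Σ* → {0, 1} view introduced on the next slide:

import re

def one_or_more_ba(s):        # {ba, baba, bababa, ...}
    return re.fullmatch(r"(ba)+", s) is not None

def a_n_b_n(s):               # {ab, aabb, aaabbb, ...}: a^n b^n, n >= 1
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

def even_number_of_as(s):     # {aa, aab, aba, ...}: even number of a's
    return set(s) <= {"a", "b"} and s.count("a") % 2 == 0

print(one_or_more_ba("bababa"), a_n_b_n("aaabbb"), even_number_of_as("aabbaabbaa"))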

10. What is a function for language? Grammar/automaton: a computational device that decides whether a string is in a language (says yes/no). Functional perspective: $f : \Sigma^* \to \{0, 1\}$. (Figure, p.c. Casey 1996.)

11. Regular Languages and Finite-State Automata. Regular language: the memory required is finite with respect to the input. Examples: (ba)*: {ba, baba, bababa, ...}; b(a*): {b, ba, baaaaaa, ...}. (Figure: a two-state automaton, states q0 and q1, for each language.)
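A sketch of the (ba)* machine as an explicit transition table; note that (ba)* formally also accepts the empty string, so the start state is made accepting here:

TRANSITIONS = {("q0", "b"): "q1", ("q1", "a"): "q0"}
START, ACCEPTING = "q0", {"q0"}

def accepts(string):
    state = START
    for ch in string:
        state = TRANSITIONS.get((state, ch))
        if state is None:                 # no transition defined: reject immediately
            return False
    return state in ACCEPTING

for w in ["", "ba", "baba", "bab", "ab"]:
    print(w, accepts(w))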

12. Regular Languages and Finite-State Automata. Weighted view: $f : \Sigma^* \to \mathbb{R}$. (Figure, p.c. B. Balle, X. Carreras, A. Quattoni, EMNLP'14 tutorial.)
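A sketch of the weighted view: the standard weighted-automaton computation scores a string as an initial vector times one transition matrix per symbol times a final vector. The particular vectors and matrices below are made-up placeholders:

import numpy as np

alpha = np.array([1.0, 0.0])                         # initial weights
beta = np.array([0.0, 1.0])                          # final weights
A = {"a": np.array([[0.5, 0.5], [0.0, 1.0]]),        # transition matrix for 'a'
     "b": np.array([[1.0, 0.0], [0.5, 0.5]])}        # transition matrix for 'b'

def weight(string):
    v = alpha
    for ch in string:
        v = v @ A[ch]                                # multiply one matrix per symbol
    return float(v @ beta)

print(weight("ab"), weight("ba"), weight("abab"))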

13. Supra-Regularity in Natural Language. (Figure-only slide.)

14. The Chomsky Hierarchy. (Figure, p.c. Rawski & Heinz 2019: the nested classes Finite, Regular, Context-Free, Mildly Context-Sensitive, Context-Sensitive, and Computably Enumerable, annotated with natural-language phenomena: English consonant clusters (Clements and Keyser 1983), Kwakiutl stress (Bach 1975), Chumash sibilant harmony (Applegate 1972), English nested embedding (Chomsky 1957), Swiss German (Shieber 1985), Yoruba copying (Kobele 2006). Repeated on slide 15.)

16. Tensors: Quick and Dirty Overview. Order 1 (vector): $\vec{v} \in A$, $\vec{v} = \sum_i C_v^i\, \vec{a}_i$. Order 2 (matrix): $M \in A \otimes B$, $M = \sum_{ij} C_M^{ij}\, \vec{a}_i \otimes \vec{b}_j$. Order 3 (cuboid): $R \in A \otimes B \otimes C$, $R = \sum_{ijk} C_R^{ijk}\, \vec{a}_i \otimes \vec{b}_j \otimes \vec{c}_k$.
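A numpy sketch of the three orders, building each object from coefficients and outer products of (assumed) standard basis vectors:

import numpy as np

a = np.eye(2)          # basis vectors a_0, a_1 of A (as rows)
b = np.eye(3)          # basis vectors b_0, b_1, b_2 of B
c = np.eye(2)          # basis vectors c_0, c_1 of C

C_v = np.array([2.0, -1.0])                       # coefficients of a vector in A
v = sum(C_v[i] * a[i] for i in range(2))          # order 1: v = sum_i C_v^i a_i

C_M = np.arange(6.0).reshape(2, 3)                # coefficients of a matrix in A (x) B
M = sum(C_M[i, j] * np.outer(a[i], b[j])          # order 2: sum_ij C_M^ij a_i (x) b_j
        for i in range(2) for j in range(3))

C_R = np.arange(12.0).reshape(2, 3, 2)            # coefficients of an order-3 tensor
R = sum(C_R[i, j, k] * np.einsum("i,j,k->ijk", a[i], b[j], c[k])
        for i in range(2) for j in range(3) for k in range(2))

print(v.shape, M.shape, R.shape)                  # (2,) (2, 3) (2, 3, 2)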

17. Tensor Networks (Penrose Notation). $(T \times_1 A \times_2 B \times_3 C)_{i_1, i_2, i_3} = \sum_{k_1 k_2 k_3} T_{k_1 k_2 k_3} A_{i_1 k_1} B_{i_2 k_2} C_{i_3 k_3}$. (Figure, p.c. Guillaume Rabusseau.)
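The contraction on this slide written with np.einsum: x, y, z play the summed indices k1, k2, k3 and a, b, c the free indices i1, i2, i3; the shapes are arbitrary assumptions:

import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(4, 5, 6))          # core tensor T_{k1 k2 k3}
A = rng.normal(size=(2, 4))             # A_{i1 k1}
B = rng.normal(size=(3, 5))             # B_{i2 k2}
C = rng.normal(size=(2, 6))             # C_{i3 k3}

result = np.einsum("xyz,ax,by,cz->abc", T, A, B, C)
print(result.shape)                     # (2, 3, 2)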

18. Second-Order RNN. The hidden state is computed by $h_t = g(W \times_2 x_t \times_3 h_{t-1})$. The computation of a weighted finite-state machine is very similar: collect its transition matrices into a tensor $A \in \mathbb{R}^{n \times |\Sigma| \times n}$ defined by $A_{:, \sigma, :} = A_\sigma$. (p.c. Guillaume Rabusseau.)
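A numpy sketch of one update under this scheme, plus the weighted-automaton analogue it mirrors when the input is one-hot; all numbers are placeholders:

import numpy as np

rng = np.random.default_rng(3)
n, d = 4, 3                                    # hidden size, alphabet size

W = rng.normal(scale=0.3, size=(n, d, n))      # order-3 weight tensor
h_prev = rng.normal(size=n)
x_t = np.zeros(d); x_t[1] = 1                  # one-hot input symbol

# 2-RNN update: contract W with x_t (mode 2) and h_{t-1} (mode 3), then squash.
h_t = np.tanh(np.einsum("idj,d,j->i", W, x_t, h_prev))

# Weighted FSA analogue: with a one-hot x_t, the same contraction just selects
# the transition matrix A_sigma and applies it linearly (no nonlinearity g).
A = rng.random(size=(n, d, n))                 # A[:, sigma, :] = A_sigma
state = np.einsum("idj,d,j->i", A, x_t, h_prev)
same = A[:, 1, :] @ h_prev                     # identical to selecting A_sigma directly
print(np.allclose(state, same))                # True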

19. Theorem (Rabusseau et al. 2019): Weighted FSAs are expressively equivalent to second-order linear RNNs (linear 2-RNNs) for computing functions over sequences of discrete symbols. Theorem (Merrill 2019): RNNs asymptotically accept exactly the regular languages. Theorem (Casey 1996): A finite-dimensional RNN can robustly perform only finite-state computations.

20. Theorem (Casey 1996): An RNN with finite-state behavior necessarily partitions its state space into disjoint regions that correspond to the states of the minimal FSA.

21. Analyzing Specific Neuron Dynamics. An RNN with only 2 neurons in its hidden state is trained on the "Even-A" language, with input given as a stream of strings separated by a $ symbol. One neuron fires on even-numbered a's and on the $ symbol after a rejected string; the other fires on b's that follow an even number of a's and on $ after an accepted string. (p.c. Oliva & Lago-Fernández 2019.)

22. But... translation needs an output! $f : \Sigma^* \to \Delta^*$. (Figure, p.c. Bahdanau et al. 2014.)

23. RNN Encoder-Decoder. (Animation build repeated across slides 24-28; figures p.c. Chris Dyer.)
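A bare-bones numpy sketch of the encoder-decoder picture: an encoder RNN reads the source string into a single vector, and a decoder RNN seeded with that vector emits target symbols until a stop symbol. The vocabularies, sizes, and weights are invented placeholders, and this toy decoder does not feed its own outputs back in, as a real one would:

import numpy as np

rng = np.random.default_rng(4)
src_vocab = {"w": 0, "a": 1, "n": 2, "i": 3, "t": 4}
tgt_vocab = ["w", "a", "n", "i", "t", "~", "<stop>"]
n = 8                                                     # hidden size

enc_Wx = rng.normal(scale=0.3, size=(n, len(src_vocab)))
enc_Wh = rng.normal(scale=0.3, size=(n, n))
dec_Wh = rng.normal(scale=0.3, size=(n, n))
dec_out = rng.normal(scale=0.3, size=(len(tgt_vocab), n))

def encode(string):
    h = np.zeros(n)
    for ch in string:
        x = np.zeros(len(src_vocab)); x[src_vocab[ch]] = 1
        h = np.tanh(enc_Wx @ x + enc_Wh @ h)              # read the whole input
    return h

def decode(h, max_len=20):
    out = []
    for _ in range(max_len):
        h = np.tanh(dec_Wh @ h)                           # update decoder state
        sym = tgt_vocab[int(np.argmax(dec_out @ h))]      # greedy output symbol
        if sym == "<stop>":
            break
        out.append(sym)
    return "".join(out)

print(decode(encode("wanita")))                           # untrained, so gibberish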

29. Our idea: use functions that copy! (1) Total reduplication = unbounded copy (∼83%): wanita → wanita∼wanita, 'woman' → 'women' (Indonesian). (2) Partial reduplication = bounded copy (∼75%): C: gen → g∼gen, 'to sleep' → 'to be sleeping' (Shilh); CV: guyon → gu∼guyon, 'to jest' → 'to jest repeatedly' (Sundanese); CVC: takki → tak∼takki, 'leg' → 'legs' (Agta); CVCV: banagañu → bana∼banagañu, 'return' (Dyirbal).
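A sketch of the two copying functions as string maps, treating the C, CV, CVC, CVCV templates simply as prefix lengths 1-4 for illustration (which glosses over the real consonant/vowel structure):

def total_reduplication(stem, sep="~"):
    """Unbounded copy: wanita -> wanita~wanita."""
    return stem + sep + stem

def partial_reduplication(stem, k, sep="~"):
    """Bounded copy of the first k segments: takki, k=3 -> tak~takki."""
    return stem[:k] + sep + stem

print(total_reduplication("wanita"))          # wanita~wanita  (Indonesian)
print(partial_reduplication("gen", 1))        # g~gen          (Shilh)
print(partial_reduplication("guyon", 2))      # gu~guyon       (Sundanese)
print(partial_reduplication("takki", 3))      # tak~takki      (Agta)
print(partial_reduplication("banagañu", 4))   # bana~banagañu  (Dyirbal)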

30. 1-way and 2-way Finite-State Transducers. (Figure: (a) a 1-way finite-state transducer with states q0-q4 and qf and transitions such as (⋊:⋊), (p:p), (t:t), (a:a∼pa), (a:a∼ta), (Σ:Σ), (⋉:⋉), shown with origin information for the input 'pat'; (b) a 2-way finite-state transducer with states q0-q4 and qf and transitions such as (⋊:λ:+1), (C:C:+1), (V:V:-1), (Σ:Σ:+1), (Σ:Σ:-1), (⋊:∼:+1), (⋉:λ:+1), again with origin information for 'pat'.)
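A small simulator sketch of the 2-way idea (not the exact machines in the figure): the head can move left as well as right over the boundary-marked input, which lets a finite-state device perform unbounded copying by scanning the string, rewinding, and scanning it again. The state names and transition encoding below are invented for illustration:

LB, RB = "⋊", "⋉"   # left and right boundary markers

def run_2way_fst(transitions, start, final, w):
    """transitions: dict mapping (state, symbol or "Σ") -> (new_state, output, move)."""
    tape = LB + w + RB
    state, pos, out = start, 0, []
    while state != final:
        sym = tape[pos]
        key = (state, sym) if (state, sym) in transitions else (state, "Σ")
        new_state, emit, move = transitions[key]
        out.append(emit.replace("Σ", sym))   # "Σ" in the output copies the input symbol
        state, pos = new_state, pos + move
    return "".join(out)

# Total reduplication: copy w left-to-right, rewind, copy it again.
copy_twice = {
    ("q0", LB):  ("q1", "",  +1),   # skip left boundary
    ("q1", "Σ"): ("q1", "Σ", +1),   # first pass: copy each symbol
    ("q1", RB):  ("q2", "~", -1),   # hit right edge: emit separator, turn around
    ("q2", "Σ"): ("q2", "",  -1),   # rewind silently
    ("q2", LB):  ("q3", "",  +1),   # back at left boundary: start second pass
    ("q3", "Σ"): ("q3", "Σ", +1),   # second pass: copy again
    ("q3", RB):  ("qf", "",  +1),   # done
}

print(run_2way_fst(copy_twice, "q0", "qf", "wanita"))   # wanita~wanita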

31. Encoder-Decoder = 1-way or 2-way FST? (Figure-only slides, repeated across slides 31-35.)

36. Main Points. (1) Language is not just data you throw at a machine. (2) Language is a fundamentally computational process uniquely learned by humans. (3) We can use core properties of language to understand how other systems learn. Want more? Mathematical Linguistics Reading Group: Fridays, 12pm-1pm, SBS N250; website: complab-stonybrook.github.io/mlrg/. IACS Machine Learning and Statistical Inference Working Group: every other week, contact me for details.
