  1. Grammatical inference and subregular phonology Adam Jardine Rutgers University December 10, 2019 · Tel Aviv University

  2. Review

  3. Outline of course • Day 1: Learning, languages, and grammars • Day 2: Learning strictly local grammars • Day 3: Automata and input strictly local functions • Day 4: Learning functions and stochastic patterns, other open questions

  4. Review of day 1 • Phonological patterns are governed by restrictive computational universals • Grammatical inference connects these universals to solutions to the learning problem. Problem: Given a positive sample of a language, return a grammar that describes that language exactly

  5. Review of day 1 • Strictly local languages are patterns computed solely by the k-factors in a string w. For example, for w = abbab: fac_2(⋊abbab⋉) = {⋊a, ab, bb, ba, b⋉}
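As a concrete illustration, here is a minimal Python sketch of how k-factors might be computed; the function name factors and the use of literal ⋊/⋉ characters as boundary markers are my own choices, not from the slides:

    def factors(w, k, left="⋊", right="⋉"):
        """Return the set of k-factors of w, with word boundaries added."""
        s = left + w + right
        if len(s) <= k:
            # one common convention: strings shorter than k contribute themselves
            return {s}
        return {s[i:i+k] for i in range(len(s) - k + 1)}

    # The slide's example: the 2-factors of abbab
    print(sorted(factors("abbab", 2)))  # ['ab', 'ba', 'bb', 'b⋉', '⋊a']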

  6. Today • A provably correct method for learning SL_k languages • The paradigm of identification in the limit from positive data (Gold, 1967; de la Higuera, 2010) • Why learners target classes (not specific languages, or all possible languages)

  7. Learning paradigm

  8. Learning paradigm [Diagram, from Heinz et al. (2016): an Oracle, holding a model of language M_O, sends information to a Learner, which sends back requests and builds its own model of language M_L] Problem: Given a positive sample of a language, return a grammar that describes that language exactly • This is (exact) identification in the limit from positive data (ILPD; Gold, 1967)

  9. Identification in the limit from positive data (ILPD) [Diagram: the Oracle's model of language is a grammar G⋆, with L⋆ = L(G⋆); it sends the Learner a text, and the Learner outputs a grammar G] • A text of L⋆ is some sample of positive examples of L⋆

  10. Identification in the limit from positive data (ILPD) A presentation of L⋆ is a sequence p of examples drawn from L⋆
      t    p(t)
      0    abab
      1    ababab
      2    ab
      3    λ
      4    ab
      ...  ...
  (this is the 'in the limit' part)

  11. Identification in the limit from positive data (ILPD) A learner A takes a finite sequence and outputs a grammar: for each finite prefix p[i] of the presentation (e.g. abab, ababab, ab, λ, ab, ...), A(p[i]) = G_i

  12. Identification in the limit from positive data (ILPD) Let's take the learner A_Fin: A_Fin(p[n]) = { w | w = p(i) for some i ≤ n }
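A_Fin is simple enough to state as code. A minimal sketch in Python, mirroring the example worked on the next slides (all names are my own):

    def a_fin(sample):
        """The learner A_Fin: guess exactly the set of strings seen so far."""
        return set(sample)

    # L* = {ab, bab, aaa}, fed in the order the next slides use
    presentation = ["bab", "ab", "bab", "aaa", "ab"]
    for t in range(len(presentation)):
        print(t, a_fin(presentation[:t + 1]))
    # From t = 3 on, the guess stays {'ab', 'bab', 'aaa'}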

  13–22. Identification in the limit from positive data (ILPD) Let's take the learner A_Fin: A_Fin(p[n]) = { w | w = p(i) for some i ≤ n }. Let's set L⋆ = {ab, bab, aaa}
      t    p(t)   G_t
      0    bab    {bab}
      1    ab     {ab, bab}
      2    bab    {ab, bab}
      3    aaa    {ab, bab, aaa}
      4    ab     {ab, bab, aaa}
      ...
      308  bab    {ab, bab, aaa}

  23. Identification in the limit from positive data (ILPD) A converges at point n if G_m = G_n for any m > n
      t      p(t)        G_t
      0      abab        G_0
      1      ababab      G_1
      2      ab          G_2
      ...
      n      ababab      G_n   ← convergence
      n+1    abababab    G_n
      ...
      m      λ           G_n
      ...
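Operationally, convergence can be checked by running the learner on every prefix and finding the first point after which the guess never changes. A sketch, reusing a_fin from above; note the real definition quantifies over the whole infinite presentation, so this only checks convergence within the finite data we happen to have:

    def convergence_point(learner, presentation):
        """Smallest n whose guess matches the guess on every longer prefix here."""
        guesses = [learner(presentation[:t + 1]) for t in range(len(presentation))]
        for n, g in enumerate(guesses):
            if all(h == g for h in guesses[n:]):
                return n

    print(convergence_point(a_fin, ["bab", "ab", "bab", "aaa", "ab"]))  # 3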

  24. Identification in the limit from positive data (ILPD) ILPD-learnability: A class C is ILPD-learnable if there is some algorithm A_C such that for any stringset L ∈ C, given any positive presentation p of L, A_C converges to a grammar G such that L(G) = L. • How is ILPD learning an idealization? • What are the advantages of using ILPD as a criterion for learning?

  25. Learning strictly local languages

  26. Learning SL languages • Given any k, the class SL_k is ILPD-learnable. • Using A_Fin as an example, how might we learn an SL_k language?

  27–31. Learning SL languages G⋆ = {CC, C⋉}. The hypothesis starts as all of fac_2(Σ*) = {⋊C, ⋊V, CC, CV, C⋉, VC, VV, V⋉}, and every factor attested in the data is removed:
      t    datum      hypothesis
      0    V          {⋊C, CC, CV, C⋉, VC, VV}
      1    CVCV       {CC, C⋉, VV}
      2    CVVCVCV    {CC, C⋉}
      3    VCVCV      {CC, C⋉}
      ...

  32–35. Learning SL languages A_SL_k(p[i]) = fac_k(Σ*) − fac_k({p(0), p(1), ..., p(i)}) • The characteristic sample is fac_k(L⋆): once the data attest every k-factor of L⋆, the learner has converged • The time complexity is linear: the time it takes to calculate is directly proportional to the size of the data sample.
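A minimal sketch of this learner in Python, reusing factors from the earlier sketch. Enumerating fac_k(Σ*) with itertools, and omitting factors of the form ⋊w⋉ for very short words, are my own implementation choices, matching the 8-factor set shown on the previous slides:

    from itertools import product

    def all_k_factors(sigma, k, left="⋊", right="⋉"):
        """fac_k of Sigma*: every k-factor available over the alphabet."""
        out = {"".join(p) for p in product(sigma, repeat=k)}                # interior
        out |= {left + "".join(p) for p in product(sigma, repeat=k - 1)}    # word-initial
        out |= {"".join(p) + right for p in product(sigma, repeat=k - 1)}   # word-final
        return out

    def a_sl(sample, sigma, k):
        """A_SL_k: forbid every k-factor of Sigma* not attested in the sample."""
        seen = set()
        for w in sample:
            seen |= factors(w, k)
        return all_k_factors(sigma, k) - seen

    # The CV example: G* = {CC, C⋉}, Sigma = {C, V}, k = 2
    presentation = ["V", "CVCV", "CVVCVCV", "VCVCV"]
    for t in range(len(presentation)):
        print(t, sorted(a_sl(presentation[:t + 1], "CV", 2)))
    # By t = 2 the output is exactly G* = ['CC', 'C⋉'] and stays there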

  36. Learning SL languages Let's learn Pintupi. Note that k = 3. What is the initial hypothesis? At what point do we converge? (A runnable check follows the table.)
      t    datum        hypothesis
      0    σ́
      1    σ́σ
      2    σ́σσ
      3    σ́σσ̀σ
      4    σ́σσ̀σσ
      5    σ́σσ̀σσ̀σ
      ...
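One way to check your answers mechanically, reusing a_sl from the sketch above. The encoding is entirely my own (A = primary-stressed σ́, B = secondary-stressed σ̀, o = unstressed σ), as is the choice to print only the size of each hypothesis so the exercise stays open:

    # Hypothetical encoding: A = primary stress, B = secondary stress, o = unstressed
    pintupi = ["A", "Ao", "Aoo", "AoBo", "AoBoo", "AoBoBo"]
    for t in range(len(pintupi)):
        print(t, len(a_sl(pintupi[:t + 1], "ABo", 3)), "forbidden 3-factors")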

  37. The limits of SL learning

  38. The limits of SL learning • We must know k in advance [Diagram: a language L in SL and a finite language L′ in Fin that are identical on some finite sequence p[i]] • Gold (1967): any class C such that Fin ⊊ C is not learnable from positive examples

  39–40. The limits of SL learning • Consider this pattern from Ineseño Chumash:
      ʃ-api-tʃʰol-it        'I have a stroke of good luck'
      s-api-tsʰol-us        'he has a stroke of good luck'
      ʃ-api-tʃʰol-uʃ-waʃ    'he had a stroke of good luck'
      ha-ʃxintila-waʃ       'his former Indian name'
      s-is-tisi-jep-us      'they (two) show him'
      k-ʃu-ʃojin            'I darken it'
  • What phonotactic constraints are active here? *ʃ...s, *s...ʃ

  41. The limits of SL learning • Let's assume L⋆ = L_C for Σ = {s, o, t, ʃ} as given below L_C = {so, ss, ..., sos, ʃoʃ, ʃoʃoʃ, sosos, ʃototoʃ, sototos, ...}
      t    datum       hypothesis
      0    sos         fac_2(Σ*) − {⋊s, so, os, s⋉}
      1    sotoss      fac_2(Σ*) − {⋊s, so, os, ot, to, ss, s⋉}
      2    ʃoʃtoʃʃ     fac_2(Σ*) − {⋊s, so, os, ot, to, ss, s⋉, ⋊ʃ, ʃo, oʃ, ʃt, ʃʃ, ʃ⋉}
      ...
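A sketch of where this is heading, reusing the functions above: after seeing only harmonic words, the SL_2 learner's grammar still accepts a disharmonic string, because every one of its 2-factors is locally attested. The sample follows the slide's data; the test word sotoʃ is my own disharmonic example:

    harmony_sample = ["sos", "sotoss", "ʃoʃtoʃʃ"]
    grammar = a_sl(harmony_sample, "sotʃ", 2)

    def accepts(grammar, w, k=2):
        """An SL grammar accepts w iff w contains no forbidden k-factor."""
        return not (factors(w, k) & grammar)

    print(accepts(grammar, "sotoʃ"))  # True: *s...ʃ at a distance slips through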
