Grammatical inference and subregular phonology
Adam Jardine · Rutgers University
December 10, 2019 · Tel Aviv University
Review
Outline of course
• Day 1: Learning, languages, and grammars
• Day 2: Learning strictly local grammars
• Day 3: Automata and input strictly local functions
• Day 4: Learning functions and stochastic patterns, other open questions
Review of day 1
• Phonological patterns are governed by restrictive computational universals
• Grammatical inference connects these universals to solutions to the learning problem:

Problem: Given a positive sample of a language, return a grammar that describes that language exactly.
Review of day 1
• Strictly local languages are patterns computed solely by the k-factors in a string w; for example, for w = abbab:

fac_2(w) = fac_2(⋊abbab⋉) = {⋊a, ab, bb, ba, b⋉}
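As a concrete aid, here is a minimal sketch of the k-factor function in Python; the function name fac and the use of the literal ⋊/⋉ characters as boundary symbols are my own choices, not from the slides.

    def fac(w: str, k: int) -> set[str]:
        """Return the k-factors (length-k substrings) of the
        boundary-augmented string ⋊w⋉."""
        w = "⋊" + w + "⋉"
        return {w[i:i + k] for i in range(len(w) - k + 1)}

    print(fac("abbab", 2))  # {'⋊a', 'ab', 'bb', 'ba', 'b⋉'} (in some order)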
Today
• A provably correct method for learning SL_k languages
• The paradigm of identification in the limit from positive data (Gold, 1967; de la Higuera, 2010)
• Why learners target classes (not specific languages, or all possible languages)
Learning paradigm
Learning paradigm

[Diagram (from Heinz et al., 2016): an Oracle with a model of the language M_O passes information to a Learner, which sends requests back and builds its own model of the language M_L]

Problem: Given a positive sample of a language, return a grammar that describes that language exactly.

• This is (exact) identification in the limit from positive data (ILPD; Gold, 1967)
Identification in the limit from positive data (ILPD)

[Diagram: the Oracle's model is a target grammar G⋆ with L⋆ = L(G⋆); it feeds the Learner a text, and the Learner outputs a grammar G]

• A text of L⋆ is some sample of positive examples of L⋆
Identification in the limit from positive data (ILPD)
A presentation of L⋆ is a sequence p of examples drawn from L⋆:

t    p(t)
0    abab
1    ababab
2    ab
3    λ
4    ab
...  ...

(this is the 'in the limit' part)
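As an illustration, a presentation can be pictured as an infinite stream. Below is a minimal sketch, assuming L⋆ = {(ab)^n : n ≥ 0} as suggested by the table; the enumeration order is my own choice, and '' stands for λ.

    import itertools

    def presentation():
        """A presentation of L* = {(ab)^n : n >= 0}: an infinite
        sequence in which every member of L* appears at least once
        (here, each member in fact recurs infinitely often)."""
        for t in itertools.count():
            for n in range(t + 1):
                yield "ab" * n

    p = presentation()
    print([next(p) for _ in range(6)])  # ['', '', 'ab', '', 'ab', 'abab']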
Identification in the limit from positive data (ILPD)
A learner A takes a finite sequence p[i] and outputs a grammar G_i:

t    p(t)
0    abab
1    ababab
2    ab
3    λ
4    ab
...  ...
n    ababab
...  ...

A: p[i] ↦ G_i
Identification in the limit from positive data (ILPD)
Let's take the learner A_Fin:

A_Fin(p[n]) = { w | w = p(i) for some i ≤ n }

Let's set L⋆ = {ab, bab, aaa}

t      p(t)    G_t
0      bab     {bab}
1      ab      {ab, bab}
2      bab     {ab, bab}
3      aaa     {ab, bab, aaa}
4      ab      {ab, bab, aaa}
...    ...     ...
308    bab     {ab, bab, aaa}
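In code, A_Fin is a one-liner. A sketch replaying the run above (the presentation prefix is the one in the table):

    def a_fin(sample: list[str]) -> set[str]:
        """A_Fin: guess exactly the finite set of strings seen so far."""
        return set(sample)

    p = ["bab", "ab", "bab", "aaa", "ab"]
    for t in range(len(p)):
        print(t, sorted(a_fin(p[:t + 1])))
    # 0 ['bab']
    # 1 ['ab', 'bab']
    # 2 ['ab', 'bab']
    # 3 ['aaa', 'ab', 'bab']
    # 4 ['aaa', 'ab', 'bab']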
Identification in the limit from positive data (ILPD)
A converges at point n if G_m = G_n for any m > n:

t      p(t)         G_t
0      abab         G_0
1      ababab       G_1
2      ab           G_2
...    ...          ...
n      ababab       G_n   ← convergence
n+1    abababab     G_n
...    ...          ...
m      λ            G_n
...    ...          ...
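Convergence over a whole presentation can never be verified from finite data, but the convergence point of a finite run of guesses is easy to compute; a small sketch (the helper name is mine):

    def convergence_point(guesses: list) -> int:
        """Earliest n after which the guesses never change --
        on this finite prefix of the run only."""
        n = 0
        for t in range(1, len(guesses)):
            if guesses[t] != guesses[t - 1]:
                n = t
        return n

    run = [{"bab"}, {"ab", "bab"}, {"ab", "bab"},
           {"ab", "bab", "aaa"}, {"ab", "bab", "aaa"}]
    print(convergence_point(run))  # 3 -- the A_Fin run above converges at t = 3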
Identification in the limit from positive data (ILPD)

ILPD-learnability: A class C is ILPD-learnable if there is some algorithm A_C such that for any stringset L ∈ C, given any positive presentation p of L, A_C converges to a grammar G such that L(G) = L.

• How is ILPD learning an idealization?
• What are the advantages of using ILPD as a criterion for learning?
Learning strictly local languages
Learning SL languages
• Given any k, the class SL_k is ILPD-learnable.
• Using A_Fin as an example, how might we learn an SL_k language?
Learning SL languages
G⋆ = {CC, C⋉}

t    datum      hypothesis
0    V          {⋊C, CC, CV, C⋉, VC, VV}
1    CVCV       {CC, C⋉, VV}
2    CVVCVCV    {CC, C⋉}
3    VCVCV      {CC, C⋉}
...  ...        ...
Learning SL languages

A_SLk(p[i]) = fac_k(Σ*) − fac_k({p(0), p(1), ..., p(i)})

• The characteristic sample is fac_k(L⋆)
• The time complexity is linear: the time it takes to calculate is directly proportional to the size of the data sample.
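Here is a minimal sketch of A_SLk, reusing the fac helper from the earlier sketch; possible_factors is my own name, and for simplicity it only generates factors of length exactly k (so factors of very short words, like ⋊⋉ for λ, are left out).

    from itertools import product

    def fac(w: str, k: int) -> set[str]:
        """k-factors of the boundary-augmented string ⋊w⋉."""
        w = "⋊" + w + "⋉"
        return {w[i:i + k] for i in range(len(w) - k + 1)}

    def possible_factors(sigma: str, k: int) -> set[str]:
        """fac_k(Sigma*), restricted to factors of length exactly k."""
        inner = {"".join(u) for u in product(sigma, repeat=k)}
        edge = ["".join(u) for u in product(sigma, repeat=k - 1)]
        return inner | {"⋊" + u for u in edge} | {u + "⋉" for u in edge}

    def a_sl(sample: list[str], sigma: str, k: int) -> set[str]:
        """A_SLk: ban every possible k-factor not attested in the sample."""
        seen = set().union(*(fac(w, k) for w in sample))
        return possible_factors(sigma, k) - seen

    # Replaying the CV run above: the hypothesis shrinks to G* = {CC, C⋉}.
    data = ["V", "CVCV", "CVVCVCV", "VCVCV"]
    for t in range(len(data)):
        print(t, sorted(a_sl(data[:t + 1], "CV", 2)))
    # 0 ['CC', 'CV', 'C⋉', 'VC', 'VV', '⋊C']
    # 1 ['CC', 'C⋉', 'VV']
    # 2 ['CC', 'C⋉']
    # 3 ['CC', 'C⋉']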
Learning SL languages
Let's learn Pintupi. Note that k = 3. What is the initial hypothesis? At what point do we converge?

t    datum       hypothesis
0    σ́
1    σ́σ
2    σ́σσ
3    σ́σσ́σ
4    σ́σσ́σσ
5    σ́σσ́σσ́σ
...  ...
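As an exploratory aid for this exercise, the a_sl sketch above can be run with k = 3, using '1' as a stand-in for a stressed syllable (σ́) and '0' for an unstressed one (σ); the encoding and this particular run are mine.

    # '1' = stressed syllable, '0' = unstressed syllable
    data = ["1", "10", "100", "1010", "10100", "101010"]
    for t in range(len(data)):
        print(t, len(a_sl(data[:t + 1], "01", 3)))
    # 0 16
    # 1 14
    # 2 12
    # 3 10   <- no later datum changes the hypothesis: convergence
    # 4 10
    # 5 10
    print(sorted(a_sl(data, "01", 3)))
    # ['000', '001', '011', '01⋉', '110', '111', '11⋉', '⋊00', '⋊01', '⋊11']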
The limits of SL learning
The limits of SL learning
• We must know k in advance

[Diagram: a finite language L ∈ Fin and a language L′ ∈ SL that are identical on some finite sequence p[i]]

• Gold (1967): any class C such that Fin ⊊ C is not learnable from positive examples
The limits of SL learning
• Consider this pattern from Ineseño Chumash:

S-api-tSʰol-it        'I have a stroke of good luck'
s-api-tsʰol-us        'he has a stroke of good luck'
S-api-tSʰol-uS-waS    'he had a stroke of good luck'
ha-Sxintila-waS       'his former Indian name'
s-is-tisi-jep-us      'they (two) show him'
k-Su-Sojin            'I darken it'

• What phonotactic constraints are active here? *S...s, *s...S
The limits of SL learning
• Let's assume L⋆ = L_C for Σ = {s, o, t, S} as given below:

L_C = {so, ss, ..., sos, SoS, SoSoS, sosos, SototoS, sototos, ...}

t    datum      hypothesis (sibilant-relevant banned 2-factors)
0    sos        {ss, sS, Ss, St, SS, ...}
1    sotoss     {sS, Ss, St, SS, ...}
2    SoStoSS    {sS, Ss, ...}
...  ...        ...

• The hypothesis can ban adjacent *sS and *Ss, but no 2-factor ever bans s...S at a distance
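Continuing with the fac and a_sl helpers from the sketch above, here is why this data defeats the SL_2 learner: once the legal harmonic forms attest all the local pieces, a disharmonic string slips through. The acceptance test and the particular toy data are mine.

    def accepts(banned: set[str], w: str, k: int) -> bool:
        """An SL_k grammar accepts w iff w contains no banned k-factor."""
        return fac(w, k).isdisjoint(banned)

    hyp = a_sl(["sos", "SoS", "sotos", "SotoS"], "sotS", 2)
    print("sS" in hyp, "Ss" in hyp)  # True True -- adjacent disharmony banned
    print(accepts(hyp, "sotoS", 2))  # True -- long-distance disharmony accepted

Raising k only postpones the problem: a disharmonic pair separated by more than k − 2 segments is invisible to any k-factor, so no SL_k grammar captures *s...S in general.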