Grammatical inference and subregular phonology
Adam Jardine, Rutgers University
December 11, 2019 · Tel Aviv University
Review
“[V]arious formal and substantive universals are intrinsic properties of the language-acquisition system, these providing a schema that is applied to data and that determines in a highly restricted way the general form and, in part, even the substantive features of the grammar that may emerge upon presentation of appropriate data.” — Chomsky, Aspects

“[I]f an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.” — Wolpert and Macready (1997), No Free Lunch Theorems
[Diagram: two parallel hierarchies — computable languages ⊃ Regular ⊃ SL (phonotactics), and computable functions ⊃ Subsequential ⊃ ISL (processes)]
Review

• Computational characterizations of phonological patterns identify structure that can be used by a learner

[Figure: a finite-state acceptor over {C, V} with accepting (⊤) and non-accepting (⊥) state labels, built incrementally from a data presentation t = 0, 1, 2, …]
Review

• Computational characterizations of phonological patterns identify structure that can be used by a learner

[Figure: a subsequential transducer over {C, V} with arc outputs (C : C, V : V, V : λ, ⋊ : λ, …) processing an input string step by step; then the same state structure with all arc outputs left unknown (?)]
Today

• Using automata structure for learning
  – ISL functions
  – SL distributions
• Open questions
Learning ISL functions
Learning input strictly local functions

• When learning languages, presentation is a sequence of examples of L

  t  datum
  0  V
  1  CVCV
  2  CVVCVCV
  …

• When learning functions, presentation is a sequence of example pairs from f

  t  datum
  0  (C, CV)
  1  (CVC, CVCV)
  2  (CVCV, CVCV)
  …
Learning input strictly local functions

[Figure: given the data pairs t = 0: (C, CV), 1: (CVC, CVCV), 2: (CVCV, CVCV), 3: (VCVC, VCVCV), can the learner fill in the unknown (?) arc outputs of the transducer?]
Learning input strictly local functions

• The longest common prefix (lcp) is the longest initial sequence shared by a set of strings

  lcp({CVCV, CVCCV, CVCVC}) = CVC
  lcp({CVCV, CCVCV, CVCVC}) = C
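The lcp computation can be sketched in a few lines of Python (the helper name `lcp` is mine, not from the talk):

```python
def lcp(strings):
    """Longest common prefix of a non-empty collection of strings."""
    strings = list(strings)
    prefix = strings[0]
    for s in strings[1:]:
        # Shrink the candidate until it is a prefix of s as well.
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix

print(lcp(["CVCV", "CVCCV", "CVCVC"]))  # CVC
print(lcp(["CVCV", "CCVCV", "CVCVC"]))  # C
```

The result is independent of the order in which the strings are visited, so a set works just as well as a list here.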
Learning input strictly local functions

• Call our data sequence d ⊂ f, and define d_p(w) = lcp(d(wΣ*))

  data: (CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

  d_p(CVC) = CVC
  d_p(VCVV) = VCV
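Over a finite sample, d_p(w) can be approximated as the lcp of the outputs for every sampled input that begins with w. A minimal sketch (function names are my own):

```python
def lcp(strings):
    """Longest common prefix of a non-empty collection of strings."""
    strings = list(strings)
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix

def d_p(data, w):
    """d_p(w) = lcp(d(wΣ*)): lcp of outputs for sampled inputs with prefix w."""
    outs = [v for (u, v) in data if u.startswith(w)]
    return lcp(outs)

data = [("CV", "CV"), ("CVC", "CVC"), ("CVCVC", "CVCVC"),
        ("VCVVC", "VCVC"), ("VCVV", "VCV")]
print(d_p(data, "CVC"))   # CVC
print(d_p(data, "VCVV"))  # VCV
```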
Learning input strictly local functions

• Call our data sequence d ⊂ f, and define d_p(w) = lcp(d(wΣ*))

  data: (CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

  d_p(CVC) = CVC
  d_p(VCVV) = VCV

• Define d_w(u) = d_p(w)^{-1} d(wu)

  d_CV(C) = d_p(CV)^{-1} d(CVC) = (CV)^{-1} CVC = C
  d_VCV(V) = d_p(VCV)^{-1} d(VCVV) = (VCV)^{-1} VCV = λ

• Define d_p^w(u) = lcp(d_w(uΣ*))
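The quotient d_w(u) = d_p(w)^{-1} d(wu) strips the already-determined prefix d_p(w) off the output for wu, and d_p^w(u) takes the lcp over all sampled continuations. A sketch over the same finite sample (all helper names are mine):

```python
def lcp(strings):
    """Longest common prefix of a non-empty collection of strings."""
    strings = list(strings)
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix

def d_p(data, w):
    """d_p(w) = lcp of outputs for sampled inputs with prefix w."""
    return lcp([v for (u, v) in data if u.startswith(w)])

def d_w(data, w, u):
    """d_w(u) = d_p(w)^{-1} d(wu): the output for wu minus the prefix d_p(w)."""
    out = dict(data)[w + u]
    pref = d_p(data, w)
    return out[len(pref):]

def d_p_w(data, w, u):
    """d_p^w(u) = lcp(d_w(uΣ*)), approximated over the sample."""
    pref = d_p(data, w)
    return lcp([v[len(pref):] for (x, v) in data if x.startswith(w + u)])

data = [("CV", "CV"), ("CVC", "CVC"), ("CVCVC", "CVCVC"),
        ("VCVVC", "VCVC"), ("VCVV", "VCV")]
print(d_w(data, "CV", "C"))    # C
print(d_w(data, "VCV", "V"))   # '' (i.e. λ)
print(d_p_w(data, "CV", "C"))  # C
```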
Learning input strictly local functions

data: (CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Figure: the unknown (?) arc outputs of the transducer are filled in one at a time from the data:
d_p^λ(C) = C,  d_p^C(C) = C,  d_p^C(V) = V,  d_p^CV(C) = C,  d_p^CV(V) = λ,  d_p^λ(V) = V,
then final outputs d_p(CVC)^{-1} d(CVC) = λ and d_p(V)^{-1} d(V) = λ,
yielding a machine with arc outputs C : C, V : V, V : λ, C : λ, …]
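The learned transducer can then be run as an ordinary subsequential machine. Below is a hand-written ISL₂ sketch consistent with the data pairs above, assuming states simply record the previous input symbol; it is my reconstruction of the learned pattern (delete a V immediately after another V), not the literal automaton drawn on the slides:

```python
# Arc outputs, indexed by (previous input symbol, current symbol).
# "" is the initial state (nothing read yet); the empty string plays the role of λ.
ARCS = {
    ("", "C"): "C", ("", "V"): "V",
    ("C", "C"): "C", ("C", "V"): "V",
    ("V", "C"): "C", ("V", "V"): "",   # a V right after a V is deleted
}

def apply_isl2(w):
    """Run the ISL_2 transducer: the state is the last input symbol seen."""
    state, out = "", []
    for a in w:
        out.append(ARCS[(state, a)])
        state = a
    return "".join(out)

# The machine reproduces every pair in the sample.
data = [("CVC", "CVC"), ("CVV", "CV"), ("CVCCV", "CVCCV"),
        ("CCVCC", "CCVCC"), ("CCCVCV", "CCCVCV"),
        ("CVVCV", "CVCV"), ("V", "V")]
assert all(apply_isl2(u) == v for (u, v) in data)
```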
Learning input strictly local functions

• As any two ISL_k functions share the same structure, this method ILPD-learns the ISL_k functions

[Figure: the shared ISL state structure, with all arc outputs left unknown (?)]

• This method extends to any class of functions that shares such a structure (Jardine et al., 2014)
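The shared structure can be made concrete: in the canonical ISL_k machine the state set depends only on k and the alphabet — each state records the last (up to) k−1 input symbols — so only the arc outputs need to be learned. A small sketch (the function name is mine):

```python
from itertools import product

def isl_states(sigma, k):
    """States of the canonical ISL_k structure: all strings over sigma of
    length < k (the relevant suffix of the input read so far)."""
    return ["".join(p) for n in range(k) for p in product(sigma, repeat=n)]

print(isl_states("CV", 2))       # ['', 'C', 'V']
print(len(isl_states("CV", 3)))  # 7 states: 1 + 2 + 4
```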
Learning input strictly local functions

• A learning algorithm for grammars that explicitly encode computational properties of phonological patterns
• Learning algorithms for OSL (Chandlee et al., 2015) and tier-based OSL (Burness and McMullin, 2019) functions use a similar (yet distinct) method
• Learning URs uses this same structural concept (Hua et al., in progress)
• Learning optional ISL processes uses the same basic idea (Heinz, in progress), based on Beros and de la Higuera (2016)
Learning SL distributions
Learning strictly local distributions

• Probability distributions can be described with the same structure

[Figure: the same state structure with probabilities on the arcs: C : 0.2, C : 0.2, C : 0.6, V : 0.6, C : 0.4, ⋊ : 0.0, V : 0.5, V : 0.1, V : 0.4]
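One concrete instance: an SL₂ (bigram) distribution puts a probability on each arc of this same structure, and can be estimated from a sample by normalized counts over boundary-padded bigrams. A minimal maximum-likelihood sketch, not the estimator from the talk:

```python
from collections import Counter

def sl2_estimate(sample):
    """Maximum-likelihood SL_2 distribution: P(b | a) from bigram counts,
    with ⋊ and ⋉ marking the word boundaries."""
    counts, totals = Counter(), Counter()
    for w in sample:
        padded = "⋊" + w + "⋉"
        for a, b in zip(padded, padded[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return {(a, b): c / totals[a] for (a, b), c in counts.items()}

probs = sl2_estimate(["CV", "CVCV"])
print(probs[("C", "V")])  # 1.0 — every C in this sample is followed by V
```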