grammatical inference and subregular phonology


  1. Grammatical inference and subregular phonology · Adam Jardine · Rutgers University · December 11, 2019 · Tel Aviv University

  2. Review

  3. “[V]arious formal and substantive universals are intrinsic properties of the language-acquisition system, these providing a schema that is applied to data and that determines in a highly restricted way the general form and, in part, even the substantive features of the grammar that may emerge upon presentation of appropriate data.” (Chomsky, Aspects) “[I]f an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.” (Wolpert and Macready 1997, No Free Lunch Theorems)

  4. [diagram: two nested hierarchies: computable languages ⊃ Reg ⊃ SL, locating phonotactics; computable functions ⊃ Reg ⊃ Subseq ⊃ ISL, locating phonological processes]

  5. Review • Computational characterizations of phonological patterns identify structure that can be used by a learner [diagram: a strictly local acceptor over {C, V}, transitions labeled C : ⊤/⊥ and V : ⊤/⊥, run on sample data such as CV and CVCV]

  7. Review • The same structure appears for processes: [diagram: an input strictly local transducer, transitions labeled C : C and V : V or V : λ, mapping e.g. CVV ↦ CV]

  11. Today • Using automata structure for learning – ISL functions – SL distributions • Open questions

  12. Learning ISL functions

  13. Learning input strictly local functions • When learning languages, the presentation is a sequence of examples of L: datum t: 0 V, 1 CVCV, 2 CVVCVCV, . . . • When learning functions, the presentation is a sequence of example pairs from f: datum t: 0 (C, CV), 1 (CVC, CVCV), 2 (CVCV, CVCV), . . .

  15. Learning input strictly local functions • Given pairs 0 (C, CV), 1 (CVC, CVCV), 2 (CVCV, CVCV), 3 (VCVC, VCVCV): can the transducer be inferred? [diagram: the target ISL transducer with transition outputs still unknown]

  16. Learning input strictly local functions • The longest common prefix (lcp) is the longest initial sequence shared by a set of strings: lcp({CVCV, CVCCV, CVCVC}) = CVC, lcp({CVCV, CCVCV, CVCVC}) = C
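Since the learner leans on lcp repeatedly, a minimal sketch of it may help (the function name and implementation are illustrative, not from the slides):

```python
def lcp(strings):
    """Longest common prefix: the longest initial sequence
    shared by every string in the collection."""
    strings = list(strings)
    prefix = strings[0]
    for s in strings[1:]:
        # Shrink the candidate until it is also a prefix of s.
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix
```

This reproduces the slide's examples: lcp({CVCV, CVCCV, CVCVC}) gives "CVC" and lcp({CVCV, CCVCV, CVCVC}) gives "C".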

  19. Learning input strictly local functions • Call our data sequence d ⊂ f, and define d_p(w) = lcp(d(wΣ*)), the lcp of the outputs for all inputs beginning with w. With data (CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV): d_p(CVC) = CVC, d_p(VCVV) = VCV
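A sketch of d_p over a finite sample, treating d as a list of (input, output) pairs; d_p(w) is the lcp of all outputs whose inputs begin with w (helper names are illustrative):

```python
def lcp(strings):
    """Longest common prefix of a collection of strings."""
    strings = list(strings)
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix

def d_p(sample, w):
    """d_p(w) = lcp(d(wΣ*)): lcp of the outputs for inputs extending w."""
    return lcp([out for inp, out in sample if inp.startswith(w)])

# The sample from the slide.
sample = [("CV", "CV"), ("CVC", "CVC"), ("CVCVC", "CVCVC"),
          ("VCVVC", "VCVC"), ("VCVV", "VCV")]
```

On this data, d_p(sample, "CVC") returns "CVC" and d_p(sample, "VCVV") returns "VCV", matching the slide.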

  23. Learning input strictly local functions • With d ⊂ f and d_p(w) = lcp(d(wΣ*)), define the residual d_w(u) = d_p(w)⁻¹ d(wu). With data (CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV): d_CV(C) = d_p(CV)⁻¹ d(CVC) = (CV)⁻¹ CVC = C, d_VCV(V) = d_p(VCV)⁻¹ d(VCVV) = (VCV)⁻¹ VCV = λ • From the residuals, d_p^w(u) = lcp(d_w(uΣ*))
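The residual and the tail-wise lcp above can be sketched as follows; a left quotient u⁻¹v is just string slicing, and the helper names are illustrative:

```python
def lcp(strings):
    """Longest common prefix of a collection of strings."""
    strings = list(strings)
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix

def d_p(sample, w):
    """d_p(w) = lcp(d(wΣ*))."""
    return lcp([out for inp, out in sample if inp.startswith(w)])

def d_w(sample, w, u):
    """d_w(u) = d_p(w)^-1 d(wu): the output for input wu with the
    already-determined prefix d_p(w) stripped off."""
    out = dict(sample)[w + u]
    return out[len(d_p(sample, w)):]

def d_p_w(sample, w, u):
    """d_p^w(u) = lcp(d_w(uΣ*)), over inputs extending wu."""
    prefix = d_p(sample, w)
    return lcp([out[len(prefix):] for inp, out in sample
                if inp.startswith(w + u)])

# The sample from the slide.
sample = [("CV", "CV"), ("CVC", "CVC"), ("CVCVC", "CVCVC"),
          ("VCVVC", "VCVC"), ("VCVV", "VCV")]
```

On this data, d_w(sample, "CV", "C") returns "C" and d_w(sample, "VCV", "V") returns "" (i.e. λ), matching the slide's computations.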

  27. From the data (CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V): d_p^λ(C) = C, d_p^λ(V) = V, d_p^C(C) = C, d_p^C(V) = V, d_p^CV(C) = C, d_p^CV(V) = λ; final outputs d_p(CVC)⁻¹ d(CVC) = λ and d_p(V)⁻¹ d(V) = λ [diagram: the transducer filled in transition by transition, with C : C and V : V everywhere except V : λ immediately after a vowel]
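The transition outputs computed across these slides can be reproduced with the same helpers. This is only a sketch of the per-prefix computation as shown on the slides, not the full published algorithm (which builds an onward tree transducer and merges states to resolve conflicting prefixes); all function names are illustrative:

```python
def lcp(strings):
    """Longest common prefix of a collection of strings."""
    strings = list(strings)
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix

def d_p(sample, w):
    """d_p(w) = lcp(d(wΣ*))."""
    return lcp([out for inp, out in sample if inp.startswith(w)])

def d_p_w(sample, w, u):
    """d_p^w(u) = lcp(d_w(uΣ*)), residuals taken against d_p(w)."""
    prefix = d_p(sample, w)
    return lcp([out[len(prefix):] for inp, out in sample
                if inp.startswith(w + u)])

# The sample from the slide.
sample = [("CVC", "CVC"), ("CVV", "CV"), ("CVCCV", "CVCCV"),
          ("CCVCC", "CCVCC"), ("CCCVCV", "CCCVCV"),
          ("CVVCV", "CVCV"), ("V", "V")]

# The six transition outputs computed on the slides ("" stands for λ):
outputs = {
    ("λ", "C"): d_p_w(sample, "", "C"),    # C
    ("λ", "V"): d_p_w(sample, "", "V"),    # V
    ("C", "C"): d_p_w(sample, "C", "C"),   # C
    ("C", "V"): d_p_w(sample, "C", "V"),   # V
    ("CV", "C"): d_p_w(sample, "CV", "C"), # C
    ("CV", "V"): d_p_w(sample, "CV", "V"), # λ
}
```

Only the V : λ transition after a vowel differs from the identity map, which is exactly the vowel-deletion process the example function encodes.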

  36. Learning input strictly local functions • As any two ISL_k functions share the same structure, this method ILPD-learns the ISL_k functions [diagram: the shared ISL transducer structure, with outputs to be filled in] • This method extends to any class of functions that shares such a structure (Jardine et al., 2014)

  37. Learning input strictly local functions • A learning algorithm for grammars that explicitly encode computational properties of phonological patterns • Learning for OSL (Chandlee et al., 2015) and tier-based OSL (Burness and McMullin, 2019) uses a similar (yet distinct) method • Learning URs uses this same structural concept (Hua et al., in progress) • Learning for optional ISL processes uses the same basic idea (Heinz, in progress), based on Beros and de la Higuera (2016)

  38. Learning SL distributions

  39. Learning strictly local distributions • Probability distributions can be described with the same structure. [diagram: the same transducer shape with probabilities on transitions, e.g. C : 0.2, V : 0.6, ⋊ : 0.0]
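One standard way to fit such a strictly 2-local distribution is maximum likelihood over bigram transitions, with explicit word boundaries. This sketch assumes that standard setup (it is not taken from the slides):

```python
from collections import Counter

def estimate_sl2(corpus):
    """Relative-frequency estimate of a strictly 2-local (bigram)
    distribution P(next symbol | previous symbol), using explicit
    word boundaries ⋊ (start) and ⋉ (end)."""
    counts = Counter()
    for word in corpus:
        symbols = ["⋊"] + list(word) + ["⋉"]
        # Count every adjacent pair, including the boundary transitions.
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += 1
    # Normalize counts by how often each conditioning symbol occurred.
    totals = Counter()
    for (a, _b), n in counts.items():
        totals[a] += n
    return {(a, b): n / totals[a] for (a, b), n in counts.items()}

probs = estimate_sl2(["CV", "CVCV", "V"])
```

On this toy corpus the estimate assigns, for instance, probability 1.0 to V after C and 2/3 to C word-initially, since every C is followed by a V and two of the three words begin with C.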
