Introduction Languages Functions Conclusion

Subregular toolkit implemented in Python
Alëna Aksënova
Stony Brook University
IACS Jr. Researcher Award Presentation
IACS @ SBU, August 16, 2018

Subregular toolkit: general information

kist: kist implementing subregular toolkit

Motivation: to collect in one place the functionality for subregular languages and subsequential transducers.
For researchers: to avoid the manual burden of extracting grammars, designing transducers, creating data samples, and scanning strings;
For practitioners: to start using tools that are currently available only in the literature.

Python 3 (will be available via pip)
Open source
Available on GitHub: https://github.com/loisetoil/slp

More motivations

This subregular toolkit allows one to:
use recent theoretical results in practice;
test ideas currently available in the literature;
explore new methods to model natural language;
automatically extract dependencies, thereby avoiding the manual burden of automaton/transducer construction.

[Diagram: the subregular toolkit linking natural language processing, theoretical linguistics, and formal language theory.]

The importance of formalization

In order to abstract away from details and look at the big picture, we need to formalize:
Languages → sets of strings of a particular type;
Functions → descriptions of processes.

The kist toolkit provides functionality for working with (sub)regular languages and functions.
Such a toolkit is useful for NLP and beyond.

Languages vs. Functions

[Diagram: observed data is modeled as a language, whose generating device is an FSA; an observed process is modeled as a function, whose generating device is an FST.]

Here, I only work with (sub)regular languages and functions, that is, those requiring a finite amount of memory.
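
The contrast between the two generating devices can be made concrete with a minimal sketch. The machines below are hand-built for illustration and are not the kist API; the state names, the "no bb" language, and the rewriting function are all assumptions. The FSA only says yes or no to a string, while the FST uses the same finite memory to produce an output string.

```python
# Hypothetical, hand-built machines for illustration; this is not the kist API.

# FSA: accepts strings over {a, b} that never contain "bb".
FSA_TRANSITIONS = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "a"): "q0"}
FSA_FINAL = {"q0", "q1"}


def fsa_accepts(string, start="q0"):
    """Return True if the automaton accepts the string."""
    state = start
    for symbol in string:
        key = (state, symbol)
        if key not in FSA_TRANSITIONS:
            return False  # no transition available: reject
        state = FSA_TRANSITIONS[key]
    return state in FSA_FINAL


# FST: same two states, but every transition also writes an output symbol.
# A "b" read in state q1 (right after an unchanged "b") is rewritten as "a".
FST_TRANSITIONS = {
    ("q0", "a"): ("q0", "a"),
    ("q0", "b"): ("q1", "b"),
    ("q1", "a"): ("q0", "a"),
    ("q1", "b"): ("q0", "a"),
}


def fst_transduce(string, start="q0"):
    """Map an input string over {a, b} to its output string."""
    state, output = start, []
    for symbol in string:
        state, written = FST_TRANSITIONS[(state, symbol)]
        output.append(written)
    return "".join(output)


print(fsa_accepts("abba"))     # the language: is this string a member?
print(fst_transduce("abbb"))   # the function: what does this string map to?
```

In this toy setup every output of the transducer is accepted by the automaton, since the transducer never writes two adjacent "b" symbols.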

What is done and what is left

Last year:
✓ FSA implementation:
    ✓ architecture;
    ✓ optimization.
✓ Languages (SL, TSL, SP):
    ✓ learners;
    ✓ scanners;
    ✓ sample generators;
    ✓ neg ↔ pos switch;
    ✓ corresponding FSA.

This year:
◻ Languages (MTSL, SS-TSL):
    ◻ learners;
    ◻ scanners;
    ◻ sample generators;
    ◻ neg ↔ pos switch;
    ◻ corresponding FSA.
◻ Transduction learners:
    ◻ OSTIA;
    ◻ ISLFLA;
    ◻ OSLFIA.

Languages and FSMs

The class of regular languages (REG) consists of smaller subclasses (McNaughton & Papert 1971).
For every (sub)regular language, it is possible to construct a corresponding finite-state automaton.
Most subregular classes are learnable in polynomial time from positive data only.
There is a variety of applications for subregular languages!

[Figure: subregular hierarchy (simplified), showing the classes SL, SP, TSL, MTSL, and SS-TSL.]
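
Learning from positive data only can be illustrated for the simplest class, strictly local (SL) languages. The sketch below is an assumption-laden illustration rather than the kist implementation: the function names `k_factors`, `learn_sl`, and `scan_sl` are hypothetical, as is the choice of ">" and "<" as word-edge markers. The learner simply collects every k-factor attested in the sample, which takes time linear in the size of the sample.

```python
# Hypothetical helper names; ">" and "<" as edge markers are an assumption.

def k_factors(string, k=2, start=">", end="<"):
    """Return the set of substrings of length k of the edge-marked string."""
    padded = start * (k - 1) + string + end * (k - 1)
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def learn_sl(sample, k=2):
    """Positive k-SL grammar: every k-factor attested in the positive sample."""
    grammar = set()
    for word in sample:
        grammar |= k_factors(word, k)
    return grammar


def scan_sl(string, grammar, k=2):
    """A string is well-formed if all of its k-factors are in the grammar."""
    return k_factors(string, k) <= grammar


grammar = learn_sl(["ab", "abab"])
print(sorted(grammar))
print(scan_sl("ababab", grammar))  # generalizes to unseen, longer strings
print(scan_sl("abba", grammar))    # contains the unattested factor "bb"
```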

What are the applications?

Applications:
Linguistics: meaning, sounds, words, sentences (Heinz 2010; Aksënova et al. 2016; Graf & Heinz 2015; Graf 2017);
Experiments with NN (Avcu et al. 2017);
Robotics (Rawal et al. 2011).

Subregular languages in KIST

Implemented functionality:
learners;
scanners;
sample generators;
negative ↔ positive grammar translators;
constructing corresponding FSA;
trimming FSA.

[Figure: subregular hierarchy with the classes SL, SP, TSL, MTSL, SS-TSL, and REG.]
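
One item in the list above, the negative ↔ positive grammar translation, can be sketched for 2-SL grammars as a set complement over all possible bigrams. This is a minimal sketch under stated assumptions, not the kist API: the names `all_bigrams`, `switch_polarity`, `bigrams`, and `scan` are hypothetical, and ">" and "<" are assumed edge markers.

```python
from itertools import product

# Hypothetical names; ">" and "<" as edge markers are an assumption.

def all_bigrams(alphabet, start=">", end="<"):
    """Every bigram that a 2-SL grammar over this alphabet could mention."""
    lefts = [start] + sorted(alphabet)
    rights = sorted(alphabet) + [end]
    return {left + right for left, right in product(lefts, rights)}


def switch_polarity(grammar, alphabet):
    """Turn a negative grammar into the equivalent positive one, or back."""
    return all_bigrams(alphabet) - set(grammar)


def bigrams(string, start=">", end="<"):
    """Bigrams of the edge-marked string."""
    padded = start + string + end
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def scan(string, grammar, positive=True):
    """Positive: all bigrams must be allowed. Negative: none may be banned."""
    found = bigrams(string)
    return found <= grammar if positive else not (found & grammar)


alphabet = {"a", "b"}
negative = {"bb"}                                # bans adjacent b's
positive = switch_polarity(negative, alphabet)   # the eight remaining bigrams
print(scan("abab", negative, positive=False), scan("abab", positive))
print(scan("abba", negative, positive=False), scan("abba", positive))
```

The two grammars agree on every string over the given alphabet. They can differ on strings containing other symbols, because a negative grammar is silent about symbols it never mentions.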

Language example

Language: Bukusu (Kenya)
Construction: V + el/er/il/ir ‘use something to V’
Rule: “match the sounds of the suffix with the sounds of the verb”

tleex-el ‘use smth to cook’
reeb-er ‘use smth to ask’
lim-il ‘use smth to cultivate’
ir-ir ‘use smth to die’

Language example [cont.]

Simple formal version of the pattern:
$(l,e)^{+} \cup (l,i)^{+} \cup (r,e)^{+} \cup (r,i)^{+}$

ok llliiillliiii
ok eeerreer
ok lleeelle
ok riiriirrr
ok lll
¬ liiirriii
¬ leeelliii
...

The intuition is that the vowels [e] and [i] need to agree with each other, and so do [l] and [r]. The two agreements do not interact with each other.
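
The formal pattern can be checked directly with a regular expression. This sketch assumes that each term such as (l,e)+ denotes any non-empty string over the two listed symbols, which is the reading suggested by the slide's examples; the function name `matches_pattern` is hypothetical.

```python
import re

# Assumed reading: (l,e)+ is any non-empty string over {l, e}, and so on.
PATTERN = re.compile(r"[le]+|[li]+|[re]+|[ri]+")


def matches_pattern(string):
    """True if the whole string falls under one of the four sub-patterns."""
    return PATTERN.fullmatch(string) is not None


# Examples taken from the slide.
for word in ["llliiillliiii", "eeerreer", "lleeelle", "riiriirrr", "lll"]:
    print("ok", word, matches_pattern(word))
for word in ["liiirriii", "leeelliii"]:
    print("¬ ", word, matches_pattern(word))
```

A regular expression is enough to recognize the pattern, but it does not expose the structure of the dependency, which is what the tier-based analysis is meant to capture.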

Language example [cont.]

$(l,e)^{+} \cup (l,i)^{+} \cup (r,e)^{+} \cup (r,i)^{+}$

Complexity: MTSL (multiple tier-based strictly local)
Meaning: several sets of items are involved in long-distance dependencies.

$T_1 = \{l, r\}$ and $G_1^{pos} = \langle ll, rr \rangle$
$T_2 = \{e, i\}$ and $G_2^{pos} = \langle ee, ii \rangle$

[Figure: example strings projected onto the ⟨r, l⟩ tier and the ⟨e, i⟩ tier; one projection is marked ok and one is marked ¬.]