Finite state transducers
A quick introduction

Çağrı Çöltekin
ccoltekin@sfs.uni-tuebingen.de

Seminar für Sprachwissenschaft
University of Tübingen

Data Structures and Algorithms for Computational Linguistics III
Winter Semester 2019–2020

Finite state transducers

• A finite state transducer (FST) is a finite state machine where
  transitions are conditioned on a pair of symbols
• The machine moves between the states based on the input symbol,
  while it outputs the corresponding output symbol
• An FST encodes a relation, a mapping from a set to another
• The relation defined by an FST is called a regular (or rational)
  relation
  – e.g., babba → babbb, aba → bbb, aba → abb (a single input may
    be related to more than one output)
• In this lecture, we treat an FSA as a simple FST that outputs its
  input: edge label 'a' is a shorthand for 'a:a'

Formal definition

A finite state transducer is a tuple (Σi, Σo, Q, q0, F, Δ)
• Σi is the input alphabet
• Σo is the output alphabet
• Q is a finite set of states
• q0 is the start state, q0 ∈ Q
• F is the set of accepting states, F ⊆ Q
• Δ is the transition relation (Δ : Q × Σi → Q × Σo)

Uses in NLP/CL: where do we use FSTs?

• Morphological analysis
• Spelling correction
• Transliteration
• Speech recognition
• Grapheme-to-phoneme mapping
• Normalization
• Tokenization
• POS tagging (not typical, but done)
• Partial parsing / chunking
• …

Example 1: morphological analysis
  [diagram: an FST mapping between an inflected word and its
   analysis, with edges such as DET:ϵ, ADJ:ϵ, s:⟨PL⟩]

Example 2: POS tagging / shallow parsing
  [diagram: an FST over the sentence 'time flies like an arrow',
   with edges time:N, flies:N, flies:V, like:ADP, like:V, an:D,
   arrow:N, and a chunking transducer with edges such as N:NP]

Note: (1) It is important to express the ambiguity. (2) This gets
interesting if we can 'compose' these automata.

FST inversion

• Since an FST encodes a relation, it can be reversed
• The inverse of an FST swaps the input symbols with the output
  symbols
  [diagram: an FST with edges a:b and b:a, and its inverse with
   the edge labels swapped]
• We indicate the inverse of an FST M with M⁻¹

Closure properties of FSTs

Like FSAs, FSTs are closed under some operations:
• Concatenation
• Kleene star
• Union
• Reversal
• Inversion M⁻¹
• Composition M₁ ∘ M₂
Unlike FSAs, however, FSTs are not closed under intersection and
complement.
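The tuple definition above can be sketched in a few lines of Python. This is my own illustrative encoding, not code from the lecture: the transition relation Δ is a dict mapping (state, input symbol) to a set of (next state, output symbol) pairs, and the example machine is a hypothetical ambiguous FST chosen to show that an FST encodes a relation, not a function.

```python
# Δ as a dict: (state, input symbol) -> set of (next state, output symbol).
# A *set* of pairs (rather than a single pair) is what lets the machine
# encode a relation rather than a function.

def transduce(delta, finals, start, s):
    """Return every output string the FST relates to the input s."""
    configs = {(start, "")}          # reachable (state, output-so-far) pairs
    for sym in s:
        configs = {(r, out + o)
                   for (q, out) in configs
                   for (r, o) in delta.get((q, sym), ())}
    return {out for (q, out) in configs if q in finals}

def invert(delta):
    """M⁻¹: swap the input and output symbol on every edge."""
    inv = {}
    for (q, a), arcs in delta.items():
        for (r, b) in arcs:
            inv.setdefault((q, b), set()).add((r, a))
    return inv

# A hypothetical ambiguous FST: the first 'a' maps to 'a' or 'b',
# every later symbol maps to 'b'.
DELTA = {
    (0, "a"): {(1, "a"), (1, "b")},
    (1, "a"): {(1, "b")},
    (1, "b"): {(1, "b")},
}
FINALS = {1}

print(transduce(DELTA, FINALS, 0, "aba"))          # {'abb', 'bbb'}
print(transduce(invert(DELTA), FINALS, 0, "bb"))   # {'aa', 'ab'}
```

Note how inversion falls out of the representation for free: because Δ is just a set of labeled edges, swapping the two symbols on every edge yields exactly the reversed relation.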
FST composition

• FSTs can be applied in sequence: the output of M₁ becomes the
  input of M₂
  [diagram: M₁, an FST over states 0–2 with an a:b edge; M₂, an
   FST over states 0–1 with a b:c edge]
• Sequential application:
  – aa   —M₁→ bb    —M₂→ bb
  – bb   —M₁→ ∅     —M₂→ ∅
  – aaaa —M₁→ baab  —M₂→ baac
  – abaa —M₁→ bbab  —M₂→ bbac
• Can we compose without running the FSTs sequentially?
  Yes: M₁ ∘ M₂ can be constructed directly, over pairs of states
  of M₁ and M₂
  [diagram: the composed FST with product states 00, 01, 11, 20]

Projection

• Projection turns an FST into an FSA, accepting either the input
  language or the output language
  [diagram: an FST and its input and output projections]

FST determinization

• A deterministic FST has unambiguous transitions from every state
  on any input symbol
• We can extend the subset construction to FSTs
• Determinization often means converting to a subsequential FST
• However, not all FSTs can be determinized
• Is this FST deterministic?
  [diagram: an FST over states 0–3 with edges a:b, a, and b, and
   its determinized version]
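The product construction for composition mentioned above can be sketched as follows, using the same dict-based encoding as before. This is an illustrative simplification under the assumption that neither machine has ϵ-transitions (handling ϵ is the subtle part of real composition algorithms and is omitted here); states of M₁ ∘ M₂ are pairs (q₁, q₂), and an a:b edge of M₁ combines with a b:c edge of M₂ into an a:c edge.

```python
# (state, input symbol) -> set of (next state, output symbol), as before.

def transduce(delta, finals, start, s):
    """Return every output string the FST relates to the input s."""
    configs = {(start, "")}
    for sym in s:
        configs = {(r, out + o)
                   for (q, out) in configs
                   for (r, o) in delta.get((q, sym), ())}
    return {out for (q, out) in configs if q in finals}

def compose(d1, f1, d2, f2):
    """Product construction for M1 ∘ M2 (assumes no ϵ-transitions)."""
    delta, finals = {}, {(x, y) for x in f1 for y in f2}
    for (q1, a), arcs1 in d1.items():          # a:b edges of M1
        for (r1, b) in arcs1:
            for (q2, b2), arcs2 in d2.items():  # b:c edges of M2
                if b2 != b:
                    continue
                for (r2, c) in arcs2:
                    delta.setdefault(((q1, q2), a), set()).add(((r1, r2), c))
    return delta, finals

def project(delta, side="input"):
    """Drop one side of every label, turning the FST into an FSA."""
    fsa = {}
    for (q, a), arcs in delta.items():
        for (r, b) in arcs:
            fsa.setdefault((q, a if side == "input" else b), set()).add(r)
    return fsa

# Toy machines (not the lecture's): M1 rewrites a -> b (and keeps b);
# M2 rewrites b -> c.
M1 = {(0, "a"): {(0, "b")}, (0, "b"): {(0, "b")}}
M2 = {(0, "b"): {(0, "c")}}
COMP, CF = compose(M1, {0}, M2, {0})

print(transduce(COMP, CF, (0, 0), "ab"))   # {'cc'}
print(project(M1, "output"))               # {(0, 'b'): {0}}
```

Running the composed machine gives the same result as feeding M₁'s output into M₂, but in a single pass, which is the point of building M₁ ∘ M₂ up front.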
Sequential FSTs

• A sequential FST has a single transition from each state on
  every input symbol
• The recognition is linear in the length of the input
• Output symbols can be strings, as well as ϵ
• However, sequential FSTs do not allow ambiguity
  [diagram: a sequential FST over states 0–3 with edges such as
   a:ab, b:a, a:ϵ, b:ϵ]

Subsequential FSTs

• A k-subsequential FST is a sequential FST which can output up to
  k strings at an accepting state
• Subsequential transducers allow limited ambiguity
• Recognition time is still linear
• The 2-subsequential FST above maps every string it accepts to
  two strings, e.g.,
  – baa → bba
  – baa → bbbb
  [diagram: a 2-subsequential FST over states 0–2 emitting two
   alternative strings at its accepting state]

An exercise

Convert the following FST to a subsequential FST
  [diagram: an FST over states 0–5 with outputs aa, ba, bb]

Another example

Can you convert the following FST to a subsequential FST?
  [diagram: an FST over states 0–3 with edges a, b, and a:b]
Note that we cannot 'determine' the output on the first input
symbol until reaching the final input.

FSA vs FST

• FSAs are acceptors, FSTs are transducers
• FSAs accept or reject their input, FSTs produce output(s) for
  the inputs they accept
• FSAs define sets, FSTs define relations between sets
• FSTs share many properties of FSAs. However,
  – FSTs are not closed under intersection and complement
  – Determinizing FSTs is not always possible
  – We can compose (and invert) FSTs
• Both FSAs and FSTs can be weighted (not covered in this course)
• Practical applications of finite-state machines:
  – String search (FSA)
  – Finite-state morphology (FST)

Next

• Dependency grammars and dependency parsing
• Constituency (context-free) parsing

A.1 References / additional reading material

• Jurafsky and Martin (2009, Ch. 3)
• Additional references include:
  – Mohri (2009): weighted FSTs
  – Roche and Schabes (1996) and Roche and Schabes (1997): FSTs
    and their use in NLP

A.2 References / additional reading material (cont.)

Jurafsky, Daniel and James H. Martin (2009). Speech and Language
Processing: An Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition. Second edition.
Pearson Prentice Hall. isbn: 978-0-13-504196-3.

Mohri, Mehryar (2009). "Weighted automata algorithms". In:
Handbook of Weighted Automata. Monographs in Theoretical Computer
Science. Springer, pp. 213–254.

Roche, Emmanuel and Yves Schabes (1996). Introduction to
Finite-State Devices in Natural Language Processing. Tech. rep.
TR96-13. Mitsubishi Electric Research Laboratories. url:
http://www.merl.com/publications/docs/TR96-13.pdf.

Roche, Emmanuel and Yves Schabes (1997). Finite-state Language
Processing. A Bradford book. MIT Press. isbn: 9780262181822.