
Finite State Transducers: Data structures and algorithms for Computational Linguistics III


  1. Finite State Transducers. Data structures and algorithms for Computational Linguistics III. Çağrı Çöltekin (ccoltekin@sfs.uni-tuebingen.de), University of Tübingen, Seminar für Sprachwissenschaft, Winter Semester 2019–2020

  2. Finite state transducers: a quick introduction
     • A finite state transducer (FST) is a finite state machine where transitions are conditioned on a pair of symbols
     • The machine moves between the states based on the input symbol, while it outputs the corresponding output symbol
     • An FST encodes a relation, a mapping from one set to another
     • The relation defined by an FST is called a regular (or rational) relation
     [Figure: FST with states 0, 1, 2 and arcs labeled a:a, a:b, b:b]
     Example mappings: babba → babbb, aba → bbb, aba → abb
     Sections: Introduction, Operations on FSTs, Determinizing FSTs, Summary
     Footer: Ç. Çöltekin, SfS / University of Tübingen, WS 19–20
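The bullets above can be illustrated with a minimal sketch of a nondeterministic FST simulator. Everything here is an assumption made for illustration: the arc encoding `(source, input, output, target)`, the function name `transduce`, and the machine `ARCS`, which is a hypothetical FST and not the one drawn on the slide (the diagram's topology did not survive extraction).

```python
from collections import defaultdict


def transduce(arcs, start, finals, word):
    """Return every output string the FST pairs with `word`."""
    # Index arcs by (source state, input symbol).
    table = defaultdict(list)
    for src, inp, out, dst in arcs:
        table[(src, inp)].append((out, dst))

    # A configuration is (current state, output produced so far).
    configs = {(start, "")}
    for symbol in word:
        configs = {
            (dst, produced + out)
            for state, produced in configs
            for out, dst in table.get((state, symbol), [])
        }
    # Only configurations that end in an accepting state count.
    return {produced for state, produced in configs if state in finals}


# Hypothetical machine: every 'a' may be kept or rewritten to 'b',
# but the last symbol must be an 'a' that is rewritten to 'b'.
ARCS = [
    (0, "a", "a", 0),
    (0, "a", "b", 0),
    (0, "b", "b", 0),
    (0, "a", "b", 1),
]

print(sorted(transduce(ARCS, start=0, finals={1}, word="aba")))  # ['abb', 'bbb']
```

Because one input can be related to several outputs, the function returns a set, which is what makes the machine encode a relation and not a function.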

  4. Formal definition
     A finite state transducer is a tuple $(\Sigma_i, \Sigma_o, Q, q_0, F, \Delta)$
     • $\Sigma_i$ is the input alphabet
     • $\Sigma_o$ is the output alphabet
     • $Q$ is a finite set of states
     • $q_0$ is the start state, $q_0 \in Q$
     • $F$ is the set of accepting states, $F \subseteq Q$
     • $\Delta$ is a relation ($\Delta : Q \times \Sigma_i \to Q \times \Sigma_o$)
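One way to mirror this tuple in code is a small record type. This is a sketch under assumptions: the class name `FST`, the field names, and the choice to store $\Delta$ as a set of `(q, a, b, q2)` quadruples are illustrative and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FST:
    sigma_in: frozenset   # input alphabet
    sigma_out: frozenset  # output alphabet
    states: frozenset     # Q
    start: object         # q_0
    finals: frozenset     # F
    delta: frozenset      # quadruples (q, a, b, q2): in q, read a, write b, go to q2

    def is_well_formed(self):
        """Check q_0 in Q, F a subset of Q, and that every arc uses known symbols and states."""
        return (
            self.start in self.states
            and self.finals <= self.states
            and all(
                q in self.states and a in self.sigma_in
                and b in self.sigma_out and q2 in self.states
                for q, a, b, q2 in self.delta
            )
        )


# Hypothetical two-state example.
example = FST(
    sigma_in=frozenset("ab"),
    sigma_out=frozenset("ab"),
    states=frozenset({0, 1}),
    start=0,
    finals=frozenset({1}),
    delta=frozenset({(0, "a", "b", 1), (1, "b", "b", 1)}),
)
print(example.is_well_formed())  # True
```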

  5. Where do we use FSTs? Uses in NLP/CL
     • Morphological analysis
     • Spelling correction
     • Transliteration
     • Speech recognition
     • Grapheme-to-phoneme mapping
     • Normalization
     • Tokenization
     • POS tagging (not typical, but done)
     • Partial parsing / chunking
     • …

  6. Where do we use FSTs? Example 1: morphological analysis
     [Figure: automaton with states 0 to 6 and arcs labeled c, a, t, d, o, g, and s:⟨PL⟩]
     In this lecture, we treat an FSA as a simple FST that outputs its input: the edge label ‘a’ is a shorthand for ‘a:a’.

  7. Where do we use FSTs? Example 2: POS tagging / shallow parsing
     [Figure: first FST with states 0 to 5 and arcs labeled time:N, flies:N, flies:V, like:ADP, like:V, an:D, arrow:N]
     [Figure: second FST with states 0, 1, 2 and arcs labeled DET:ϵ, ADJ:ϵ, N:NP, PROPN:NP]
     Note: (1) It is important to express the ambiguity. (2) This gets interesting if we can ‘compose’ these automata.

  8. Closure properties of FSTs
     Like FSA, FSTs are closed under some operations. The slide lists:
     • Concatenation
     • Kleene star
     • Complement
     • Reversal
     • Union
     • Intersection
     • Inversion
     • Composition
     Note that FSTs are not closed under complement and intersection in general; any visual marking that distinguished these two entries on the slide was lost in extraction.

  9. FST inversion
     • Since an FST encodes a relation, it can be inverted
     • The inverse of an FST swaps the input symbols with the output symbols
     • We indicate the inverse of an FST M with $M^{-1}$
     [Figure: M with states 0, 1, 2 and arcs labeled a:b, a, b; $M^{-1}$ with the same states and arcs labeled b:a, a, b]
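Inversion as described above amounts to swapping the two labels on every arc. The sketch below assumes arcs are `(source, input, output, target)` tuples; the helper `transduce` and the machine `M` are hypothetical and only loosely modeled on the labels visible in the figure.

```python
def invert(arcs):
    """Swap input and output labels on every arc."""
    return [(src, out, inp, dst) for src, inp, out, dst in arcs]


def transduce(arcs, start, finals, word):
    """Return all outputs a (possibly nondeterministic) FST gives for `word`."""
    configs = {(start, "")}
    for symbol in word:
        configs = {
            (dst, produced + out)
            for state, produced in configs
            for src, inp, out, dst in arcs
            if src == state and inp == symbol
        }
    return {produced for state, produced in configs if state in finals}


# Hypothetical M: rewrites the first and the last 'a' to 'b', copies the rest.
M = [(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)]
M_INV = invert(M)

print(transduce(M, 0, {2}, "aa"))      # {'bb'}
print(transduce(M_INV, 0, {2}, "bb"))  # {'aa'}
```

States, start state, and accepting states are unchanged by inversion; only the direction of the relation flips.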

  12. FST composition: sequential application
     [Figure: M1 with states 0, 1, 2 and arcs labeled a:b, a, b; M2 with states 0, 1 and arcs labeled a, b, c, b:c]
     Applying M1 and then M2 gives the same result as applying M1 ∘ M2 directly:
     aa   →(M1) bb   →(M2) bb
     bb   →(M1) ∅    →(M2) ∅
     aaaa →(M1) baab →(M2) baac
     abaa →(M1) bbab →(M2) bbac
     • Can we compose without running the FSTs sequentially?
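The question on this slide can be explored with a product construction for ϵ-free FSTs: pair up states, and join an M1 arc with an M2 arc whenever the output of the first equals the input of the second. The sketch below is an illustration under assumptions: `M1` and `M2` are one pair of machines chosen to reproduce the table above, not necessarily the machines drawn on the slide, and the function names are made up for this example.

```python
def compose(arcs1, arcs2):
    """Product construction for epsilon-free FSTs (apply arcs1 first, then arcs2)."""
    return [
        ((p, q), a, c, (p2, q2))
        for p, a, b, p2 in arcs1
        for q, b2, c, q2 in arcs2
        if b == b2  # M1's output symbol must be M2's input symbol
    ]


def transduce(arcs, start, finals, word):
    """Return all outputs a (possibly nondeterministic) FST gives for `word`."""
    configs = {(start, "")}
    for symbol in word:
        configs = {
            (dst, produced + out)
            for state, produced in configs
            for src, inp, out, dst in arcs
            if src == state and inp == symbol
        }
    return {produced for state, produced in configs if state in finals}


# Hypothetical machines consistent with the table on the slide.
M1 = [(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)]
F1 = {2}
M2 = [(0, "a", "a", 1), (0, "b", "b", 0), (0, "c", "c", 0),
      (1, "a", "a", 1), (1, "b", "c", 0), (1, "c", "c", 0)]
F2 = {0, 1}

M12 = compose(M1, M2)
F12 = {(f1, f2) for f1 in F1 for f2 in F2}

for word in ["aa", "bb", "aaaa", "abaa"]:
    print(word, transduce(M12, (0, 0), F12, word))
```

The composed machine reads the input once and never materializes the intermediate string. Handling ϵ labels needs extra care and is left out of this sketch.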

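  18. FST composition
     [Figure: M1 (states 0, 1, 2), M2 (states 0, 1), and the composed machine M1 ∘ M2, whose states 00, 01, 11, 20 are pairs of states from the two machines; arc labels as extracted: a, b, a:b, b:c, c:a]
     The composed machine is built incrementally over several slide builds, starting from the paired start state 00, then adding 01, then 11 and 20.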

  19. Projection
     • Projection turns an FST into an FSA, accepting either the input language or the output language
     [Figure: M with states 0, 1, 2 and arcs labeled a:b, a, b; project(M) with states 0, 1, 2 and arcs labeled a, b]
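Projection can be sketched as dropping one of the two labels on each arc. The names `project` and `accepts`, the `side` parameter, and the machine `M` below are assumptions made for illustration; `M` is hypothetical and only resembles the labels visible in the figure.

```python
def project(arcs, side):
    """Keep only the input or the output label of every FST arc."""
    if side not in ("input", "output"):
        raise ValueError("side must be 'input' or 'output'")
    index = 1 if side == "input" else 2
    return {(arc[0], arc[index], arc[3]) for arc in arcs}


def accepts(fsa_arcs, start, finals, word):
    """Simulate a nondeterministic FSA by tracking the set of active states."""
    states = {start}
    for symbol in word:
        states = {dst for src, sym, dst in fsa_arcs
                  if src in states and sym == symbol}
    return bool(states & finals)


# Hypothetical M: rewrites the first and the last 'a' to 'b', copies the rest.
M = [(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)]

input_fsa = project(M, "input")
output_fsa = project(M, "output")

print(accepts(input_fsa, 0, {2}, "abaa"))   # True
print(accepts(output_fsa, 0, {2}, "bbab"))  # True
print(accepts(output_fsa, 0, {2}, "aa"))    # False
```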

  20. FST determinization
     • A deterministic FST has unambiguous transitions from every state on any input symbol
     • We can extend the subset construction to FSTs
     • Determinization often means converting to a subsequential FST
     • However, not all FSTs can be determinized
     [Figure: FST with states 0, 1, 2, 3 and arcs labeled a, b, a:b]
     Is this FST deterministic?
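One reading of the first bullet is that no state has two arcs with the same input symbol. The check below is a sketch under that assumption; the function name and both machines are hypothetical, and the slide's own example diagram is not reproduced here.

```python
from collections import Counter


def is_deterministic(arcs):
    """True if no (state, input symbol) pair has more than one outgoing arc."""
    counts = Counter((src, inp) for src, inp, out, dst in set(arcs))
    return all(n == 1 for n in counts.values())


# Hypothetical machines: in the first, state 1 has two arcs on input 'a'.
NONDET = [(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 1), (1, "a", "b", 2)]
DET = [(0, "a", "b", 1), (1, "a", "a", 1), (1, "b", "b", 2)]

print(is_deterministic(NONDET))  # False
print(is_deterministic(DET))     # True
```

This only detects nondeterminism; it does not perform determinization, which the slide notes is not always possible.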

  22. Sequential FSTs
     • A sequential FST has a single transition from each state on every input symbol
     • Output symbols can be strings, as well as ϵ
     • The recognition is linear in the length of the input
     • However, sequential FSTs do not allow ambiguity
     [Figure: sequential FST with states 0, 1, 2, 3 and arcs labeled a, b, b:a, a:ab, a:ϵ]
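Because each state has at most one arc per input symbol, a sequential FST can be run with a single dictionary lookup per symbol, which is where the linear running time comes from. The sketch below assumes a transition table mapping `(state, symbol)` to `(output string, next state)`; the table `DELTA` is a hypothetical machine, not the one in the figure.

```python
def run_sequential(delta, start, finals, word):
    """Run a sequential FST; return the single output string or None if rejected."""
    state, pieces = start, []
    for symbol in word:
        step = delta.get((state, symbol))
        if step is None:          # no transition: reject
            return None
        out, state = step
        pieces.append(out)        # out may be a longer string or "" (epsilon)
    return "".join(pieces) if state in finals else None


# Hypothetical machine with a string output ('ab') and an epsilon output ('').
DELTA = {
    (0, "a"): ("ab", 1),
    (0, "b"): ("b", 0),
    (1, "a"): ("", 1),
    (1, "b"): ("a", 0),
}

print(run_sequential(DELTA, 0, {0}, "aab"))  # aba
print(run_sequential(DELTA, 0, {0}, "ba"))   # None
```

The function returns at most one string per input, which reflects the last bullet: a sequential FST cannot express ambiguity.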

  23. Subsequential FSTs
     • A k-subsequential FST is a sequential FST which can output up to k strings at an accepting state
     • Subsequential transducers allow limited ambiguity
     • Recognition time is still linear
     [Figure: 2-subsequential FST with states 0, 1, 2; arcs labeled a, b, a:b, b:ϵ; output strings a and bb]
     • The 2-subsequential FST above maps every string it accepts to two strings, e.g.,
       – baa → bba
       – baa → bbbb
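The idea of emitting up to k strings at an accepting state can be sketched by extending a deterministic run with a table of final outputs. All names here are assumptions, and the machine is hypothetical: it was chosen only so that it reproduces the slide's example pairs baa → bba and baa → bbbb, and it is not the FST in the figure.

```python
def run_subsequential(delta, start, final_out, word):
    """Run deterministically, then append each final output of the end state."""
    state, pieces = start, []
    for symbol in word:
        step = delta.get((state, symbol))
        if step is None:
            return set()
        out, state = step
        pieces.append(out)
    prefix = "".join(pieces)
    # A state is accepting exactly when it has final outputs attached.
    return {prefix + tail for tail in final_out.get(state, ())}


# Hypothetical 2-subsequential machine: state 3 emits either 'a' or 'bb'.
DELTA = {
    (0, "b"): ("b", 1),
    (1, "a"): ("b", 2),
    (2, "a"): ("", 3),
}
FINAL_OUT = {3: ["a", "bb"]}

print(sorted(run_subsequential(DELTA, 0, FINAL_OUT, "baa")))  # ['bba', 'bbbb']
```

The scan over the input is still a single deterministic pass; the ambiguity is confined to the bounded choice made at the end.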

  25. An exercise
     Convert the following FST to a subsequential FST
     [Figure: FST with states 0, 1, 2 and arcs labeled a:a, a:b, a, b; a second machine with states 0 to 5, arcs labeled a:ϵ, b:ϵ, a, b, and output strings aa, ab, ba, bb]
