Neural Grammatical Error Correction with Finite State Transducers
Felix Stahlberg, Christopher Bryant, and Bill Byrne
Department of Engineering
Informal introduction to finite state transducers
• FSTs are graph structures with a start state and final states
• Arcs are annotated with:
  • an input symbol
  • an output symbol
  • a weight
• The FST transduces an input string t₁ to an output string t₂ iff there is a path from the start state to a final state such that:
  • t₁ is the concatenation of all input symbols along the path
  • t₂ is the concatenation of all output symbols along the path
• The cost of this mapping is the (minimal) sum of arc weights along the path
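To make the arc annotations concrete, here is a minimal sketch using pynini, a Python wrapper around OpenFst (assumed installed; the word pair "teh"/"the" and the 1.0 cost are illustrative assumptions, not from the slides):

    # Minimal sketch with pynini. The strings and the weight are illustrative.
    import pynini

    # cross() builds a transducer whose path reads "teh" on the input
    # side and writes "the" on the output side.
    fix = pynini.cross("teh", "the")

    # accep(..., weight=...) builds a weighted identity acceptor; the
    # weight plays the role of a path cost.
    charge = pynini.accep("the", weight=1.0)

    # Composing the two yields an FST mapping "teh" to "the" at cost 1.0.
    t = fix @ charge

    # The cost of the mapping is the minimal path weight.
    best = pynini.shortestpath(t)
    print(best.project("output").string())  # -> "the"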
Example FSTs
• Maps t₁ = bcd to itself
[Figure: a linear FST with labelled start state, final state, and input/output symbols; input and output symbols coincide on each arc]
• Maps any string consisting only of b symbols to itself (see the sketch below)
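A rough rendering of these two identity FSTs, as a pynini sketch (symbols are bytes here):

    # Sketch of the two example FSTs in pynini.
    import pynini

    # Identity FST for the single string "bcd": a linear chain of arcs
    # whose input and output symbols coincide.
    bcd = pynini.accep("bcd")

    # Identity FST for b*: the Kleene closure of a single-symbol
    # acceptor, i.e. a self-loop on "b".
    b_star = pynini.accep("b").closure()

    # A string is accepted iff composing it with the FST is non-empty.
    print(pynini.compose("bbb", b_star).num_states() > 0)  # True
    print(pynini.compose("bcd", b_star).num_states() > 0)  # False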
Example FSTs
• ε: the empty input/output symbol
• Represents an n-best list
• After determinization, ε-removal, minimization, and weight pushing, the same n-best list is represented more compactly (see the sketch below)
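As a sketch, an n-best list can be built as a union of weighted strings and then compacted; in pynini, optimize() bundles ε-removal, determinization, and minimization, and push() performs weight pushing (the hypotheses and weights below are made up):

    # Sketch: an n-best list as a union of weighted string acceptors.
    import pynini

    nbest = (pynini.accep("he goes home", weight=0.3)
             | pynini.accep("he go home", weight=2.1)
             | pynini.accep("he went home", weight=0.9))

    # optimize() performs epsilon-removal, determinization, and
    # minimization; push() moves weights toward the start state.
    nbest.optimize()
    nbest.push()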
FST composition
• Composition combines two FSTs U₁ and U₂ into a new FST U₁ ∘ U₂
• If U₁ maps t₁ to t₂ and U₂ maps t₂ to t₃, then U₁ ∘ U₂ maps t₁ to t₃
• The cost of this mapping is the (minimal) sum of the path costs in U₁ and U₂
FST composition examples
• Composition and weights
[Figure: U₁, U₂, and U₁ ∘ U₂]
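A sketch of the weight behaviour under composition (pynini; symbols and costs are illustrative): the costs of matching paths add up in the tropical semiring.

    # Sketch: path costs add under composition.
    import pynini

    # a -> b at cost 1.0 (the weighted empty acceptor attaches the cost).
    u1 = pynini.cross("a", "b") + pynini.accep("", weight=1.0)
    # b -> c at cost 2.0.
    u2 = pynini.cross("b", "c") + pynini.accep("", weight=2.0)

    # u1 ∘ u2 maps a -> c at cost 1.0 + 2.0 = 3.0.
    u12 = (u1 @ u2).optimize()
    print(pynini.shortestpath(u12).project("output").string())  # -> "c"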
• Counting transducers [Figure: U₁, U₂, and U₁ ∘ U₂]
• Language models [Figure: U₁, U₂, and U₁ ∘ U₂]
• 1:1 corrections [Figure: U₁, U₂, and U₁ ∘ U₂]
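A sketch of what a 1:1 correction transducer can look like (pynini; the confusion pairs and the 1.0 substitution cost are assumptions, not the paper's actual edit set): each word passes through unchanged for free, or is rewritten into a candidate correction at a cost.

    # Sketch of a tiny 1:1 correction transducer.
    import pynini

    # Words pass through unchanged at no cost ...
    keep = pynini.accep("go") | pynini.accep("goes") | pynini.accep("went")

    # ... or are substituted by a candidate correction at cost 1.0.
    sub = ((pynini.cross("go", "goes") | pynini.cross("go", "went"))
           + pynini.accep("", weight=1.0))

    f_word = (keep | sub).optimize()

    # "go" now maps to "go" (cost 0), "goes" (cost 1), and "went" (cost 1).
    print(pynini.compose("go", f_word).num_states() > 0)  # True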
FST-based unsupervised grammatical error correction
[Figure: composition cascade of J (Input), F (Edit), Q (Penalization), and M (5-gram LM)]
• J: Input
• F: Edit
• Q: Penalization
• M: 5-gram LM
• Built up step by step: J ∘ F, then J ∘ F ∘ Q, then J ∘ F ∘ Q ∘ M
• J ∘ F ∘ Q ∘ M: non-neural unsupervised GEC with 5-gram LM scores (a sketch follows)
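Putting the pieces together, the non-neural system is a single composition chain. A sketch under the assumption that the component FSTs have already been built (the build_* helpers below are hypothetical placeholders, not the paper's code):

    # Sketch of the unsupervised cascade J ∘ F ∘ Q ∘ M.
    # The build_* helpers are hypothetical placeholders.
    import pynini

    J = build_input_fst("there is many reason .")  # hypothetical: input as a linear FST
    F = build_edit_fst()                           # hypothetical: 1:1 edit transducer
    Q = build_penalization_fst()                   # hypothetical: per-edit penalties
    M = build_lm_fst()                             # hypothetical: 5-gram LM as a weighted acceptor

    lattice = (J @ F @ Q @ M).optimize()
    best = pynini.shortestpath(lattice)            # lowest-cost correction
    print(best.project("output").string())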
FST-based neural unsupervised GEC
• Idea: use the constructed FSTs (J: Input, F: Edit, Q: Penalization, M: 5-gram LM) to constrain the output of a neural LM
• Neural sequence models normally use subwords or characters rather than words
• Build a transducer U that maps full words to subwords (byte-pair encoding, BPE); see the sketch below
• Constrain the neural LM with J ∘ F ∘ Q ∘ M ∘ U
• For constrained neural decoding we use our SGNMT decoder: http://ucam-smt.github.io/sgnmt/html/
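A sketch of the word → BPE tokenization transducer U (pynini; the segmentations shown are illustrative, not the system's actual BPE vocabulary):

    # Sketch of the tokenization transducer U mapping full words to BPE
    # subword sequences (Moses-style "@@" continuation markers).
    # The segmentations are illustrative.
    import pynini

    U_word = (pynini.cross("unhappiness", "un@@ happi@@ ness")
              | pynini.cross("the", "the")
              | pynini.cross("reason", "reas@@ on")).optimize()

    # A sentence-level U concatenates word-level mappings; the combined
    # constraint J ∘ F ∘ Q ∘ M ∘ U is then handed to the SGNMT decoder,
    # which applies it during neural beam search.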
Results (unsupervised)
[Table: systems are tuned with respect to the metric highlighted in grey]
FST-based neural supervised GEC
• If annotated training data is available:
  • The input J is a (Moses) SMT lattice rather than a single sentence
  • In addition to the <corr> token, we use an <mcorr> token to count the edits made by the SMT system
  • We use an ensemble of a neural language model and a neural machine translation model
FST-based supervised grammatical error correction
• J: Input (SMT lattice)
• F: Edit
• Q: Penalization
• M: 5-gram LM
• U: Tokenization (word → BPE)
• Built up step by step: J ∘ F, then the full chain
• J ∘ F ∘ Q ∘ M ∘ U: constraint for neural ensembles (see the sketch below)
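A sketch of the supervised setting: the input J is now a (Moses) SMT lattice, here assumed to be available as an FST file; the file name and the build_* helpers are hypothetical, as in the unsupervised sketch.

    # Sketch of the supervised cascade with an SMT lattice as input.
    import pynini

    J = pynini.Fst.read("smt_lattice.fst")  # hypothetical file name
    F, Q, M, U = (build_edit_fst(), build_penalization_fst(),
                  build_lm_fst(), build_bpe_fst())  # hypothetical helpers

    # The composed FST constrains the LM/NMT ensemble during SGNMT decoding.
    constraint = (J @ F @ Q @ M @ U).optimize()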
Results (supervised)
[Table: systems are tuned with respect to the metric highlighted in grey]
Results (supervised)
Thanks
BACKUP