Appeared in Proceedings of the Joint Meeting of the Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), pp. 64-71, 2003.

Simpler and More General Minimization for Weighted Finite-State Automata

Jason Eisner
Department of Computer Science
Johns Hopkins University
Baltimore, MD, USA 21218-2691
jason@cs.jhu.edu

Abstract

Previous work on minimizing weighted finite-state automata (including transducers) is limited to particular types of weights. We present efficient new minimization algorithms that apply much more generally, while being simpler and about as fast. We also point out theoretical limits on minimization algorithms. We characterize the kind of “well-behaved” weight semirings where our methods work. Outside these semirings, minimization is not well-defined (in the sense of producing a unique minimal automaton), and even finding the minimum number of states is in general NP-complete and inapproximable.

1 Introduction

It is well known how to efficiently minimize a deterministic finite-state automaton (DFA), in the sense of constructing another DFA that recognizes the same language as the original but with as few states as possible (Aho et al., 1974). This DFA also has as few arcs as possible.

Minimization is useful for saving memory, as when building very large automata or deploying NLP systems on small hand-held devices. When automata are built up through complex regular expressions, the savings from minimization can be considerable, especially when applied at intermediate stages of the construction, since (for example) smaller automata can be intersected faster.

Recently the computational linguistics community has turned its attention to weighted automata that compute interesting functions of their input strings. A traditional automaton only returns a boolean from the set K = {true, false}, which indicates whether it has accepted the input. But a probabilistic automaton returns a probability in K = [0, 1], or equivalently, a negated log-probability in K = [0, ∞]. A transducer returns an output string from K = ∆* (for some alphabet ∆).

Celebrated algorithms by Mohri (1997; 2000) have recently made it possible to minimize deterministic automata whose weights (outputs) are log-probabilities or strings. These cases are of central interest in language and speech processing.

However, automata with other kinds of weights can also be defined. The general formulation of weighted automata (Berstel and Reutenauer, 1988) permits any weight set K, if appropriate operations ⊕ and ⊗ are provided for combining weights from the different arcs of the automaton. The triple (K, ⊕, ⊗) is called a weight semiring and will be explained below. K-valued functions that can be computed by finite-state automata are called rational functions.
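As an illustration (ours, not from the paper), the abstraction is a value set K together with its two identities and the two combining operations. The sketch below packages a few of the semirings just mentioned in Python; the names and the representation of the string semiring's special zero element are our own choices.

```python
# Illustrative sketch (not from the paper): a weight semiring (K, ⊕, ⊗)
# packaged as its two identities plus the two combining operations.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any                         # identity for ⊕
    one: Any                          # identity for ⊗
    plus: Callable[[Any, Any], Any]   # ⊕: combines weights of alternative paths
    times: Callable[[Any, Any], Any]  # ⊗: combines weights along one path

# The boolean semiring of a traditional (unweighted) automaton.
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

# The real semiring (R, +, ×), e.g. for unnormalized probabilities.
REAL = Semiring(0.0, 1.0, lambda a, b: a + b, lambda a, b: a * b)

# Its negated-log equivalent: costs add along a path; alternatives take the min.
TROPICAL = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)

# Mohri-style string semiring for transducers: ⊗ is concatenation and ⊕ is
# longest common prefix; None stands in for the special zero element here.
def lcp(a: str, b: str) -> str:
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

STRING = Semiring(None, "", lcp, lambda a, b: a + b)
```

With ⊕ and ⊗ abstracted this way, a single generic automaton routine can be parameterized by the semiring rather than rewritten per weight type.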
How does minimization generalize to arbitrary weight semirings? The question is of practical as well as theoretical interest. Some NLP automata use the real semiring (R, +, ×), or its log equivalent, to compute unnormalized probabilities or other scores outside the range [0, 1] (Lafferty et al., 2001; Cortes et al., 2002). Expectation semirings (Eisner, 2002) are used to handle bookkeeping when training the parameters of a probabilistic transducer. A byproduct of this paper is a minimization algorithm that works fully with those semirings, a new result permitting more efficient automaton processing in those situations.

Surprisingly, we will see that minimization is not even well-defined for all weight semirings! We will then (nearly) characterize the semirings where it is well-defined, and give a recipe for constructing minimization algorithms similar to Mohri’s in such semirings.

Finally, we follow this recipe to obtain a specific, simple and practical algorithm that works for all division semirings. All the cases above either fall within this framework or can be forced into it by adding multiplicative inverses to the semiring. The new algorithm provides arguably simpler minimization for the cases that Mohri has already treated, and also handles additional cases.

2 Weights and Minimization

We introduce weighted automata by example. The transducer below describes a partial function from strings to strings. It maps aab ↦ xyz and bab ↦ wwyz. Why? Since the transducer is deterministic, each input (such as aab) is accepted along at most one path; the corresponding output (such as xyz) is found by concatenating the output strings found along the path. ε denotes the empty string.

[Figure: a deterministic transducer with states 0-5; its arc labels include a:x, a:y, b:z, b:zz, a:wwy, b:ε, and b:wwzzz.]

δ and σ standardly denote the automaton’s transition and output functions: δ(3, a) = 2 is the state reached by the a arc from state 3, and σ(3, a) = wwy is that arc’s output.
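To make this concrete, here is a small Python sketch (ours, not from the paper) of how a deterministic transducer computes its output by following δ and concatenating σ along the unique path. The transition table includes only the arcs that can be inferred from the two example mappings and from δ(3, a) = 2, σ(3, a) = wwy; the remaining arcs in the figure (e.g. b:zz, b:wwzzz) are omitted, and the choice of 5 as the accepting state is an assumption.

```python
# Illustrative sketch: running a deterministic transducer with transition
# function delta (δ) and output function sigma (σ). Only arcs forced by the
# paper's examples (aab -> xyz, bab -> wwyz, δ(3,a)=2, σ(3,a)=wwy) appear.

delta = {(0, "a"): 1, (0, "b"): 3,        # δ: (state, symbol) -> next state
         (1, "a"): 2, (3, "a"): 2,
         (2, "b"): 5}
sigma = {(0, "a"): "x", (0, "b"): "",     # σ: (state, symbol) -> output string
         (1, "a"): "y", (3, "a"): "wwy",
         (2, "b"): "z"}
final = {5}                               # assumed accepting state

def transduce(word: str, start: int = 0) -> str | None:
    """Follow the unique path for `word`; return the concatenated output,
    or None if the path dies or ends in a non-final state."""
    state, out = start, ""
    for ch in word:
        if (state, ch) not in delta:
            return None                   # partial function: input rejected
        out += sigma[(state, ch)]         # ⊗ here is string concatenation
        state = delta[(state, ch)]
    return out if state in final else None

assert transduce("aab") == "xyz"
assert transduce("bab") == "wwyz"
```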