Factorization of Ambiguous Finite-State Transducers
Abstract
This article describes an algorithm for factorizing an ambiguous finite-state transducer (FST), T, into two FSTs, T1 and T2, such that T1 is functional and T2 retains the ambiguity of the original FST. The application of T2 to the output of T1 never leads to a state that does not provide a transition for the next input symbol, and always terminates in a final state. In other words, T2 contains no “failing paths”, whereas T in general does. Since T1 is functional, it can be factorized into a left-sequential and a right-sequential FST that jointly constitute a bimachine. The described factorization can accelerate processing because no failing paths are ever followed.
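The notion of a failing path can be made concrete with a small sketch. The Python below is a hand-built toy, not the article's algorithm: the dictionary layout, the function names `apply_fst` and `apply_factorized`, the intermediate symbol `D`, and the three example transducers are all assumptions made for illustration. It counts failing paths when an ambiguous FST is applied directly, and when a functional T1 followed by an ambiguous T2 is applied instead.

```python
def apply_fst(fst, symbols):
    """Apply a nondeterministic FST; return (set of outputs, failing-path count).

    Hypothetical layout: fst = {"start": s, "finals": {...},
    "arcs": {(state, in_symbol): [(out_string, next_state), ...]}}
    """
    outputs, failing = set(), 0
    stack = [(fst["start"], 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(symbols):
            if state in fst["finals"]:
                outputs.add(out)
            else:
                failing += 1  # input consumed in a non-final state
            continue
        arcs = fst["arcs"].get((state, symbols[pos]), [])
        if not arcs:
            failing += 1  # no transition for the next input symbol
            continue
        for out_str, nxt in arcs:
            stack.append((nxt, pos + 1, out + out_str))
    return outputs, failing


# Ambiguous toy FST T: "aa" -> {"xx", "zz"}, "ab" -> {"xy"}.
T = {"start": 0, "finals": {3}, "arcs": {
    (0, "a"): [("x", 1), ("z", 2)],
    (1, "a"): [("x", 3)],
    (1, "b"): [("y", 3)],
    (2, "a"): [("z", 3)],
}}

# Hand-built functional T1: the ambiguous case is mapped to the
# (assumed) intermediate symbol "D".
T1 = {"start": 0, "finals": {3}, "arcs": {
    (0, "a"): [("", 1)],
    (1, "b"): [("xy", 3)],
    (1, "a"): [("D", 3)],
}}

# Hand-built T2: expands "D" into both alternatives, copies everything else.
T2 = {"start": 0, "finals": {0}, "arcs": {
    (0, "x"): [("x", 0)],
    (0, "y"): [("y", 0)],
    (0, "D"): [("xx", 0), ("zz", 0)],
}}


def apply_factorized(symbols):
    """Apply T1, then T2 to each intermediate string; sum failing paths."""
    intermediates, failing = apply_fst(T1, symbols)
    results = set()
    for mid in intermediates:
        outs, f = apply_fst(T2, mid)
        results |= outs
        failing += f
    return results, failing


print(apply_fst(T, "ab"))        # direct application follows one dead-end path
print(apply_factorized("ab"))    # same output, no failing path in this toy
print(apply_factorized("aa"))    # ambiguity is reproduced by T2
```

In this toy, T1 happens to be sequential; in general a functional T1 may itself need the bimachine decomposition mentioned in the abstract before it can be applied without failing paths.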
Related resources
Extraction of ε-Cycles from Finite-State Transducers
Much attention has been brought to determinization and ε-removal in previous work. This article describes an algorithm for extracting all ε-cycles, which are a special type of non-determinism, from an arbitrary finite-state transducer (FST). The algorithm factorizes (decomposes) the FST, T, into two FSTs, T1 and T2, such that T1 contains no ε-cycles and T2 contains all ε-cycles of T. Since ε-...
On Weakly Ambiguous Finite Transducers
By weakly ambiguous (finite) transducers we mean those transducers that, although being ambiguous, may be viewed to be at arm’s length from unambiguity. We define input-unambiguous (IU) and input-deterministic (ID) transducers, and transducers with finite codomain (FC). IU transductions are characterized by nondeterministic bimachines and ID transductions can be represented as a composition of ...
A Lexical Interface for Finite-State Syntax
This document describes the lexical interface for finite-state syntax as it is currently implemented and used for the development of the French constraint grammar. The system includes finite-state transducers for multiword expressions, for capitalised, misspelt or unknown words and for accent recovery. It also encodes general multiword expressions such as dates or idioms. The tokeniser includes a n...
Morphological Disambiguation and Text Normalization for Southern Quechua Varieties
We built a pipeline to normalize Quechua texts through morphological analysis and disambiguation. Word forms are analyzed by a set of cascaded finite state transducers which split the words and rewrite the morphemes to a normalized form. However, some of these morphemes, or rather morpheme combinations, are ambiguous, which may affect the normalization. For this reason, we disambiguate the morp...
General Algorithms for Testing the Ambiguity of Finite Automata and the Double-Tape Ambiguity of Finite-State Transducers
We present efficient algorithms for testing the finite, polynomial, and exponential ambiguity of finite automata with ε-transitions. We give an algorithm for testing the exponential ambiguity of an automaton A in time $O(|A|_E^2)$, and finite or polynomial ambiguity in time $O(|A|_E^3)$, where $|A|_E$ denotes the number of transitions of A. These complexities significantly improve over the previous best...