Factorization of Ambiguous Finite-State Transducers

Author

  • André Kempe
Abstract

This article describes an algorithm for factorizing an ambiguous finite-state transducer (FST), T, into two FSTs, T1 and T2, such that T1 is functional and T2 retains the ambiguity of the original FST. The application of T2 to the output of T1 never leads to a state that does not provide a transition for the next input symbol, and always terminates in a final state. In other words, T2 contains no “failing paths”, whereas T in general does. Since T1 is functional, it can be factorized into a left-sequential and a right-sequential FST that jointly constitute a bimachine. The described factorization can accelerate processing because no failing paths are ever followed.
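To make the notion of a "failing path" concrete, the following minimal Python sketch applies a small ambiguous transducer to an input word and counts the partial paths that are abandoned. It is not the factorization algorithm of the article. The toy transducer, the `apply_fst` helper, and the counting convention (one failing path per dead end) are assumptions introduced only for illustration.

```python
# Hypothetical toy transducer, not taken from the article.
# Each state maps to a list of (input symbol, output symbol, next state).
TRANSITIONS = {
    0: [("a", "x", 1), ("a", "y", 2)],   # nondeterministic choice on "a"
    1: [("b", "x", 3), ("b", "z", 3)],   # ambiguity: two outputs for "b"
    2: [("c", "y", 3)],
    3: [],
}
FINAL = {3}


def apply_fst(transitions, finals, start, word):
    """Apply the transducer by depth-first search.

    Returns (sorted outputs, number of failing paths). A path fails when
    its state has no transition for the next input symbol, or when the
    input is exhausted in a non-final state.
    """
    outputs = []
    failing = 0
    stack = [(start, 0, "")]  # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word):
            if state in finals:
                outputs.append(out)
            else:
                failing += 1
            continue
        successors = [(o, q) for (i, o, q) in transitions.get(state, [])
                      if i == word[pos]]
        if not successors:
            failing += 1  # dead end: no transition for the next symbol
            continue
        for o, q in successors:
            stack.append((q, pos + 1, out + o))
    return sorted(outputs), failing


if __name__ == "__main__":
    for w in ("ab", "ac", "ad"):
        print(w, apply_fst(TRANSITIONS, FINAL, 0, w))
```

On the input "ab" the search yields two outputs but also explores one branch (through state 2) that has to be abandoned. A factorization of the kind the abstract describes aims to arrange the work so that the ambiguous second stage never enters such a branch.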

Similar resources

Extraction of ε-Cycles from Finite-State Transducers

Much attention has been brought to determinization and ε-removal in previous work. This article describes an algorithm for extracting all ε-cycles, which are a special type of non-determinism, from an arbitrary finite-state transducer (FST). The algorithm factorizes (decomposes) the FST, T, into two FSTs, T1 and T2, such that T1 contains no ε-cycles and T2 contains all ε-cycles of T. Since ε-...

On Weakly Ambiguous Finite Transducers

By weakly ambiguous (finite) transducers we mean those transducers that, although being ambiguous, may be viewed to be at arm’s length from unambiguity. We define input-unambiguous (IU) and input-deterministic (ID) transducers, and transducers with finite codomain (FC). IU transductions are characterized by nondeterministic bimachines and ID transductions can be represented as a composition of ...

A Lexical Interface for Finite-State Syntax

This document describes the lexical interface for finite-state syntax as it is currently implemented and used for the development of the French constraint grammar. The system includes finite-state transducers for multiword expressions, for capitalised, misspelt or unknown words and for accent recovery. It also encodes general multiword expressions such as dates or idioms. The tokeniser includes a n...

Morphological Disambiguation and Text Normalization for Southern Quechua Varieties

We built a pipeline to normalize Quechua texts through morphological analysis and disambiguation. Word forms are analyzed by a set of cascaded finite state transducers which split the words and rewrite the morphemes to a normalized form. However, some of these morphemes, or rather morpheme combinations, are ambiguous, which may affect the normalization. For this reason, we disambiguate the morp...

General Algorithms for Testing the Ambiguity of Finite Automata and the Double-Tape Ambiguity of Finite-State Transducers

We present efficient algorithms for testing the finite, polynomial, and exponential ambiguity of finite automata with ε-transitions. We give an algorithm for testing the exponential ambiguity of an automaton A in time O(|A|_E^2), and finite or polynomial ambiguity in time O(|A|_E^3), where |A|_E denotes the number of transitions of A. These complexities significantly improve over the previous best...

Publication date: 2000