Text Disambiguation by Finite State Automata, an Algorithm and Experiments on Corpora
Abstract
Exploring the context of an ambiguous word should provide clues that eliminate the irrelevant candidate analyses. For this purpose we use local grammar constraints represented by finite automata. We have designed and implemented an algorithm that performs this task using a wide variety of linguistic constraints. Both the texts and the rules (or constraints) are represented in the same formalism, namely finite automata, and performing subtraction operations between text automata and constraint automata reduces the ambiguities. Experiments were performed on French texts with large-scale dictionaries (one dictionary of 600,000 simple inflected forms and one of 150,000 inflected compounds). Syntactic patterns represented by automata, including shapes of compound nouns such as a noun followed by an adjective in gender-number agreement (cf. Section 5.1), can be matched in texts. This process thus extends classical matching procedures, both through on-line dictionary consultation and through grammar constraints. It provides a simple and efficient indexing tool.
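The subtraction step can be illustrated with a small sketch. It assumes, purely for illustration, that each word carries a set of candidate part-of-speech tags, that the text automaton is the resulting acyclic tag lattice, and that each constraint is a complete DFA accepting every tag sequence that contains a forbidden tag bigram. The names `text_automaton`, `forbidden_bigram`, `subtract` and `paths`, the tag set, and the two constraints are hypothetical and do not come from the paper. Subtraction is done with the standard product construction for language difference (accept where the text automaton accepts and the constraint automaton does not), which may differ from the authors' own algorithm.

```python
# Sketch: ambiguity reduction by automaton subtraction (text minus constraint).
# Tag set, constraints and function names are hypothetical illustrations.
TAGS = ("Det", "Pro", "Noun", "Verb")


def text_automaton(lattice):
    """Acyclic DFA for a sentence: state i means i words read, and each
    candidate tag of word i labels an edge i -> i + 1."""
    delta = {}
    for i, tags in enumerate(lattice):
        for tag in tags:
            delta[(i, tag)] = i + 1
    return {"start": 0, "delta": delta, "final": {len(lattice)}}


def forbidden_bigram(x, y, alphabet):
    """Complete DFA accepting every tag sequence that contains x immediately
    followed by y (state 1: last tag was x; state 2: pattern seen)."""
    delta = {}
    for a in alphabet:
        delta[(0, a)] = 1 if a == x else 0
        delta[(1, a)] = 2 if a == y else (1 if a == x else 0)
        delta[(2, a)] = 2  # absorbing accepting state
    return {"start": 0, "delta": delta, "final": {2}}


def subtract(text, constraint):
    """Product construction for L(text) minus L(constraint).
    The constraint DFA must be complete over the text's tags."""
    start = (text["start"], constraint["start"])
    delta, seen, stack = {}, {start}, [start]
    while stack:
        p, q = stack.pop()
        for (src, a), dst in text["delta"].items():
            if src != p:
                continue
            nxt = (dst, constraint["delta"][(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # Accept where the text accepts and the constraint does not.
    final = {(p, q) for (p, q) in seen
             if p in text["final"] and q not in constraint["final"]}
    return {"start": start, "delta": delta, "final": final}


def paths(aut, state=None):
    """Enumerate the tag sequences accepted by an acyclic automaton."""
    state = aut["start"] if state is None else state
    if state in aut["final"]:
        yield ()
    for (src, a), dst in aut["delta"].items():
        if src == state:
            for rest in paths(aut, dst):
                yield (a,) + rest


# "la porte": "la" as determiner or pronoun, "porte" as noun or verb.
aut = text_automaton([{"Det", "Pro"}, {"Noun", "Verb"}])
for x, y in [("Det", "Verb"), ("Pro", "Noun")]:  # hypothetical constraints
    aut = subtract(aut, forbidden_bigram(x, y, TAGS))
print(sorted(paths(aut)))  # → [('Det', 'Noun'), ('Pro', 'Verb')]
```

The result still holds two readings of *la porte* (determiner + noun, pronoun + verb): subtraction removes only the paths a constraint rules out, so a genuinely ambiguous sequence keeps all of its admissible analyses. A fuller implementation would also trim dead states (such as the one reached by *Det Verb* here) and minimize the automaton after each subtraction.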