Foma: a Finite-State Compiler and Library
Abstract
Foma is a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological and phonological analyzers. Foma is largely compatible with the Xerox/PARC finite-state toolkit. It also embraces Unicode fully and supports several formats for specifying regular expressions: the Xerox/PARC format, a Perl-like format, and a mathematical format that takes advantage of the ‘Mathematical Operators’ Unicode block.
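To make the notion of a transducer-based morphological analyzer concrete, here is a minimal sketch in Python. It does not use Foma or its C library; the class `ToyTransducer`, the helper `build_noun_analyzer`, the two-word lexicon, and the tag names (`+N`, `+Sg`, `+Pl`) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyTransducer:
    """A tiny nondeterministic finite-state transducer.

    Illustrative only: this is NOT foma's API. Arc labels may span several
    characters for brevity, and an empty input label acts as an epsilon arc.
    Epsilon cycles are not handled (the example below has none).
    """

    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        # state -> list of (input_label, output_string, next_state)
        self.arcs = defaultdict(list)

    def add_arc(self, src, inp, out, dst):
        self.arcs[src].append((inp, out, dst))

    def apply(self, text):
        """Return every output string the transducer produces for `text`."""
        results = set()
        stack = [(self.start, 0, "")]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(text) and state in self.finals:
                results.add(out)
            for inp, arc_out, dst in self.arcs[state]:
                if inp == "":
                    stack.append((dst, pos, out + arc_out))
                elif text.startswith(inp, pos):
                    stack.append((dst, pos + len(inp), out + arc_out))
        return sorted(results)


def build_noun_analyzer():
    """Map surface forms such as 'cats' to analyses such as 'cat+N+Pl'."""
    t = ToyTransducer(start=0, finals=[2])
    for stem in ("cat", "dog"):           # hypothetical two-word lexicon
        t.add_arc(0, stem, stem + "+N", 1)
    t.add_arc(1, "", "+Sg", 2)            # no suffix: singular
    t.add_arc(1, "s", "+Pl", 2)           # suffix -s: plural
    return t


analyzer = build_noun_analyzer()
print(analyzer.apply("cats"))  # ['cat+N+Pl']
print(analyzer.apply("dog"))   # ['dog+N+Sg']
print(analyzer.apply("bird"))  # []
```

A toolkit such as Foma compiles transducers of this kind from regular expressions and lexicon descriptions rather than from hand-written arcs, which is what makes large-coverage analyzers practical.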
Similar resources
JCLexT: A Java Tool for Compiling Finite-State Transducers from Full-Form Lexicons
JCLexT is a compiler of finite-state transducers from full-form lexicons; this tool seems to be the first Java implementation of such functionality. A comparison between JCLexT and Foma was performed based on extensive data from Portuguese. The main disadvantage of JCLexT is its slower compilation time in comparison to Foma. However, this is negated by the fact that a large transducer compiled...
Proto-Indo-European Lexicon: The Generative Etymological Dictionary of Indo-European Languages
Proto-Indo-European Lexicon (PIE Lexicon) is the generative etymological dictionary of Indo-European languages. The reconstruction of Proto-Indo-European (PIE) is obtained by applying the comparative method, the output of which equals the Indo-European (IE) data. Due to this, the Indo-European sound laws leading from PIE to IE, revised in Pyysalo 2013, can be coded using Finite-State Transducers...
Developing an Open-Source FST Grammar for Verb Chain Transfer in a Spanish-Basque MT System
This paper presents the current status of development of a finite-state transducer grammar for the verbal-chain transfer module in Matxin, a Rule Based Machine Translation system between Spanish and Basque. Due to the distance between Spanish and Basque, the verbal-chain transfer is a very complex module in the overall system. The grammar is compiled with foma, an open-source finite-state toolki...
A Rule Based Morphological Analyzer and A Morphological Disambiguator for Kazakh Language
Morphological analysis is critical for natural language processing tasks, especially on agglutinative languages. This study gives the implementation details of a rule-based morphological analyzer for Kazakh, an agglutinative language. A detailed computational analysis of Kazakh morphology, such as formalization of alternation and morphotactic rules fo...
Porting Basque Morphological Grammars to foma, an Open-Source Tool
Basque is a morphologically rich language, of which several finite-state morphological descriptions have been constructed, primarily using the Xerox/PARC finite-state tools. In this paper we describe the process of porting a previous description of Basque morphology to foma, an open-source finite-state toolkit compatible with Xerox tools, provide a comparison of the two tools, and contrast the ...