Integrating Classical Sindarin Morphology and Syntax using XFST and XLE
Authors: Thomas Zink, Andreas Engl, and Wolfgang Miller
Abstract
Sindarin is a constructed language invented by the author and linguist J.R.R. Tolkien. It is the language spoken by the Elves. Although the number of complete Sindarin sentences is rather limited [1], enough information is available to derive syntactic rules and build a Sindarin grammar. This project aims at modelling the basics of Classical Sindarin grammar in LFG using the Xerox XLE system. In addition, it interfaces with Xerox XFST to look up morphological information, a component that was already implemented in another project. For more information about Classical Sindarin morphology, see the project documentation [2]. The project is mainly based on information provided by David Salo’s “A Gateway to Sindarin” [1].
1 System Overview
Figure 1 shows an overview of the system implementation. It consists of three components, which are discussed in more detail below.
1.1 Tokenizer
The tokenizer takes textual input and generates tokens that can be further processed and analysed by the morphological analyser. It performs the following tasks:
– all characters are converted to lower case;
– accents are removed;
– binding articles (like i-) are split from nouns;
– numbers are identified and isolated;
– special and punctuation characters are isolated;
– identified tokens are separated by a token boundary, TB.
Tokenization removes ambiguities and simplifies the creation and maintenance of lexicon files for the morphological analyser and the XLE grammar. Although XLE provides a basic tokenizer that is suitable for English and German grammars, it is not sufficient for parsing Sindarin. The main reason is the existence of articles that bind directly to the corresponding noun. They have to be split and …
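The tokenizer tasks listed above can be illustrated with a short Python sketch. It is a minimal, hypothetical illustration under stated assumptions, not the project's actual tokenizer, which feeds XFST and XLE. Accent removal is done here via Unicode NFD decomposition, which is a technique chosen for the sketch. The article set beyond `i`, the literal string `TB` used as the boundary marker, and the function names `strip_accents`, `tokenize`, and `to_tb_string` are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical set of articles that attach to a noun with a hyphen.
# The source only names "i-"; "in" is an illustrative assumption.
BINDING_ARTICLES = {"i", "in"}

# Placeholder for the token boundary; the real system's symbol may differ.
TB = "TB"

# A token is a run of digits, a run of letters, or a single other
# non-space character (punctuation or special character).
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|[^\w\s]|_")


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop combining marks such as ^ or ´."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def tokenize(text: str) -> list[str]:
    """Lower-case, remove accents, split binding articles, isolate tokens."""
    text = strip_accents(text.lower())
    tokens: list[str] = []
    for word in text.split():
        # Split a binding article from its noun: "i-pheriannath" -> "i", "pheriannath".
        head, sep, rest = word.partition("-")
        if sep and head in BINDING_ARTICLES and rest:
            tokens.append(head)
            word = rest
        # Isolate numbers, letter runs and each punctuation/special character.
        tokens.extend(TOKEN_RE.findall(word))
    return tokens


def to_tb_string(tokens: list[str]) -> str:
    """Join tokens with the (hypothetical) token-boundary marker."""
    return f" {TB} ".join(tokens)


if __name__ == "__main__":
    print(tokenize("Mae govannen, i-Pheriannath!"))
    # ['mae', 'govannen', ',', 'i', 'pheriannath', '!']
    print(to_tb_string(tokenize("Annon 2: Gîl-galad")))
    # annon TB 2 TB : TB gil TB - TB galad
```

Splitting binding articles before applying the token regex keeps `i-` as a separate article token, while any other hyphen (as in `gil-galad`) is simply isolated as a punctuation token.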