Automatic Acquisition of Two-Level Morphological Rules
Abstract
We describe and experimentally evaluate a complete method for the automatic acquisition of two-level rules for morphological analyzers/generators. The input to the system is sets of source-target word pairs, where the target is an inflected form of the source. There are two phases in the acquisition process: (1) segmentation of the target into morphemes and (2) determination of the optimal two-level rule set with minimal discerning contexts. In phase one, a minimal acyclic finite state automaton (AFSA) is constructed from string edit sequences of the input pairs. Segmentation of the words into morphemes is achieved through viewing the AFSA as a directed acyclic graph (DAG) and applying heuristics using properties of the DAG as well as the elementary edit operations. For phase two, the determination of the optimal rule set is made possible with a novel representation of rule contexts, with morpheme boundaries added, in a new DAG. We introduce the notion of a delimiter edge. Delimiter edges are used to select the correct two-level rule type as well as to extract minimal discerning rule contexts from the DAG. Results are presented for English adjectives, Xhosa noun locatives and Afrikaans noun plurals.
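To make the phrase "string edit sequences of the input pairs" concrete, the following is a minimal sketch of how one such sequence could be computed for a single source-target pair. It uses a standard Levenshtein alignment with unit costs, which is an assumption made here for illustration; the abstract does not specify the cost model or tie-breaking the authors use. The function name `edit_sequence`, the operation labels, and the word pair are likewise hypothetical.

```python
def edit_sequence(source, target):
    """Return one minimal-cost list of (op, src_char, tgt_char) steps
    that turns `source` into `target`.

    Assumption: unit cost for insert, delete and replace (plain
    Levenshtein distance). Ties between equally cheap alignments are
    broken arbitrarily by the backtrace order below.
    """
    n, m = len(source), len(target)
    # cost[i][j] = minimal number of edits turning source[:i] into target[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j] + 1,        # delete source[i-1]
                cost[i][j - 1] + 1,        # insert target[j-1]
                cost[i - 1][j - 1] + sub,  # copy or replace
            )

    # Walk back from the bottom-right cell to recover one optimal path.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if source[i - 1] == target[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                op = "copy" if sub == 0 else "replace"
                ops.append((op, source[i - 1], target[j - 1]))
                i -= 1
                j -= 1
                continue
        if j > 0 and cost[i][j] == cost[i][j - 1] + 1:
            ops.append(("insert", "", target[j - 1]))
            j -= 1
        else:
            ops.append(("delete", source[i - 1], ""))
            i -= 1
    ops.reverse()
    return ops


if __name__ == "__main__":
    # Illustrative English adjective pair (not taken from the paper's data).
    for step in edit_sequence("happy", "happier"):
        print(step)
```

Several alignments can share the same minimal cost, and this sketch returns only one of them, which need not be the linguistically preferred one. In the method described above, the sequences from all input pairs would then be merged into the AFSA on which the segmentation heuristics operate; that step is not shown here.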
Similar resources
Enlarging the Croatian Morphological Lexicon by Automatic Lexical Acquisition from Raw Corpora
This paper presents experiments for enlarging the Croatian Morphological Lexicon by applying an automatic acquisition methodology. The basic sources of information for the system are a set of morphological rules and a raw corpus. The morphological rules have been automatically derived from the existing Croatian Morphological Lexicon and we have used in our experiments a subset of the Croatian N...
Automatic Pavement Crack Detection Based on Aerial Imagery
Road health information is an important indicator for assessing the status of the road in management systems. Identifying the abandonment of surfaces is an important process in maintaining roads and traffic safety, which is traditionally conducted on the basis of field surveys. Today, remote sensing methods, especially photogrammetric imaging, are presented. In this article, based on by UAVs im...
Rule-Based Information Extraction for Structured Data Acquisition using TextMarker
Information extraction is concerned with the location of specific items in (unstructured) textual documents, e.g., being applied for the acquisition of structured data. Then, the acquired data can be applied for mining methods requiring structured input data, in contrast to other text mining methods that utilize a bag-of-words approach. This paper presents a semi-automatic approach for structur...
Automatic Creation of a Morphological Processor in Logic Programming Environment
In this paper we describe a two-level processor which automatically creates a morphological processor from a given set of two-level phonological rules and morphotactic rules. The given two-level phonological and morphotactic rules are automatically converted into Prolog programs which represent a morphological processor for the language in concern. We propose new logical representations for tw...
Automatic Rule Induction for Unknown-Word Guessing
Words unknown to the lexicon present a substantial problem to NLP modules that rely on morphosyntactic information, such as part-of-speech taggers or syntactic parsers. In this paper we present a technique for fully automatic acquisition of rules that guess possible part-of-speech tags for unknown words using their starting and ending segments. The learning is performed from a general-purpose l...