Enhancing Morphological Analyzers by Unknown Word Decomposition
Abstract
This paper describes an approach to integrating the decomposition of non-lexicalized compounds and derivations into the morphological analyzers of a company's NLP product line. The component uses word formation rules and filtering techniques to decompose words that are not contained in the underlying dictionary database, thereby increasing the average word recognition rate of the morphological analyzers from 90.6% to 95.4%.
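The abstract does not spell out the rules or filters, so the following is only a minimal sketch of the general idea, not the paper's component. It assumes a toy lexicon, one linking element ("s"), two derivational suffixes, and a simple "fewest parts" filter; the names `LEXICON`, `LINKERS`, `SUFFIXES`, and `decompose` are all hypothetical.

```python
from functools import lru_cache

# Hypothetical toy data; a real analyzer would query its dictionary database.
LEXICON = {"book", "shelf", "case", "kind", "arbeit", "platz"}
LINKERS = ("", "s")          # assumed linking elements between compound parts
SUFFIXES = {"ness", "less"}  # assumed derivational suffixes


def decompose(word: str) -> list[tuple[str, ...]]:
    """Split an unknown word into known stems, optionally ending in a suffix."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def split(start: int) -> tuple[tuple[str, ...], ...]:
        analyses = []
        for end in range(start + 1, len(word) + 1):
            stem = word[start:end]
            if stem not in LEXICON:
                continue
            rest = word[end:]
            if rest == "":
                # The stem covers the remainder of the word.
                analyses.append((stem,))
            elif rest in SUFFIXES:
                # Derivation rule: stem + suffix (suffix marked with "+").
                analyses.append((stem, "+" + rest))
            # Compound rule: stem + optional linker + further analysis.
            for linker in LINKERS:
                if rest.startswith(linker) and len(rest) > len(linker):
                    for tail in split(end + len(linker)):
                        analyses.append((stem,) + tail)
        return tuple(analyses)

    candidates = split(0)
    if not candidates:
        return []
    # Filter step (assumption): keep only the analyses with the fewest parts.
    fewest = min(len(c) for c in candidates)
    return sorted({c for c in candidates if len(c) == fewest})


print(decompose("bookshelf"))     # compound without a linking element
print(decompose("Arbeitsplatz"))  # compound with the linking element "s"
print(decompose("kindness"))      # derivation with a suffix
```

A word that yields no analysis stays unrecognized, so the recognition rate can only go up as rules are added; the filter is what keeps spurious splits from being accepted along the way.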
Similar resources
How to Disassemble Alphabetical Processions - Morphological Treatment of Unknown Words
This paper describes an approach to integrating the decomposition of non-lexicalized compounds and derivations into the morphological analyzers of an NLP product line. The component uses word formation rules and filtering techniques to decompose words that are not contained in the underlying dictionary database, thereby increasing the average word recognition rate of the morphologica...
Composition and Decomposition of Japanese Katakana and Kanji Morphemes for Decision Rule Induction from Patent Documents
We propose a new method to construct a word list for rule induction from Japanese patent documents. For word segmentation in Japanese, statistical morphological analyzers have been used in many applications. However, the output of these morphological analyzers presents defects when analyzing unknown words, specifically words that contain Kanji/Katakana morphemes. Some words are overly segmented...
Rapid Development of Morphological Analyzers for Typologically Diverse Languages
The Low Resource Language research conducted under DARPA’s Broad Operational Language Translation (BOLT) program required the rapid creation of text corpora of typologically diverse languages (Turkish, Hausa, and Uzbek) which were annotated with morphological information, along with other types of annotation. Since the output of morphological analyzers is a significant aid to morphological anno...
Investigating morphological decomposition for transcription of Arabic broadcast news and broadcast conversation data
One of the challenges of Arabic speech recognition is to deal with the huge lexical variety. Morphological decomposition has been proposed to address this problem by increasing lexical coverage, thereby reducing errors that are due to words that are unknown to the system. In our previous attempts to develop an Arabic speech-to-text (STT) transcription system with morphological decomposition, an...
Syllable-based probabilistic morphological analysis model of Korean
In this paper, we present a syllable-based probabilistic morphological analysis model of Korean. Whereas previous morphological analyzers regard the morpheme as the processing unit, the model uses the syllable as its processing unit in order to cope with the unknown word problem. In fact, it does not use any morpheme dictionary. In contrast to the previous systems that depend on manually constructe...