Morpheme Segmentation and Concatenation Approaches for Uyghur LVCSR
Similar resources
Morpheme Segmentation and Concatenation Approaches for Uyghur LVCSR
In this paper, various kinds of sub-word lexica are thoroughly investigated within the framework of a Uyghur LVCSR system. Experimental results show that it is inefficient to model directly on word units or on small units such as morphemes or even syllables. An optimal sub-word unit set lying between word and morpheme units is observed to fit an ASR system better. In order to select best u...
Morpheme Based Factored Language Models for German LVCSR
German is a highly inflectional language, where a large number of words can be generated from the same root. It makes liberal use of compounding, leading to high out-of-vocabulary (OOV) rates and poor language model (LM) probability estimates. Therefore, the use of morphemes for language modeling is considered a better choice for Large Vocabulary Continuous Speech Recognition (LVCSR) than the...
Morpheme Level Feature-based Language Models for German LVCSR
One of the challenges for Large Vocabulary Continuous Speech Recognition (LVCSR) of German is its complex morphology and high level of compounding. These lead to high out-of-vocabulary (OOV) rates and poor language model (LM) probabilities. In such cases, building LMs at the morpheme level can be considered a better choice. Thereby, higher lexical coverage and lower LM perplexities are achieved. On ...
A Uyghur Morpheme Analysis Method based on Conditional Random Fields
Morpheme analysis is very important for Uyghur language processing. Morpheme analysis of Uyghur differs considerably from that of other languages; the keys to the task include feature selection and the design of a morpheme-annotated corpus. In this paper we propose a new statistical Uyghur morpheme analysis method using a Conditional Random Fields (CRFs) model. The preliminary experiment results...
Frequency Effects in Morpheme Segmentation
The present study explores the effects of frequency in learning to parse novel morphological patterns. In two experiments, suffixes were divided into three classes: high, medium and low frequency, based on the proportion of stems in the input that each suffix attached to (high frequency = 12/12, medium frequency = 6/12, and low frequency = 2/12). In Experiment 1, learners were better at segment...
Journal
Journal title: International Journal of Hybrid Information Technology
Year: 2015
ISSN: 1738-9968
DOI: 10.14257/ijhit.2015.8.8.33