Phoneme-level speech and natural language integration for agglutinative languages
Abstract
A new, tightly coupled speech and natural language integration model is presented for a TDNN-based large-vocabulary continuous speech recognition system. Popular n-best techniques were developed mainly for integrating HMM-based speech and natural language systems at the word level, which is obviously inadequate for morphologically complex agglutinative languages; our model instead constructs a spoken language system based on phoneme-level integration. The TDNN-CYK spoken language architecture is designed and implemented using a TDNN-based diphone recognition module integrated with table-driven phonological/morphological co-analysis. Our integration model provides a seamless integration of speech and natural language for connectionist speech recognition systems, especially for morphologically complex languages such as Korean. Our experimental results show that speaker-dependent continuous Eojeol (word) recognition can be integrated with morphological analysis, with a morphological analysis success rate of over 80% directly from speech input for middle-level vocabularies.
[...]dation). We also thank WonIl Lee for coding the lexicon and the morphological parser, and Professor Hong Jeong for his valuable suggestions on an earlier draft of this paper. An extended version of this paper was submitted to the Journal of Natural Language Engineering for review.
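The idea of table-driven morphological analysis over a phoneme string, as named in the abstract, can be illustrated with a small CYK-style chart. This is only a sketch under stated assumptions, not the authors' implementation: the recognizer output is assumed to be a single best phoneme sequence, phonological rules are omitted, and `LEXICON`, `RULES`, `cyk_analyze`, and the romanized toy strings are hypothetical placeholders rather than a real Korean lexicon.

```python
from itertools import product

# Hypothetical toy lexicon: phoneme tuple -> morpheme category.
# The strings are placeholders, not entries from a real lexicon.
LEXICON = {
    ("h", "a", "k", "k", "y", "o"): "NOUN",
    ("e", "s", "o"): "PARTICLE",
    ("h", "a"): "VSTEM",
}

# Hypothetical connectivity table: (left category, right category) -> result.
RULES = {
    ("NOUN", "PARTICLE"): "EOJEOL",
    ("VSTEM", "ENDING"): "EOJEOL",
}


def cyk_analyze(phonemes, lexicon, rules):
    """Fill a CYK chart over a phoneme sequence.

    chart[i][j] maps a category to one analysis of phonemes[i:j],
    where an analysis is a list of (morpheme, category) pairs.
    Returns the chart cell covering the whole input.
    """
    n = len(phonemes)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    # Lexical step: look up every span directly in the lexicon.
    for i in range(n):
        for j in range(i + 1, n + 1):
            cat = lexicon.get(tuple(phonemes[i:j]))
            if cat is not None:
                chart[i][j][cat] = [("".join(phonemes[i:j]), cat)]

    # Combination step: join adjacent spans allowed by the table.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                pairs = product(chart[i][k].items(), chart[k][j].items())
                for (lcat, left), (rcat, right) in pairs:
                    parent = rules.get((lcat, rcat))
                    if parent is not None and parent not in chart[i][j]:
                        chart[i][j][parent] = left + right

    return chart[0][n]


result = cyk_analyze(list("hakkyoeso"), LEXICON, RULES)
print(result["EOJEOL"])  # [('hakkyo', 'NOUN'), ('eso', 'PARTICLE')]
```

Because the chart is indexed by phoneme positions rather than word positions, morpheme boundaries do not have to be known in advance, which is the property that makes phoneme-level integration attractive for agglutinative languages. A real system would also need to handle competing phoneme hypotheses and phonological changes at morpheme boundaries.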
Similar resources
A Viterbi-based morphological analysis for speech and natural language integration
This paper presents a statistical/symbolic hybrid morphological analysis, called V-morph, for large-scale speech and natural language integration for Korean. In the V-morph approach, statistical Viterbi-based lexical decoding and symbolic morphological modeling are integrated together on top of a connectionist phoneme recognition engine. Linguistic characteristics of Korean are appropriately cons...
Integrating connectionist, statistical and symbolic approaches for continuous spoken Korean processing
This paper presents a multi-strategic and hybrid approach for large-scale integrated speech and natural language processing, employing connectionist, statistical and symbolic techniques. The developed spoken Korean processing engine (SKOPE) integrates a connectionist TDNN-based phoneme recognition technique with statistical Viterbi-based lexical decoding and symbolic morphological/phonological an...
Joint PoS Tagging and Stemming for Agglutinative Languages
The number of word forms in agglutinative languages is theoretically infinite and this variety in word forms introduces sparsity in many natural language processing tasks. Part-of-speech tagging (PoS tagging) is one of these tasks that often suffers from sparsity. In this paper, we present an unsupervised Bayesian model using Hidden Markov Models (HMMs) for joint PoS tagging and stemming for ag...
A Language-Independent Unsupervised Model for Morphological Segmentation
Morphological segmentation has been shown to be beneficial to a range of NLP tasks such as machine translation, speech recognition, speech synthesis and information retrieval. Recently, a number of approaches to unsupervised morphological segmentation have been proposed. This paper describes an algorithm that draws from previous approaches and combines them into a simple model for morphological...
Turkish LVCSR: Database Preparation and Language Modeling for an Agglutinative Language
Turkish is an agglutinative language. It is possible to produce a very high number of words from the same root with suffixes [1]. Language modeling for agglutinative languages needs to be different from the modeling of languages like English. Such languages also have inflections, but not as many as an agglutinative language. Techniques which can be used for modeling agglutinative languages ...
Journal title:
- CoRR
Volume abs/cmp-lg/9411013
Pages -
Publication date 1994