'Early recognition' of polysyllabic words in continuous speech
نویسندگان
چکیده
Humans are able to recognise a word before its acoustic realisation is complete. This in contrast to conventional automatic speech recognition (ASR) systems, which compute the likelihood of a number of hypothesised word sequences, and identify the words that were recognised on the basis of a trace back of the hypothesis with the highest eventual score, in order to maximise efficiency and performance. In the present paper, we present an ASR system, SpeM, based on principles known from the field of human word recognition that is able to model the human capability of ‘early recognition’ by computing word activation scores (based on negative log likelihood scores) during the speech recognition process. Experiments on 1463 polysyllabic words in 885 utterances showed that 64.0% (936) of these polysyllabic words were recognised correctly at the end of the utterance. For 81.1% of the 936 correctly recognised polysyllabic words the local word activation allowed us to identify the word before its last phone was available, and 64.1% of those words were already identified one phone after their lexical uniqueness point. We investigated two types of predictors for deciding whether a word is considered as recognised before the end of its acoustic realisation. The first type is related to the absolute and relative values of the word activation, which trade false acceptances for false rejections. The second type of predictor is related to the number of phones of the word that have already been processed and the number of phones that remain until the end of the word. The results showed that SpeM’s performance increases if the amount of acoustic evidence in support of a word increases and the risk of future mismatches decreases. 2006 Elsevier Ltd. All rights reserved.
منابع مشابه
‘On-line Early Recognition’ of Polysyllabic Words in Continuous Speech
In this paper, we investigate the ability of SpeM, our recognition system based on the combination of an automatic phone recogniser and a wordsearch module, to determine as early as possible during the word recognition process whether a word is likely to be recognised correctly (this we refer to as ‘on-line’ early word recognition). We present two measures that can be used to predict whether a ...
متن کاملA novel approach in continuous speech recognition for Vietnamese, an isolating tonal language
This paper proposes a new approach for the integration of the Vietnamese language characteristics into a Large Vocabulary Continuous Speech Recognition System (LVCSR) which was built for some European languages. Firstly, a new module of tone recognition using Hidden Markov model was constructed. Secondly, several methods were applied to transform a text corpus of monosyllabic words into text co...
متن کاملSub-syllable Acoustic Modeling for Cantonese Speech Recognition
This paper presents a pioneer study on acoustic modeling for continuous Cantonese speech recognition. It starts from the context-independent modeling of sub-syllabic units, namely INITIALs and FINALs, and then moves on to examine a number of context-dependent models that characterize intra-syllable co-articulation. The acoustic models are trained with a large database of Cantonese polysyllabic ...
متن کاملSpoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...
متن کاملThe design and validation of production of poly-syllabic words test in 4-7 years-old Persian speaking children: a preliminary study
abstract objective: Reviewing the literature related to the use of polysyllabic words in the assessment of children's speech sound disorders (SSD) indicated the necessity of using these words to provide additional information to the information of the more common tests, such as prognostic information. Despite strong literature background in other languages, there was not any research that ex...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Computer Speech & Language
دوره 21 شماره
صفحات -
تاریخ انتشار 2007