Detection of word fragments in Mandarin telephone conversation
Abstract
We describe preliminary work on detecting word fragments in Mandarin conversational telephone speech. We extracted prosodic, voice quality, and lexical features and trained decision tree and SVM classifiers. Previous research shows that glottalization features are instrumental in English fragment detection. However, we show that Mandarin fragments are quite different from English ones: 90% of Mandarin fragments are immediately followed by a repetition of the fragmentary word. These repetition fragments are not glottalized, and they have a very specific distribution; the 12 most frequent words ("you", "I", "that", "have", "then", etc.) account for 50% of the tokens of these fragments. Thus, rather than glottalization, we found that the most useful feature for Mandarin fragment detection was the identity of the neighboring character (word or morpheme). In an oracle experiment using the true (reference) neighboring words together with prosodic and voice quality features, we achieved 80% accuracy in Mandarin fragment detection.
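The lexical cue highlighted above (the identity of the neighboring word, plus the tendency of a fragment to be followed by a repetition of the word it starts) can be illustrated with a minimal Python sketch. It assumes transcripts are lists of word tokens with a 0/1 fragment label per token. The function names, the `<s>`/`</s>` boundary symbols, and the smoothed per-next-word fragment rate are hypothetical illustrations, not the authors' actual decision tree or SVM feature set.

```python
from collections import Counter
from typing import Dict, List, Tuple


def lexical_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Neighbor-identity features for token i (hypothetical feature names)."""
    cur = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {
        "prev_word": prev_tok,
        "next_word": next_tok,
        # Repetition cue: the following word begins with the current token,
        # as when a fragment is immediately followed by the full word.
        "next_repeats_current": i + 1 < len(tokens) and next_tok.startswith(cur),
    }


def train_next_word_rates(
    data: List[Tuple[List[str], List[int]]]
) -> Dict[str, float]:
    """Fragment rate per following word, with add-one smoothing.

    An illustrative count-based baseline only; it is not the paper's method.
    """
    frag: Counter = Counter()
    total: Counter = Counter()
    for tokens, labels in data:
        for i, label in enumerate(labels):
            nxt = lexical_features(tokens, i)["next_word"]
            total[nxt] += 1
            frag[nxt] += label
    return {w: (frag[w] + 1) / (total[w] + 2) for w in total}


# Toy usage with made-up utterances (labels mark the first token as a fragment).
toy = [(["我", "我们", "去"], [1, 0, 0]), (["你", "你", "好"], [1, 0, 0])]
print(lexical_features(toy[1][0], 0))
print(train_next_word_rates(toy))
```

A rate table like this could serve as one input to a classifier alongside prosodic and voice quality features; with real data, reliable per-word estimates would require far more tokens than this toy example.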