Multilingual Spoken Term Detection: Finding and Testing New Pronunciations

Authors

  • Richard Sproat
  • Jim Baker
  • Martin Jansche
  • Bhuvana Ramabhadran
  • Michael Riley
  • Murat Saraçlar
  • Abhinav Sethy
  • Patrick Wolfe
  • Sanjeev Khudanpur
  • Arnab Ghoshal
  • Kristy Hollingshead
  • Chris White
  • Ting Qian
  • Erica Cooper
  • Morgan Ulinski
Abstract

When you listen to the evening news, or read a newspaper, book or web site, there is a good chance that you will hear or see a term (perhaps a name, perhaps a technical term) that you have never seen before. Such words are often novel or rare and are often names (of people, places, organizations, ...). They are hard for humans to process, but they are even harder for automatic speech and language processing systems. For a single language, a speech recognition or text-to-speech system needs to know how to pronounce a word in order to recognize or say it. For two languages, in particular a pair with different writing systems, a search engine or document summarizer needs to know how to transcribe a word from one language into the other in order to retrieve or distill across languages. For example, the soccer player written in English text as Maciej Zurawski would appear as 마치에이주라브스키 (ma-chi-e-i ju-ra-beu-seu-ki) in Korean.

In this project we attack both problems: unusual term pronunciation and term transcription. For pronunciation, we mine the huge numbers of pronunciations that are now available in various forms on the web. The sources range from the straightforward, such as dictionary sites and Wikipedia entries where people use a fairly strict phonetic transcription system such as IPA, to the difficult, such as:

  • Trio Shares Nobel Prize in Medicine – The Nobel is a particularly striking achievement for Capecchi (pronounced kuh-PEK’-ee).

Here we need to look in the vicinity of the name “Capecchi” to find the pronunciation, make use of the word “pronounced”, and then interpret the writer’s attempt to render the pronunciation using an English-based ad-hoc “phonetic” orthography. The problem is therefore one of entity extraction, where the entities to extract can be either relatively easy or relatively hard. A relatively easy case is Wikipedia, which uses standard IPA transcriptions that are clearly delimited by markup. On general web pages, tokens with Unicode IPA characters are potential pronunciations. Data extracted from Wikipedia can be matched against these tokens to provide training material for entity extraction. Statistical entity extractors for the more difficult case of ad-hoc phonetic transcriptions (such as “kuh-PEK’-ee” above) can be bootstrapped from unannotated web pages containing patterns such as “pronounced as”. These entity extractors make use of both the textual environment and the letter-to-sound constraints between the candidate pronunciation and its corresponding orthography.

We also use speech data to test possible pronunciation variants by comparing the performance of spoken term detection systems using these different variants. Pronunciations mined from the web are used to suggest pronunciations for spoken term detection; transcriptions are used to suggest reasonable candidates to search for in a speech stream in another language. We use a novel technique called delayed-decision testing to test candidate pronunciations in speech, and to choose the best one from a set of candidates via a sequential testing procedure, with the associated null hypothesis stating that all candidate pronunciations exhibit the same performance on average. Spoken term detection is in turn used for automatic labeling of practice data acquired to test this null hypothesis; however, this automatic labeling procedure inevitably induces false alarms as well as correct detections. Delayed-decision testing is then used to choose the correct pronunciation in spite of these false alarms, leading to improved pronunciations for newly identified terms.

For transcription, we use available resources (dictionaries and text corpora) as well as methods for phonetic matching across scripts and for tracking names across time in comparable corpora (such as news sources).
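The two surface cues mentioned in the abstract, the word “pronounced” near a name and the presence of Unicode IPA characters in a token, can be illustrated with a minimal Python sketch. This is not the project's extractor, which the abstract describes as statistical; the regular expression, the small IPA character subset, and the function names below are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: a capitalized name directly followed by
# "(pronounced ...)". Real web text is far more varied than this.
PRONOUNCED = re.compile(r"\b([A-Z][\w'-]+)\s*\(pronounced\s+([^)]+)\)")

# A small illustrative subset of IPA-specific characters (an assumption,
# not an exhaustive inventory).
IPA_CHARS = set("ɑæɔəɛɪŋʃʊʌʒθðˈˌː")


def find_adhoc_pronunciations(text):
    """Return (name, ad-hoc pronunciation) pairs found in the text."""
    return [(m.group(1), m.group(2).strip()) for m in PRONOUNCED.finditer(text)]


def looks_like_ipa(token):
    """Flag a token as a potential IPA pronunciation if it contains
    at least one character from the illustrative IPA subset."""
    return any(ch in IPA_CHARS for ch in token)


if __name__ == "__main__":
    sentence = ("The Nobel is a particularly striking achievement "
                "for Capecchi (pronounced kuh-PEK'-ee).")
    print(find_adhoc_pronunciations(sentence))
    print(looks_like_ipa("kəˈpɛki"), looks_like_ipa("Capecchi"))
```

Hand-written cues like these would at most supply candidate pairs; the abstract's approach additionally scores candidates using the textual environment and letter-to-sound constraints, which this sketch does not attempt.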

Similar resources

Flavoured acoustic model and combined spelling to sound for asymmetrical bilingual environment

The most common target of multilingual ASR aims at covering various speakers from various languages. The problem we address in this article is more specifically an asymmetrical bilingual scenario, where the same speaker may insert in his speech some foreign words using foreign pronunciations. This is a frequent situation for French as spoken in Canada, where English proper names are often spoke...

Stochastic pronunciation modelling for spoken term detection

A major challenge faced by a spoken term detection (STD) system is the detection of out-of-vocabulary (OOV) terms. Although a subword-based STD system is able to detect OOV terms, performance reduction is always observed compared to in-vocabulary terms. Current approaches to STD do not acknowledge the particular properties of OOV terms, such as pronunciation uncertainty. In this paper, we use a...

SpeeD @ MediaEval 2014: Spoken Term Detection with Robust Multilingual Phone Recognition

In this paper, we attempt to resolve the Spoken Term Detection (STD) problem for under-resourced languages by phone recognition with a multilingual acoustic model of three languages (Albanian, English and Romanian). The Power Normalized Cepstral Coefficients (PNCC) features are used for improved robustness to noise.

Effect of Pronunciations on OOV Queries in Spoken Term Detection

This paper focusses on the effect of pronunciations for Out-of-Vocabulary (OOV) query terms on the performance of a spoken term detection (STD) task. OOV terms, typically proper names or foreign language terms, occur infrequently but are rich in information. The STD task returns relevant segments of speech that contain one or more of these OOV query terms. The STD system described in this paper i...

Out-of-Vocabulary Spoken Term Detection

Spoken term detection (STD) is a fundamental task for multimedia information retrieval. A major challenge faced by an STD system is the serious performance reduction when detecting out-of-vocabulary (OOV) terms. The difficulties arise not only from the absence of pronunciations for such terms in the system dictionaries, but from intrinsic uncertainty in pronunciations, significant diversity in ...

Publication date: 2008