Toward automatic transcription of Japanese broadcast news

نویسندگان

  • Tatsuo Matsuoka
  • Yuichi Taguchi
  • Katsutoshi Ohtsuki
  • Sadaoki Furui
  • Katsuhiko Shirai
چکیده

In this paper, we report on the automatic recognition of Japanese broadcast-news speech. We have been working on largevocabulary continuous speech recognition (LVCSR) for Japanese newspaper speech transcription and have achieved good performance. We have recently applied our LVCSR system to transcribing Japanese broadcast-news speech. We extended the vocabulary from 7k words to 20k words and trained the language models using newspaper texts and broadcast-news manuscripts. These two language models were applied to our evaluation speech sets. The language model trained using broadcast-news manuscripts achieved better results for broadcast-news speech than the language model trained using newspaper texts, which achieved better results for newspaper speech. We achieved a word error rate of 19.7% for anchor-speaker’s speech by using a bigram language model and a trigram language model both trained using broadcast-news manuscripts.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Toward Automatic Recognition of Japanese Broadcast News

In this paper we report on automatic recognition of Japanese broadcast-news speech. We have been working on largevocabulary continuous speech recognition (LVCSR) for Japanese newspaper speech transcription and achieved reasonably good performance. We have recently applied our LVCSR system to transcribing Japanese broadcast-news speech. We extended the vocabulary to 20k words and trained the lan...

متن کامل

Japanese broadcast news transcription

In this paper, we describe the on-going development of a Japanese Broadcast News Transcription system at BBN Technologies. This is a collaboration between BBN and NHK to use automatic speech recognition technology to provide live closed caption for NHK’s TV news programs in Japan. We describe what the NHK Broadcast News Corpus comprises and how we adopted transcription technology developed for ...

متن کامل

Japanese Broadcast News Transcription and Topic Detection

This paper reports recent advances in Japanese broadcast news transcription and automatic topic detection from the transcribed news speech. To cope with the variability of the readings for each word, a new method for incorporating reading probability of each word in the decoding process is proposed. As a realistic solution to the new-word problem, a new method is proposed, in which new words ar...

متن کامل

Recent advances in Japanese broadcast news transcription

In this paper, we report on language modeling and acoustic modeling studies for Japanese broadcast news speech recognition. We constructed a language model that reduces recognition errors by utilizing context-dependent readings of Japanese characters. We also introduced filled-pause modeling into the language model. To improve the model’s performance for a series of sentences spoken by one spea...

متن کامل

Real-time recognition of broadcast news

Although the performance of state-of-the-art automatic speech recognition systems on the challenging task of broadcast news transcription has improved considerably in recent years, many of the systems operate in 130-300 times real-time [1]. Many applications of automatic transcription of broadcast news, eg. closedcaption subtitles for television broadcasts, require real-time operation. This pap...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1997