A Japanese National Project on Spontaneous Speech Corpus and Processing Technology
Abstract
A new national project for raising the technological level of speech recognition and understanding has recently commenced in Japan. The project aims at a) building a large-scale spontaneous speech corpus consisting of roughly 7M words and 800 hours of speech, b) developing acoustic and linguistic models for spontaneous speech understanding and summarization that use linguistic as well as para-linguistic information in speech, and c) building a prototype of a spontaneous speech summarization system. The corpus under compilation will contain spontaneously uttered Common Japanese speech together with morphologically annotated transcriptions. Segmental and intonation labeling will also be provided for a subset of the corpus. The primary application domain of the corpus is recognition of spontaneous speech, but it is also planned to serve as a research corpus for natural language processing and phonetic/linguistic studies.
Similar resources
Steps toward Flexible Speech Recognition – Recent Progress at Tokyo Institute of Technology –
This paper describes recent progress at Tokyo Institute of Technology and the author’s perspectives for making speech recognition systems more flexible at both the acoustic and linguistic processing levels. Specifically, it describes a broadcast news transcription system, a multimodal dialogue system for information retrieval, neural-network-based HMM adaptation for noisy speech, online increme...
Recent Progress in Corpus-Based Spontaneous Speech Recognition
This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology. Although speech is in almost any situation spontaneous, recognition of spontaneous speech is an area which has only recently emerged in the field of automatic speech recognition. Broadening the application of speech recognition depends crucially on raising recognition performance f...
Spontaneous Speech Recognition and Summarization
This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology focusing on various achievements of a Japanese 5-year national project “Spontaneous Speech: Corpus and Processing Technology”. Although speech is in almost any situation spontaneous, recognition of spontaneous speech is an area which has only recently emerged in the field of automat...
Corpus and Text Analysis of Spontaneous Japanese
The “Spontaneous Speech: Corpus and Processing Technology” project has three major parts: (1) compiling a large spontaneous speech corpus, (2) establishing spoken language engineering based on the corpus, and (3) developing a prototype of a spoken language summarization system. This paper describes how we help to develop this large corpus, i.e., part (1), using technology developed a...
Recent Advances in Spontaneous Speech Recognition and Understanding
How to recognize and understand spontaneous speech is one of the most important issues in state-of-the-art speech recognition technology. In this context, a five-year large-scale national project entitled “Spontaneous Speech: Corpus and Processing Technology” started in Japan in 1999. This paper gives an overview of the project and reports on the major results of experiments that have been cond...
Publication date: 2003