Developing an SDR Test Collection from Japanese Lecture Audio Data

Authors

  • Tomoyosi Akiba
  • Kiyoaki Aikawa
  • Yoshiaki Itoh
  • Tatsuya Kawahara
  • Hiroaki Nanjo
  • Hiromitsu Nishizaki
  • Norihito Yasuda
  • Yoichi Yamashita
  • Katunobu Itou

Abstract

The lecture is one of the most valuable genres of audiovisual data. However, spoken lectures are difficult to reuse because they are hard to browse and to search efficiently. To promote research on spoken lecture retrieval, this paper reports a test collection for evaluating it. The test collection consists of the target spoken documents of about 2,700 lectures (604 hours) taken from the Corpus of Spontaneous Japanese (CSJ), 39 retrieval queries, the relevant passages in the target documents for each query, and automatic transcriptions of the target speech data. We report the retrieval performance of a standard spoken document retrieval (SDR) method on the constructed test collection, which serves as a baseline for forthcoming SDR studies that use the collection. We also introduce several studies conducted by users of the test collection.
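The abstract names the parts of the collection (target transcripts, queries, relevant passages) and a baseline obtained with a standard SDR method, but it does not spell out that method. As a sketch only, assuming the baseline is a conventional TF-IDF vector-space ranking over the automatic transcriptions and that runs are scored with non-interpolated average precision, the pieces could fit together as below. All class and function names are hypothetical, and transcripts and queries are assumed to be pre-segmented into whitespace-separated terms, since Japanese text has no word delimiters and would first need a morphological analyzer.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SDRTestCollection:
    """Hypothetical container for the components listed in the abstract."""
    # passage id -> ASR transcript (assumed: already segmented into terms)
    passages: dict[str, str]
    # query id -> query text (same tokenization assumption)
    queries: dict[str, str]
    # query id -> set of relevant passage ids
    relevant: dict[str, set[str]] = field(default_factory=dict)


def tfidf_vectors(docs: dict[str, str]) -> tuple[dict[str, dict[str, float]], dict[str, float]]:
    """Return per-passage TF-IDF vectors (raw tf * log(N/df)) and the idf table."""
    n = len(docs)
    tfs = {pid: Counter(text.split()) for pid, text in docs.items()}
    df = Counter(term for tf in tfs.values() for term in tf)
    idf = {term: math.log(n / count) for term, count in df.items()}
    vecs = {pid: {t: c * idf[t] for t, c in tf.items()} for pid, tf in tfs.items()}
    return vecs, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(coll: SDRTestCollection, qid: str) -> list[str]:
    """Rank all passages for one query by TF-IDF cosine (ties keep insertion order)."""
    vecs, idf = tfidf_vectors(coll.passages)
    qtf = Counter(coll.queries[qid].split())
    # Query terms unseen in the transcripts get weight 0 (e.g. ASR misses).
    qvec = {t: c * idf.get(t, 0.0) for t, c in qtf.items()}
    return sorted(coll.passages, key=lambda pid: cosine(qvec, vecs[pid]), reverse=True)


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Non-interpolated average precision of one ranked list."""
    hits, total = 0, 0.0
    for i, pid in enumerate(ranking, start=1):
        if pid in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0
```

Scoring at the passage level mirrors the fact that the collection marks relevant passages rather than whole lectures; the measure actually used with the collection may differ from this sketch.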

Similar resources

Construction of a Test Collection for Spoken Document Retrieval from Lecture Audio Data

The lecture is one of the most valuable genres of audiovisual data. Though spoken document processing is a promising technology for utilizing lectures in various ways, it is difficult to evaluate because evaluation requires subjective judgment and/or the verification of large quantities of evaluation data. In this paper, a test collection for the evaluation of spoken lecture retrieval i...

Test Collections for Spoken Document Retrieval from Lecture Audio Data

The Spoken Document Processing Working Group, which is part of the special interest group of spoken language processing of the Information Processing Society of Japan, is developing a test collection for evaluation of spoken document retrieval systems. A prototype of the test collection consists of a set of textual queries, relevant segment lists, and transcriptions by an automatic speech recog...

Constructing Japanese test collections for spoken term detection

Spoken Document Retrieval (SDR) and Spoken Term Detection (STD) have been two of the most intensively investigated topics in spoken document processing research, following the establishment of SDR and STD test collections by the Text REtrieval Conference (TREC) and NIST. Because Japanese spoken document processing researchers also require such test collections for SDR and STD, we have es...

The TREC Spoken Document Retrieval Track: A Success Story

This paper describes work within the NIST Text REtrieval Conference (TREC) over the last three years in designing and implementing evaluations of Spoken Document Retrieval (SDR) technology within a broadcast news domain. SDR involves the search and retrieval of excerpts from spoken audio recordings using a combination of automatic speech recognition and information retrieval technologies. The T...

Unsupervised topic adaptation for lecture speech retrieval

We are developing a cross-media information retrieval system, in which users can view specific segments of lecture videos by submitting text queries. To produce a text index, the audio track is extracted from a lecture video and a transcription is generated by automatic speech recognition. In this paper, to improve the quality of our retrieval system, we extensively investigate the effects of a...

Publication date: 2009