Constructing Japanese test collections for spoken term detection

Authors

  • Yoshiaki Itoh
  • Hiromitsu Nishizaki
  • Xinhui Hu
  • Hiroaki Nanjo
  • Tomoyosi Akiba
  • Tatsuya Kawahara
  • Seiichi Nakagawa
  • Tomoko Matsui
  • Yoichi Yamashita
  • Kiyoaki Aikawa
Abstract

Spoken Document Retrieval (SDR) and Spoken Term Detection (STD) have been two of the most intensively investigated topics in spoken document processing research since the establishment of the SDR and STD test collections by the Text REtrieval Conference (TREC) and NIST. Because Japanese spoken document processing researchers also require such test collections for SDR and STD, we have established a working group to develop these collections in the Special Interest Group - Spoken Language Processing (SIG-SLP) of the Information Processing Society of Japan. The working group has constructed and made available a test collection for SDR, and is now constructing new test collections for STD that will be open to researchers. The present paper introduces the policies, outline, and schedule of the new test collections. It then compares the new test collections with the NIST STD test collections.

Similar resources

Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection

The detection of out-of-vocabulary (OOV) query terms is a crucial problem in spoken term detection (STD), because OOV query terms are likely. To enable search of OOV query terms in STD systems, a query subword sequence is compared with subword sequences generated using an automatic speech recognizer against spoken documents. When comparing two subword sequences, the edit distance is a typical d...

Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection

This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on ...

Unsupervised Hidden Markov Modeling of Spoken Queries for Spoken Term Detection without Speech Recognition

We propose an unsupervised technique to model the spoken query using hidden Markov model (HMM) for spoken term detection without speech recognition. By unsupervised segmentation, clustering and training, a set of HMMs, referred to as acoustic segment HMMs (ASHMMs), is generated from the spoken archive to model the signal variations and frame trajectories. An unsupervised technique is also desig...

Evaluation of re-ranking by prioritizing highly ranked documents in spoken term detection

In spoken term detection, the detection of out-of-vocabulary (OOV) query terms is very important because of the high probability of OOV query terms occurring. This paper proposes a re-ranking method for improving the detection accuracy for OOV query terms after extracting candidate sections by a conventional method. The candidate sections are ranked by using dynamic time warping to match the quer...

Overview of the NTCIR-11 SpokenQuery&Doc Task

This paper presents an overview of the Spoken Query and Spoken Document retrieval (SpokenQuery&Doc) task at the NTCIR-11 Workshop. This task included spoken query driven spoken content retrieval (SQ-SCR) as the main sub-task, with a spoken query driven spoken term detection task (SQSTD) as an additional sub-task. The paper describes details of each sub-task, the data used, the creation of the sp...

Publication date: 2010