Spoken Keyword Rescoring and Document Retrieval for Low-resource Languages

Author

  • Min Ma
Abstract

For languages that have adequate data for automatic speech recognition (ASR), many keyword search (KWS) and spoken document retrieval (SDR) systems have been developed with near-optimal performance. However, for low-resource languages, which lack sufficient training data to produce high-accuracy transcripts, identifying and retrieving queries in speech data remains challenging. To compensate for these shortcomings, we extracted signals from the data that are useful for KWS/SDR. These signals take multiple forms: word/morpheme burstiness, rescored confusion network posteriors, acoustic/prosodic qualities, phoneme recognition results, etc. Several strategies have been proposed: 1) a four-way classification of keyword hypotheses; 2) learning-to-rank algorithms; 3) fusion of symbolic subsequence matching and information-based dynamic time warping; 4) a prosody-based language model. We evaluated these strategies on speech data from IARPA-BABEL and the MediaEval-QUESST workshop. This paper summarizes the author's previous work on spoken keyword rescoring and document retrieval and discusses ongoing exploration.
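Strategy 3 fuses symbolic subsequence matching with information-based dynamic time warping for query-by-example search. As a rough illustration only, here is a minimal sketch of plain subsequence DTW. It assumes the query and the document are each NumPy arrays of per-frame feature vectors (for example, phone posteriorgrams) compared with cosine distance. The function names `subsequence_dtw` and `cosine_distance_matrix` are hypothetical. This is not the author's information-based variant, whose weighting is not described in the abstract.

```python
import numpy as np


def cosine_distance_matrix(query, doc, eps=1e-12):
    """Pairwise cosine distance between query frames (rows) and document frames (rows).

    eps is an assumed small constant that guards against zero-norm frames.
    """
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + eps)
    d = doc / (np.linalg.norm(doc, axis=1, keepdims=True) + eps)
    return 1.0 - q @ d.T


def subsequence_dtw(query, doc):
    """Plain subsequence DTW (a generic sketch, not the paper's exact method).

    The whole query must be aligned, but the match may start and end at any
    document frame. Returns (lowest accumulated cost, end frame in doc).
    """
    cost = cosine_distance_matrix(query, doc)
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    # Free start: the first query frame may align with any document frame.
    acc[0, :] = cost[0, :]
    for i in range(1, n):
        # The first document column can only be reached by advancing the query.
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, m):
            acc[i, j] = cost[i, j] + min(
                acc[i - 1, j],      # query advances, document frame repeats
                acc[i, j - 1],      # document advances, query frame repeats
                acc[i - 1, j - 1],  # both advance (diagonal step)
            )
    # Free end: take the best cell in the last query row.
    end = int(np.argmin(acc[-1, :]))
    return float(acc[-1, end]), end
```

A lower accumulated cost indicates a closer match. A practical detector would typically also normalize the cost by path length and fuse it with other scores (such as the symbolic matching mentioned above), which this sketch leaves out.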


Similar resources

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

A comparison of multiple methods for rescoring keyword search lists for low resource languages

We review the performance of a new two-stage cascaded machine learning approach for rescoring keyword search output for low resource languages. In the first stage Confusion Networks (CNs) are rescored for improved Automatic Speech Recognition (ASR) by reranking the arcs of each confusion bin. In the second stage we generate keyword search hypotheses from the rescored ASR output and rescore them...

Spoken Document Retrieval by Contents Complement and Keyword Expansion Using Subordinate Concept for NTCIR-SpokenDoc

We report on the result of investigating which relationship is important among hypernym and hyponym relationships in retrieval keyword expansion. Moreover, we report the effect of the keyword expansion and the contents complement for spoken document retrieval for SCR lecture retrieval task and SCR passage retrieval task. Spoken Document Retrieval by contents complement and keyword expansion usi...

Combining Subword and State-level Dissimilarity Measures for Improved Spoken Term Detection in NTCIR-11 SpokenQuery&Doc Task

In recent years, demand for distributing and searching multimedia content has been increasing rapidly, and more effective methods for multimedia information retrieval are desirable. In studies on spoken document retrieval systems, much research has focused on the task of spoken term detection (STD), which locates a given search term in a large set of spoken documents. Recently, in ...

Comparing Isolately Spoken Keywords Queries for Japanese Spoken D

This paper describes a Japanese spoken document retrieval system that uses voice input queries. We prepare two types of spoken queries: isolately spoken keywords and spontaneously spoken queries. To solve a mis-recognition problem of spoken queries, N-best hypotheses of transcripts of queries are used, and keyword candidates are selected from them by mutual information between recognized words....


Publication date: 2015