A comparison of multiple methods for rescoring keyword search lists for low resource languages
Authors
Abstract
We review the performance of a new two-stage cascaded machine learning approach for rescoring keyword search output for low-resource languages. In the first stage, Confusion Networks (CNs) are rescored for improved Automatic Speech Recognition (ASR) by reranking the arcs of each confusion bin. In the second stage, we generate keyword search hypotheses from the rescored ASR output and rescore them using logistic regression classifiers to distinguish true hits from false alarms. We compare the performance of our system with state-of-the-art rescoring techniques, including probability of false alarm normalization, exponential normalization, rank-normalized posterior scores and sum-to-one normalization, and show promising results. Experimental validation is performed using the Term Weighted Value (TWV) metric on four corpora from the IARPA Babel program for keyword search on low-resource languages: Assamese, Bengali, Lao and Zulu.
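The first stage can be pictured as re-sorting the competing arcs inside each confusion bin and reading off the new top arc as the 1-best word. The sketch below is a minimal illustration only: the arc layout (word, posterior, extra feature), the `<eps>` deletion token, and the linear `rescored` function are hypothetical stand-ins for whatever features and learned model the authors actually use.

```python
# Each bin is a list of competing arcs: (word, posterior, extra_feature).
# "<eps>" marks a deletion arc (an assumed convention for this sketch).
EPS = "<eps>"


def rerank_bin(arcs, score_fn):
    """Return the arcs of one confusion bin ordered best-first by score_fn."""
    return sorted(arcs, key=score_fn, reverse=True)


def one_best(confusion_network, score_fn):
    """Read off the top arc of every bin, dropping deletion arcs."""
    words = []
    for arcs in confusion_network:
        best_word = rerank_bin(arcs, score_fn)[0][0]
        if best_word != EPS:
            words.append(best_word)
    return words


def posterior_only(arc):
    # Baseline: rank arcs by their original CN posterior.
    return arc[1]


def rescored(arc):
    # Hypothetical linear rescoring: posterior plus a weighted extra feature.
    return arc[1] + 0.5 * arc[2]


cn = [
    [("the", 0.55, 0.0), ("a", 0.45, 0.3)],
    [("cat", 0.6, 0.0), (EPS, 0.4, 0.0)],
]
print(one_best(cn, posterior_only))  # prints ['the', 'cat']
print(one_best(cn, rescored))        # prints ['a', 'cat']
```

Here the extra feature is enough to flip the first bin, which is the kind of change the first stage is meant to produce before keyword hits are generated from the rescored output.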
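The second stage treats each keyword hit as a binary decision: true hit or false alarm. A minimal sketch, assuming a plain logistic regression trained by batch gradient descent on the mean log-loss; the two per-hit features (an ASR posterior and a duration in seconds), the learning rate, and the toy data are all hypothetical, since the abstract does not describe the actual feature set.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, w, b):
    """Probability that a hit with feature vector x is a true hit."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    features: list of equal-length float lists, one per keyword hit.
    labels: 1 for a true hit, 0 for a false alarm.
    """
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = predict(x, w, b) - y  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical hits: [ASR posterior, duration in seconds]; label 1 = true hit.
features = [[0.9, 0.4], [0.8, 0.5], [0.7, 0.3], [0.2, 0.2], [0.3, 0.6], [0.1, 0.4]]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_logistic_regression(features, labels)
scores = [predict(x, w, b) for x in features]  # rescored hit probabilities
print([round(s, 3) for s in scores])
```

The rescored probabilities would then stand in for the original hit scores when a detection threshold is applied.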
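Sum-to-one normalization, one of the baselines listed above, rescales the scores of all hits for the same keyword so that they sum to one. A minimal sketch, assuming the common per-keyword formulation with an optional sharpening exponent; the function name `sum_to_one` and the `gamma` parameter are illustrative, and the exact variant used in the compared systems may differ.

```python
from collections import defaultdict


def sum_to_one(hits, gamma=1.0):
    """Normalize hit scores so they sum to one within each keyword.

    hits: list of (keyword, score) pairs with score > 0.
    gamma: illustrative sharpening exponent applied before normalizing
           (gamma = 1.0 gives plain sum-to-one normalization).
    Returns (keyword, normalized_score) pairs in the input order.
    """
    # Total (exponentiated) score mass per keyword.
    totals = defaultdict(float)
    for kw, score in hits:
        totals[kw] += score ** gamma
    # Divide each exponentiated score by its keyword's total.
    return [(kw, score ** gamma / totals[kw]) for kw, score in hits]


hits = [("kw1", 3.0), ("kw1", 1.0), ("kw2", 2.0)]
print(sum_to_one(hits))             # prints [('kw1', 0.75), ('kw1', 0.25), ('kw2', 1.0)]
print(sum_to_one(hits, gamma=2.0))  # prints [('kw1', 0.9), ('kw1', 0.1), ('kw2', 1.0)]
```

Because each keyword's scores now share the same total, a single global threshold behaves more uniformly across frequent and rare keywords, which is the usual motivation for this kind of normalization.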
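The Term Weighted Value used for evaluation combines, for each keyword, the miss probability P_miss = 1 - N_correct / N_true and the false-alarm probability P_FA = N_FA / (T - N_true), where T is the number of seconds of speech, and reports TWV = 1 - mean over keywords of (P_miss + β · P_FA) with β = 999.9. A sketch of that standard NIST definition at a single decision threshold; the `KeywordCounts` container and the toy counts are illustrative, and keywords with no reference occurrences are excluded from the average.

```python
from dataclasses import dataclass

BETA = 999.9  # NIST weighting of false alarms relative to misses


@dataclass
class KeywordCounts:
    n_true: int     # reference occurrences of the keyword
    n_correct: int  # detections above threshold that match a reference
    n_fa: int       # detections above threshold with no matching reference


def term_weighted_value(counts, speech_seconds, beta=BETA):
    """TWV at one threshold, averaged over keywords with n_true > 0."""
    scored = [c for c in counts if c.n_true > 0]
    if not scored:
        raise ValueError("no keyword has reference occurrences")
    total_cost = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = c.n_fa / (speech_seconds - c.n_true)
        total_cost += p_miss + beta * p_fa
    return 1.0 - total_cost / len(scored)


counts = [KeywordCounts(10, 8, 1), KeywordCounts(4, 1, 0)]
print(term_weighted_value(counts, speech_seconds=36000))
```

The large β means a single false alarm costs roughly as much as missing a keyword that occurs once per thousand seconds, which is why rescoring and normalization that suppress false alarms can raise TWV substantially.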
Similar resources
Spoken Keyword Rescoring and Document Retrieval for Low-resource Languages
For languages that have adequate data for automatic speech recognition (ASR), many keyword search (KWS) and document retrieval (SDR) systems have been developed with near-optimal performance. However, because low-resource languages lack sufficient training data to produce high-accuracy transcripts, identifying and retrieving queries in their speech data remains challenging. To compensate for th...
An Effective Path-aware Approach for Keyword Search over Data Graphs
Keyword search is known as a user-friendly alternative to structured query languages for retrieving information from graph-structured data. Efficiently retrieving relevant answers to a keyword query and effectively ranking these answers by relevance are the two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
Strategies for rescoring keyword search results using word-burst and acoustic features
The identification of keyword queries in speech data from low-resource languages poses a challenge for current methods, as speech recognition algorithms lack sufficient training data to produce high-accuracy transcripts. To compensate for these shortcomings, we extract signals from the data that are useful in keyword identification but are not being used by the speech recognizer. These signals ta...
Echolocation: Using Word-Burst Analysis to Rescore Keyword Search Candidates in Low-Resource Languages
Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languages
Keyword spotting (KWS) for low-resource languages has drawn increasing attention in recent years. The state-of-the-art KWS systems are based on lattices or Confusion Networks (CN) generated by Automatic Speech Recognition (ASR) systems. It has been shown that considerable KWS gains can be obtained by combining the keyword detection results from different forms of ASR systems, e.g., Tandem and H...