Search results for: keyword spotting
Number of results: 16370
Keyword Spotting (KWS) aims at detecting speech segments that contain a given query within large amounts of audio data. Typically, a speech recognizer is involved in a first indexing step. One of the challenges of KWS is how to handle recognition errors and out-of-vocabulary (OOV) terms. This work proposes the use of discriminative training to construct a phoneme confusion model, which expands ...
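The excerpt breaks off before saying exactly what the confusion model expands. A common use of such a model is to expand a query's phoneme sequence into likely misrecognized variants, which can then be searched for in the index. The sketch below does that under stated assumptions. The `CONFUSION` table holds hypothetical hand-set probabilities, not discriminatively trained ones. `expand_query` is a hypothetical helper that treats each phoneme independently and models substitutions only, with no insertions or deletions.

```python
import heapq
import math

# Hypothetical confusion model: CONFUSION[ref][hyp] = P(hyp recognized | ref spoken).
# The abstract trains such a model discriminatively; these are toy values only.
CONFUSION = {
    "k": {"k": 0.8, "g": 0.15, "t": 0.05},
    "ae": {"ae": 0.7, "eh": 0.3},
    "t": {"t": 0.85, "d": 0.15},
}


def expand_query(phonemes, confusion, n_best=5, beam=10):
    """Return up to n_best (phoneme_list, probability) pairs, most likely first.

    Each query phoneme is independently replaced by its confusable phonemes.
    A variant's score is the product of per-position probabilities, and a beam
    bounds how many partial variants are kept after each position.
    """
    hyps = [(0.0, ())]  # (log probability, phoneme tuple)
    for ref in phonemes:
        # Phonemes missing from the model are assumed to be recognized correctly.
        alternatives = confusion.get(ref, {ref: 1.0})
        extended = [
            (lp + math.log(p), seq + (hyp,))
            for lp, seq in hyps
            for hyp, p in alternatives.items()
            if p > 0.0
        ]
        hyps = heapq.nlargest(beam, extended, key=lambda h: h[0])
    best = heapq.nlargest(n_best, hyps, key=lambda h: h[0])
    return [(list(seq), math.exp(lp)) for lp, seq in best]
```

For example, `expand_query(["k", "ae", "t"], CONFUSION, n_best=3)` ranks the unchanged query first (0.8 × 0.7 × 0.85), followed by the variants with "eh" and then "g" substituted. Each variant could then be looked up as an additional search term, weighted by its probability.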
In this paper we present an efficient segmentation-free word spotting method, applied in the context of historical document collections, that follows the query-by-example paradigm. We use a patch-based framework where local patches are described by a bag-of-visual-words model powered by SIFT descriptors. By projecting the patch descriptors to a topic space with the Latent Semantic Analysis techn...
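The LSA step mentioned above can be sketched as a truncated SVD of a patch-by-visual-word count matrix. Patches and the query are projected onto the top-k right singular vectors and then ranked by cosine similarity in that topic space. This sketch assumes that SIFT extraction and codebook quantization have already produced the count histograms. It also assumes NumPy is available. The function names are hypothetical, and the excerpt does not say which similarity measure the authors use, so cosine similarity is an assumption.

```python
import numpy as np


def fit_lsa(counts, k):
    """Return the (n_visual_words, k) matrix whose columns span the LSA topic space."""
    # Thin SVD: counts = U @ diag(s) @ Vt, singular values in descending order.
    _, _, vt = np.linalg.svd(np.asarray(counts, dtype=float), full_matrices=False)
    return vt[:k].T  # top-k right singular vectors as columns


def project(histograms, v_k):
    """Project bag-of-visual-words histograms (one per row) into the topic space."""
    return np.atleast_2d(np.asarray(histograms, dtype=float)) @ v_k


def rank_patches(query_hist, patch_topics, v_k):
    """Return (patch indices sorted best-first, cosine similarities) for a query."""
    q = project(query_hist, v_k)[0]
    norms = np.linalg.norm(patch_topics, axis=1) * np.linalg.norm(q)
    sims = (patch_topics @ q) / np.maximum(norms, 1e-12)  # guard against zero vectors
    return np.argsort(-sims), sims
```

Usage with a toy matrix (4 patches, 4 visual words): `v_k = fit_lsa(counts, k=2)`, `patch_topics = project(counts, v_k)`, and then `rank_patches(query, patch_topics, v_k)`. Because the patches and the query are both multiplied by the same `v_k`, they end up in one shared coordinate system, so their cosine similarities can be compared directly.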
In this paper, we investigate the noise robustness properties of frame-based and sparse point process-based models for spotting keywords in continuous speech. We introduce a new strategy to improve point process model (PPM) robustness by adapting low-level feature detector thresholds to preserve background firing rates in the presence of noise. We find that this unsupervised approach can signif...
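One way to keep a detector's background firing rate constant in noise, without labels, is quantile matching. First measure the fraction of frames the detector fires on in clean conditions. Then choose the threshold on unlabeled noisy audio so that the same fraction of frames exceeds it. This is an assumed realization for illustration. The excerpt does not describe the authors' exact adaptation rule, and `firing_rate` and `adapt_threshold` are hypothetical names.

```python
import math


def firing_rate(scores, threshold):
    """Fraction of frames whose detector score strictly exceeds the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


def adapt_threshold(noisy_scores, target_rate):
    """Choose a threshold so that about target_rate of the noisy frames fire.

    The (n_fire + 1)-th highest score becomes the threshold, so at most n_fire
    frames lie strictly above it (exactly n_fire when there are no ties).
    """
    n = len(noisy_scores)
    n_fire = round(target_rate * n)
    if n_fire >= n:
        return -math.inf  # every frame should fire
    ranked = sorted(noisy_scores, reverse=True)
    return ranked[n_fire]
```

In use, `target_rate = firing_rate(clean_scores, clean_threshold)` is computed once. After that, `adapt_threshold(noisy_scores, target_rate)` is recomputed on each new noise condition, using only unlabeled audio from that condition.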
Automatic real-time captioning provides immediate, on-demand access to spoken content in lectures or talks and is a crucial accommodation for deaf and hard of hearing (DHH) people. However, in the presence of specialized content, such as technical talks, automatic speech recognition (ASR) still makes mistakes that may render the output incomprehensible. In this paper, we introduce a new ap...
We consider the problem of representing semantic concepts in speech by learning from untranscribed speech paired with images of scenes. This setting is relevant in low-resource speech processing, robotics, and human language acquisition research. We use an external image tagger to generate soft labels, which serve as targets for training a neural model that maps speech to keyword labels. We int...
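The abstract says that the image tagger's soft labels serve as training targets for the speech-to-keyword model. A common loss for soft multi-label targets is per-keyword binary cross-entropy on sigmoid outputs. The sketch below assumes that loss, since the excerpt does not name the one the authors use. It uses the numerically stable form `max(z, 0) - z*y + log(1 + exp(-|z|))`, which is algebraically equal to `-(y log σ(z) + (1 - y) log(1 - σ(z)))`. `soft_bce` is a hypothetical name.

```python
import math


def soft_bce(logits, soft_targets):
    """Mean binary cross-entropy of per-keyword logits against soft targets in [0, 1].

    Uses max(z, 0) - z*y + log1p(exp(-|z|)), which avoids overflow for large |z|.
    The loss for each keyword is minimized when sigmoid(z) equals its target y.
    """
    total = 0.0
    for z, y in zip(logits, soft_targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

Unlike hard 0/1 labels, a target such as 0.8 from the tagger pushes the model toward predicting 0.8 for that keyword rather than full certainty.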
System combination is a common approach to improving results for both speech transcription and keyword spotting—especially in the context of low-resourced languages where building multiple complementary models requires less computational effort. Using state-of-the-art CNN and DNN acoustic models, we analyze the performance, cost, and trade-offs of four system combination approaches: feature com...
In word spotting, one of the main difficulties is false alarms, especially for short words. A model is presented for predicting the false alarm rate of a word from its phonemic content. This model is tested on a word spotter that has been used in the TREC Spoken Document Retrieval (SDR) track. Finally, results are presented for the retrieval task.
Given presentation slides with detailed written speaking notes, automatic tracking of oral presentations can help speakers ensure they cover their planned content and can reduce their anxiety during the speech. Tracking is a more complex problem than speech-to-text alignment, since presenters rarely follow their exact presentation notes, and it must be performed in real time. In this paper, we ...
We propose a new method of dynamically synthesizing Korean character image templates and then converting them into P2DHMMs in real time. Compared with whole-character HMMs, this method requires less memory and is easier to train. © 2004 Elsevier B.V. All rights reserved.