Query-by-example Spoken Term Detection using Attention-based Multi-hop Networks

Authors

  • Chia-Wei Ao
  • Hung-yi Lee

Abstract

Retrieving spoken content with spoken queries, or query-by-example spoken term detection (STD), is attractive because it makes it possible to match signals directly at the acoustic level without transcribing them into text. Here, we propose an end-to-end query-by-example STD model based on an attention-based multi-hop network, whose inputs are a spoken query and an audio segment containing several utterances, and whose output states whether the audio segment includes the query. The model can be trained either in a supervised scenario using labeled data or in an unsupervised fashion. In the supervised scenario, we find that the attention mechanism and multiple hops improve performance, and that the attention weights indicate the time span of the detected terms. In the unsupervised setting, the model mimics the behavior of the existing query-by-example STD system, yielding performance comparable to that system but with lower search time complexity.
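
The abstract describes the architecture only at a high level. As a minimal, untrained sketch of how attention repeated over several hops can match a spoken query against an audio segment, the Python below assumes both inputs are already sequences of frame-level feature vectors and replaces the model's trained components with parameter-free stand-ins: mean pooling for the query encoder, a memory-network-style update (query state plus attended context) between hops, and cosine similarity in place of the learned output layer. The names `multi_hop_detect` and `mean_pool` and the toy vectors are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frames: List[Vector]) -> Vector:
    # Hypothetical stand-in for a learned query encoder.
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / len(frames) for k in range(dim)]


def multi_hop_detect(query_frames: List[Vector], doc_frames: List[Vector],
                     hops: int = 2) -> Tuple[float, List[float]]:
    """Return (match score, attention weights over document frames)."""
    q0 = mean_pool(query_frames)
    q = list(q0)
    weights: List[float] = []
    context = [0.0] * len(q0)
    for _ in range(hops):
        # Attend over the document frames using the current query state.
        weights = softmax([dot(q, d) for d in doc_frames])
        # Weighted sum of document frames: what this hop "read".
        context = [sum(w * d[k] for w, d in zip(weights, doc_frames))
                   for k in range(len(q))]
        # Memory-network-style update: the next hop attends with query + context.
        q = [qi + ci for qi, ci in zip(q, context)]
    # Untrained stand-in for the learned output layer: compare what the last
    # hop read against the original query. A trained model would output
    # a probability that the query occurs in the segment.
    return cosine(q0, context), weights


if __name__ == "__main__":
    query = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    doc = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    score, weights = multi_hop_detect(query, doc, hops=2)
    peak = max(range(len(weights)), key=weights.__getitem__)
    print(f"score={score:.3f}, attention peaks at frame {peak}")  # frame 2
```

Each hop costs time linear in the number of document frames, and the returned weights play the role the abstract describes for the trained model: they indicate where in the segment the query was matched.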

Similar resources

NTU System at MediaEval 2015: Zero Resource Query by Example Spoken Term Detection using Deep and Recurrent Neural Networks

This note documents the methods the authors of this paper implemented for the Query by Example Search on Speech Task (QUESST) as part of MediaEval 2015. In this work, we combined DTW, DNN and RNN in one framework to perform query-by-example spoken term detection in a zero-resource setting.

A multi-hop PSO based localization algorithm for wireless sensor networks

A sensor network consists of a large number of sensor nodes that are distributed in a large geographic environment to collect data. Localization is one of the key issues in wireless sensor network research because it is important to determine the location of an event. On the other hand, finding the location of a wireless sensor node by the Global Positioning System (GPS) is not appropriate du...

Distinctive feature based representation of speech for query-by-example spoken term detection

In this paper, we address the problem of searching spoken queries within spoken databases, which is referred to as query-by-example Spoken Term Detection (QbE STD). A knowledge-based posteriorgram representation of speech is proposed. The knowledge of sound pattern of a language can be captured in terms of binary distinctive features (DFs). This idea is tailored for the needs of an STD system. Th...

IIIT-H SWS 2013: Gaussian Posteriorgrams of Bottle-Neck Features for Query-by-Example Spoken Term Detection

This paper describes the experiments conducted for spoken web search (SWS) at the MediaEval 2013 evaluations. A conventional approach is to train a multi-layer perceptron using high-resource languages and then use it in the low-resource scenario. However, phone posteriorgrams have been found to underperform when the language they were trained on differs from the target language. In this paper, we ...

Unsupervised speech processing with applications to query-by-example spoken term detection

This thesis is motivated by the challenge of searching and extracting useful information from speech data in a completely unsupervised setting. In many real-world speech processing problems, obtaining annotated data is neither cost- nor time-effective. We therefore ask how much we can learn from speech data without any transcription. To address this question, in this thesis, we chose the query-by-ex...

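Some of the related systems listed above, such as the NTU MediaEval 2015 submission, build on dynamic time warping (DTW). As a rough sketch only, assuming frame-level feature vectors and Euclidean frame distance (posteriorgram-based systems often use a negative log inner product instead), subsequence DTW lets the query align to any contiguous stretch of a document, and a lower cost means a better match. The function names are hypothetical, and this is not the code of any system listed here.

```python
import math
from typing import List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: List[Vector], doc: List[Vector]) -> float:
    """Lowest alignment cost of the whole query against any span of doc."""
    n, m = len(query), len(doc)
    # cost[i][j]: best cost aligning query[0..i] along a path ending at doc[j]
    cost = [[math.inf] * m for _ in range(n)]
    for j in range(m):
        # Free start: the first query frame may align to any document frame.
        cost[0][j] = euclidean(query[0], doc[j])
    for i in range(1, n):
        for j in range(m):
            best = cost[i - 1][j]  # doc frame j absorbs another query frame
            if j > 0:
                best = min(best,
                           cost[i - 1][j - 1],  # diagonal step
                           cost[i][j - 1])      # query frame i spans another doc frame
            cost[i][j] = euclidean(query[i], doc[j]) + best
    # Free end: the last query frame may align to any document frame.
    return min(cost[n - 1])


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    hit = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    miss = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    print(subsequence_dtw(query, hit), subsequence_dtw(query, miss))  # prints 0.0 2.0
```

Filling the table costs O(N·M) time for a query of N frames and a document of M frames. The cost here is not normalized by path length, which many systems do before applying a detection threshold.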

Journal title:
  • CoRR

Volume: abs/1709.00354

Publication date: 2017