Spoken Document Retrieval Using Neighboring Documents and Extended Language Models for Query Likelihood Model
Abstract
This paper proposes several approaches for the NTCIR-12 SpokenQuery & Doc-2 task [1]. Our methods are based on the query likelihood model, a probabilistic language model, with Dirichlet smoothing. We try to improve retrieval performance by using extended language models. First, we build and use a language model obtained from related research papers. Second, we propose a smoothing method that employs the cache model and an N-gram model based on Kneser-Ney smoothing. Finally, we propose a smoothing method that uses neighboring documents. Experiments were conducted to evaluate these methods on the NTCIR-12 test sets.
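As an illustration of the baseline named in the abstract, the following is a minimal sketch of query likelihood scoring with Dirichlet smoothing in its standard textbook form, P(w|d) = (c(w,d) + mu * P(w|C)) / (|d| + mu). It is not the authors' implementation: the function name, the toy documents, and the value of mu are all assumptions chosen for the example, and none of the extended language models (related papers, cache model, Kneser-Ney N-grams, neighboring documents) are included.

```python
import math
from collections import Counter


def dirichlet_query_log_likelihood(query, doc, collection_counts, collection_len, mu=1000.0):
    """Log P(query | doc) under a unigram model with Dirichlet smoothing.

    All names and the default mu are hypothetical, chosen for illustration.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        # Collection (background) probability of the word.
        p_c = collection_counts.get(w, 0) / collection_len
        # Dirichlet-smoothed document probability.
        p = (doc_counts[w] + mu * p_c) / (doc_len + mu)
        if p == 0.0:
            # Word unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection (invented data, only to show the ranking step).
docs = {
    "d1": "speech recognition errors in spoken documents".split(),
    "d2": "postings list intersection with skips".split(),
}
collection_counts = Counter(w for d in docs.values() for w in d)
collection_len = sum(collection_counts.values())

query = "spoken documents".split()
scores = {
    name: dirichlet_query_log_likelihood(query, d, collection_counts, collection_len)
    for name, d in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Documents are ranked by descending log-likelihood. A larger mu pulls each document model closer to the collection model, which matters for short or noisy spoken documents.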
Similar resources
Spoken Document Retrieval Using Extended Query Model and Web Documents
This paper proposes a novel approach for spoken document retrieval. In our method, a query model, which is a probabilistic language model, is adopted to compute the probability of generating a given query from each document. We employ not only a “static” document collection consisting of targeted documents but also a “dynamic” document collection including web documents related w...
Effects of Query Expansion for Spoken Document Passage Retrieval
One of the major challenges for spoken document retrieval is how to handle speech recognition errors within the target documents. Query expansion is promising for this challenge. In this paper, we apply relevance models, a type of query expansion method, for the spoken document passage retrieval task. We adapted the original relevance model for passage retrieval. We also extended it to benefit ...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Robust retrieval models for false positive errors in spoken documents
Dealing with speech recognition errors and out-of-vocabulary (OOV) words, which are referred to as false negative errors, is a common challenge in spoken document processing. To deal with them in spoken content retrieval (SCR), an SCR method that incorporates spoken term detection (STD) as a pre-processing stage (referred to as STD-SCR) has been proposed. However, the STD-SCR tends to increa...