Toward improvement of SDR accuracy using LDA and query expansion for SpokenDoc

Authors

  • Kiichi Hasegawa
  • Hideki Sekiya
  • Masanori Takehara
  • Taro Niinomi
  • Satoshi Tamura
  • Satoru Hayamizu
Abstract

This paper investigates several techniques for spoken document retrieval (SDR), aiming to improve retrieval performance over the conventional TF-IDF method. The first technique uses rescaled unigrams from LDA to compute a similarity score. The second expands queries through web retrieval using the Yahoo! API. The third is Prioritized And-operator Retrieval, which builds on TF-IDF. We tested these methods on dry-run data, and the third technique proved the most promising.
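For readers unfamiliar with the TF-IDF baseline that the abstract compares against, the following is a minimal sketch of TF-IDF ranking with cosine similarity. It is not the authors' implementation: the smoothed IDF variant, the function names (`tfidf_vectors`, `cosine`, `rank`), and the toy transcripts are all assumptions made for illustration, and the paper's LDA, query expansion, and Prioritized And-operator techniques are not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors for tokenized documents.

    Returns (vectors, idf). The smoothed IDF below is an assumed
    variant; the paper does not state which weighting it uses.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection are ignored.
    q_vec = {t: tf * idf[t] for t, tf in Counter(query).items() if t in idf}
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy "transcripts", already tokenized.
docs = [
    "speech recognition for lecture audio".split(),
    "topic models for text".split(),
    "retrieval of spoken lecture documents".split(),
]
ranking = rank("spoken lecture retrieval".split(), docs)
for doc_index, score in ranking:
    print(doc_index, round(score, 3))
```

In this toy collection the third transcript shares all three query terms and is ranked first, while the second shares none and receives a score of zero. In a real SDR setting the tokens would come from speech recognition output rather than clean text.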


Similar resources

STD based on Hough Transform and SDR using STD results: Experiments at NTCIR-9 SpokenDoc

In this paper, we report our experiments at the NTCIR-9 IR for Spoken Documents (SpokenDoc) task. We participated in both the STD and SDR subtasks of SpokenDoc. For the STD subtask, we applied a novel indexing method, called metric subspace indexing, that we proposed previously. One of the distinctive advantages of the method was that it could output the detection results in increasing order of distance witho...


Spoken document retrieval method combining query expansion with continuous syllable recognition for NTCIR-SpokenDoc

In this paper, we propose a spoken document retrieval method which combines query expansion with continuous syllable recognition. The proposed method expands a query by using words from the web pages collected by a search engine. It is assumed that relevant document vectors exist on the plane which is constructed from the query vector and the extended vector. The weight parameter between a targ...


QEA: A New Systematic and Comprehensive Classification of Query Expansion Approaches

A major problem in information retrieval is the difficulty of defining the user's information needs; on the other hand, when a user submits a query, there is a vast amount of information to retrieve. Different methods have therefore been suggested for query expansion, which is concerned with reconfiguring the query by increasing efficiency and improving the criterion accuracy in the information...


Spoken Document Retrieval Experiments for SpokenDoc at Ryukoku University (RYSDT)

In this paper, we describe the spoken document retrieval systems of Ryukoku University, which participated in the NTCIR-9 IR for Spoken Documents (“SpokenDoc”) task. The NTCIR-9 “SpokenDoc” task has two subtasks: the “Spoken term detection (STD) subtask” and the “Spoken document retrieval (SDR) subtask”. We participated in both subtasks as team RYSDT. In this paper, first, our STD systems are de...


Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

Due to the growth of the web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...




Publication date: 2011