Mining and Re-ranking for Answering Biographical Queries on the Web
Authors
Abstract
The rapid growth of the Web has made it a huge and valuable knowledge base. Within it, biographical information is of great interest to society. However, there has not yet been an efficient and complete approach to automated biography creation by querying the Web. This paper describes an automatic web-based question answering system for biographical queries. Ad hoc improvements to pattern learning approaches are proposed for mining biographical knowledge. Using bootstrapping, our approach learns surface text patterns from the Web and applies the learned patterns to extract relevant information. To reduce human labeling cost, we propose a new IDF-inspired re-ranking approach and compare it with a re-ranking approach based on pattern precision. A comparative study of the two re-ranking models is conducted. The tested system produces promising results for answering biographical queries.
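As a rough illustration of the pattern-precision side of this comparison, the sketch below scores surface text patterns on a few seed pairs and then ranks candidate answers by the summed precision of the patterns that matched them. It is a minimal sketch under stated assumptions, not the system described in the abstract: the patterns, the seed pairs, the snippets, and the function names `pattern_precision` and `rank_candidates` are all hypothetical, and the IDF-inspired model is not sketched because the abstract does not define it.

```python
import re
from collections import defaultdict

# Hypothetical surface patterns for a "birth year" relation.
# {name} is filled in per query; the single capture group is the candidate answer.
PATTERNS = [
    r"{name} was born in (\d{{4}})",
    r"{name} \((\d{{4}})\s*-",
    r"{name}.*?(\d{{4}})",  # deliberately loose, so it earns a lower precision
]


def pattern_precision(pattern, seeds, seed_snippets):
    """Fraction of a pattern's matches on seed snippets that give the known answer."""
    correct = 0
    total = 0
    for name, answer in seeds.items():
        regex = re.compile(pattern.format(name=re.escape(name)))
        for snippet in seed_snippets[name]:
            for match in regex.finditer(snippet):
                total += 1
                if match.group(1) == answer:
                    correct += 1
    return correct / total if total else 0.0


def rank_candidates(name, snippets, precisions):
    """Score each extracted candidate by the summed precision of matching patterns."""
    scores = defaultdict(float)
    for pattern, precision in precisions.items():
        regex = re.compile(pattern.format(name=re.escape(name)))
        for snippet in snippets:
            for match in regex.finditer(snippet):
                scores[match.group(1)] += precision
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up seed pairs and snippets, standing in for labeled web search results.
seeds = {"Mozart": "1756", "Gandhi": "1869"}
seed_snippets = {
    "Mozart": [
        "Mozart was born in 1756 in Salzburg.",
        "Mozart (1756-1791) was a composer.",
        "Mozart died in 1791.",
    ],
    "Gandhi": [
        "Gandhi was born in 1869.",
        "Gandhi (1869-1948) led the movement.",
        "Gandhi returned to India in 1915.",
    ],
}
precisions = {p: pattern_precision(p, seeds, seed_snippets) for p in PATTERNS}

# Made-up snippets for a new query.
query_snippets = [
    "Einstein was born in 1879 in Ulm.",
    "Einstein published four papers in 1905.",
    "Einstein (1879-1955) was a physicist.",
]
ranking = rank_candidates("Einstein", query_snippets, precisions)
```

In this toy setup the loose third pattern also picks up unrelated years, so it receives a lower precision than the two specific patterns and contributes less to any candidate it extracts.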
Similar resources
NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task
Users express their information needs in terms of queries to find the relevant documents on the web. However, users’ queries are usually short, so search engines may not have enough information to determine their exact intents. How to diversify web search results to cover users’ possible intents as widely as possible is an important research issue. In this paper, we will propose several subt...
A New Model for Phrase Search Based on Weighted Minimum Displacement
Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...
Maximum Entropy Context Models for Ranking Biographical Answers to Open-Domain Definition Questions
In the context of question-answering systems, there are several strategies for scoring candidate answers to definition queries including centroid vectors, bi-term and context language models. These techniques use only positive examples (i.e., descriptions) when building their models. In this work, a maximum entropy based extension is proposed for context language models so as to account for reg...
A Frequency Mining-Based Algorithm for Re-ranking Web Search Engine Retrievals
Conventional web search engines retrieve too many documents for the majority of submitted queries; they therefore have good recall, since there are far more pages than a user can look at. Precision, however, is a critical factor in these conditions, because the most relevant documents should be presented at the top of the list. In this paper, we propose an online page re-rank model whi...
Spoken question answering using tree-structured conditional random fields and two-layer random walk
In this paper, we consider a spoken question answering (QA) task in which the questions are in the form of speech, while the knowledge source for answers is the set of web pages (in text) on the Internet, accessed through an information retrieval engine. We mainly focus on the query formulation and re-ranking parts. Because the recognition results for the spoken questions are less reliable, we use N-bes...