Learning Noun Phrase Query Segmentation

نویسندگان

  • Shane Bergsma
  • Qin Iris Wang
چکیده

Query segmentation is the process of taking a user’s search-engine query and dividing the tokens into individual phrases or semantic units. Identification of these query segments can potentially improve both document-retrieval precision, by first returning pages which contain the exact query segments, and document-retrieval recall, by allowing query expansion or substitution via the segmented units. We train and evaluate a machine-learned query segmentation system that achieves 86% segmentationdecision accuracy on a gold standard set of segmented noun phrase queries, well above recently published approaches. Key enablers of this high performance are features derived from previous natural language processing work in noun compound bracketing. For example, token association features beyond simple N-gram counts provide powerful indicators of segmentation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Investigating Embedded Question Reuse in Question Answering

The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another related question. Our analysis shows that a pair of question in a general open domain QA can have embedding relation through their mentions of noun phrase expressions. We present methods f...

متن کامل

The Role of Multi-word Units in Interactive Information Retrieval

The paper presents several techniques for selecting noun phrases for interactive query expansion following pseudo-relevance feedback and a new phrase search method. A combined syntactico-statistical method was used for the selection of phrases. First, noun phrases were selected using a part-ofspeech tagger and a noun-phrase chunker, and secondly, different statistical measures were applied to s...

متن کامل

Noun-Phrase Model and Natural Query Language

Basic considerations in designing a natural data base query language system are discussed. The notion of the noun-phrase data model is elaborated, and its role in making a query system suitable for general use is stressed. An experimental query system, Yachimata, embodying the concept, is described.

متن کامل

A Machine Learning Approach for QA and Novelty Tracks: NTT System Description

In one sense, the goals of QA and Novelty tasks are the same: extracting small document parts which are relevant to users’ queries. Additionally, the unit of extraction is almost always fixed in both tasks. For QA, an answer is a noun phrase in most cases, and for Novelty, a sentence is recognized as the basic information unit. This observation leads us to the following unified approach to both...

متن کامل

Compound Noun Segmentation Based on Lexical Data Extracted from Corpus

Compound noun analysis is one of the crucial problems in Korean language processing because a series of nouns in Korean may appear without white space in real texts, which makes it difficult to identify the morphological constituents. This paper presents an effective method of Korean compound noun segmen-tation based on lexical data extracted from corpus. The segmentation is done by two steps: ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007