Learning a Replacement Model for Query Segmentation with Consistency in Search Logs
Abstract
Query segmentation splits a query into a sequence of non-overlapping segments that together cover all tokens in the query. Most methods are unsupervised; however, they are usually less accurate than supervised methods because they lack guidance from labeled data. In this paper, we propose a new paradigm, learning a replacement model with consistency (LRMC), which enables unsupervised training with guidance from search log data. In LRMC, we first assume the existence of a base segmenter (an implementation of any existing approach). We then use a key observation, that queries with a similar intent tend to have consistent segmentations, to automatically collect a set of labeled data from the outputs of the base segmenter by leveraging search log data. Finally, we use the auto-collected data to train a replacement model that selects the correct segmentation of a new query from the outputs of the base segmenter. The results show that LRMC can improve on state-of-the-art methods by around 7% in F-score.
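To make the task definition and the consistency observation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the LRMC method itself: the function names `all_segmentations` and `pick_consistent`, the idea that queries arrive already grouped by similar intent, and the simple segment-voting score are all hypothetical choices made for this example. The abstract does not specify how intent groups are formed or how consistency is measured.

```python
from collections import Counter
from itertools import combinations


def all_segmentations(tokens):
    """Enumerate every split of tokens into contiguous, non-overlapping
    segments that together cover all tokens (2**(n-1) splits for n >= 1)."""
    n = len(tokens)
    results = []
    # Each subset of the n-1 gaps between tokens defines one segmentation.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            results.append(
                [tuple(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]
            )
    return results


def pick_consistent(group):
    """Hypothetical consistency heuristic (an assumption, not the paper's).

    `group` maps each query in one similar-intent group to the candidate
    segmentations produced by some base segmenter. For each query, keep the
    candidate whose multi-token segments are also proposed for the most
    other queries in the group.
    """
    # Count, per segment, how many queries have it in at least one candidate.
    support = Counter()
    for candidates in group.values():
        support.update({seg for cand in candidates for seg in cand})

    labels = {}
    for query, candidates in group.items():
        # Subtract 1 so a query's own candidates do not vote for themselves.
        labels[query] = max(
            candidates,
            key=lambda cand: sum(support[s] - 1 for s in cand if len(s) > 1),
        )
    return labels


if __name__ == "__main__":
    for seg in all_segmentations("new york times".split()):
        print(seg)

    group = {
        "new york hotels": [
            [("new", "york"), ("hotels",)],
            [("new",), ("york", "hotels")],
        ],
        "new york pizza": [
            [("new", "york"), ("pizza",)],
            [("new",), ("york", "pizza")],
        ],
    }
    print(pick_consistent(group))
```

In this toy group, the segment ("new", "york") is proposed for both queries, so the candidates containing it are kept as automatically collected labels. A real system would then train a model on such labels to choose among the base segmenter's outputs for unseen queries; that training step is omitted here.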
Similar resources
Minimally Supervised Learning of Semantic Knowledge from Query Logs
We propose a method for learning semantic categories of words with minimal supervision from web search query logs. Our method is based on the Espresso algorithm (Pantel and Pennacchiotti, 2006) for extracting binary lexical relations, but makes important modifications to handle query log data for the task of acquiring semantic categories. We present experimental results comparing our method wit...
Towards Semantic Query Segmentation
Query Segmentation is one of the critical components for understanding users’ search intent in Information Retrieval tasks. It involves grouping tokens in the search query into meaningful phrases which help downstream tasks like search relevance and query understanding. In this paper, we propose a novel approach to segment user queries using distributed query embeddings. Our key contribution is...
Query Segmentation for Web Search
This paper describes a query segmentation method for search engines supporting inverse lookup of words and phrases. Data mining in query logs and document corpora is used to produce segment candidates and compute connexity measures. Candidates are considered in context of the whole query, and a list of the most likely segmentations is generated, with each segment attributed with a connexity val...
Search for the Pharmacophore of Histone Deacetylase Inhibitors Using Pharmacophore Query and Docking Study
Histone deacetylase inhibitors have gained a great deal of attention recently for the treatment of cancers and inflammatory diseases. Designing new inhibitors is therefore of great importance in the pharmaceutical industry and in research labs. Creating pharmacophore models to design new molecules or to search a library for lead compounds is of great interest. This approach reduces the overall cost ass...
A Study of Machine Learning Models in Epidemic Surveillance: Using the Query Logs of Search Engines
Epidemics inevitably result in a large number of deaths and always cause considerable social and economic damage. Epidemic surveillance has thus become an important healthcare research issue. In 2009, Ginsberg et al. observed that the query logs of search engines can be used to estimate the status of epidemics in a timely manner. In this paper, we model epidemic surveillance as a classification...