Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task
Abstract
Understanding the intent underlying a search query has recently attracted enormous research interest. Two challenging issues are worth noting. First, the words within a query are usually ambiguous, while the query is in most cases too short to disambiguate them. Second, in some cases the ambiguity cannot be resolved from the limited query context alone. The ambiguity must therefore be resolved or analyzed within context beyond the query itself. This paper presents the intent mining systems developed by THCIB and THUIS, which handle English and Chinese queries respectively, using four types of context: the query, a knowledge base, search results, and user behavior statistics. The major contributions are summarized as follows: (1) Concepts extracted from the query are used to extend the query. (2) Concepts are used to extract explicit subtopic candidates from Wikipedia. (3) LDA is applied to discover explicit subtopic candidates within search results. (4) Sense-based subtopic clustering and entity analysis are conducted to cluster the subtopic candidates and thereby discover the exclusive intents. (5) Intents are ranked with a unified intent ranking model. Experimental results indicate that our intent mining method is effective.
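As a rough illustration of contributions (4) and (5), the sketch below groups subtopic candidates into intents and then ranks the intents. It is not the authors' method: token-level Jaccard similarity is a simple stand-in for the sense-based clustering, summing per-candidate scores is a stand-in for the unified ranking model, and the function names, threshold, example candidates, and scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_candidates(candidates: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass clustering (assumed stand-in for sense-based clustering).

    A candidate joins the first cluster whose seed (first member) is
    similar enough; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for cand in candidates:
        for cluster in clusters:
            if jaccard(cand, cluster[0]) >= threshold:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    return clusters


def rank_intents(
    clusters: List[List[str]], scores: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Score each cluster by summing its members' scores (hypothetical model).

    Each intent is labeled by its cluster seed; higher scores rank first.
    """
    scored = [(c[0], sum(scores.get(m, 0.0) for m in c)) for c in clusters]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical subtopic candidates for the ambiguous query "jaguar".
candidates = [
    "jaguar car price",
    "jaguar car dealer price",
    "jaguar animal habitat",
    "jaguar animal diet habitat",
]
# Hypothetical per-candidate evidence, e.g. aggregated from several contexts.
scores = {
    "jaguar car price": 0.4,
    "jaguar car dealer price": 0.3,
    "jaguar animal habitat": 0.2,
    "jaguar animal diet habitat": 0.2,
}

clusters = cluster_candidates(candidates)
ranking = rank_intents(clusters, scores)
for label, score in ranking:
    print(f"{label}: {score:.2f}")
```

In a fuller system, the per-candidate scores would presumably combine evidence from the four context types named in the abstract; how the paper actually weights them is not stated here.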
Related resources
Overview of the NTCIR-12 IMine-2 Task
In this paper, we provide an overview of the NTCIR-12 IMine-2 task, which is a core task of NTCIR-12 and also a succeeding work of IMine@NTCIR-11, INTENT-2@NTCIR-10, and INTENT@NTCIR-9 tasks. IMine-2 comprises the Query Understanding subtask and the Vertical Incorporating subtask. 23 groups from diverse countries including China, France, India, Portugal, Ireland, and Japan registered to the tas...
KDEIM at NTCIR-12 IMine-2 Search Intent Mining Task: Query Understanding through Diversified Ranking of Subtopics
In this paper, we describe our participation in the Query Understanding subtask of the NTCIR-12 IMINE Task. We propose a method that extracts subtopics by leveraging the query suggestions from search engines. The importance of the subtopics with the query is estimated by exploiting multiple query-dependent and query-independent features with supervised feature selection. To diversify the subtop...
THUIR at NTCIR-12 IMine Task
In this paper, we describe our approaches in the NTCIR-12 IMine task, including Chinese Query Understanding and Chinese Vertical Incorporating. In the Query Understanding subtask, we propose different strategies to mine subtopic candidates from a wide range of resources and present a two-step method to predict the vertical intent for each subtopic. In the Vertical Incorporating subtask, we adopt a probab...
Mining Search Subtopics from Query Logs
Web queries are usually short and ambiguous. Subtopic mining plays an important role in understanding user’s search intent and has attracted many researchers' attention. In this paper, we describe our approach to identify users’ intents from query logs, which is a subtopic mining subtask of the NTCIR-9 Intent task for Chinese. We extract queries that are semantically related to the original que...
TUTA1 at the NTCIR-12 Temporalia Task
Our group participated in the Temporal Intent Disambiguation (TID) subtask (Chinese) of NTCIR-12. We use word2vec to model each query string as a feature vector, and use cosine similarity to measure the similarity between the query string and the training corpus SogouCA. Our results show that the approach is effective for this task.