Click-words: learning to predict document keywords from a user perspective
نویسندگان
چکیده
MOTIVATION Recognizing words that are key to a document is important for ranking relevant scientific documents. Traditionally, important words in a document are either nominated subjectively by authors and indexers or selected objectively by some statistical measures. As an alternative, we propose to use documents' words popularity in user queries to identify click-words, a set of prominent words from the users' perspective. Although they often overlap, click-words differ significantly from other document keywords. RESULTS We developed a machine learning approach to learn the unique characteristics of click-words. Each word was represented by a set of features that included different types of information, such as semantic type, part of speech tag, term frequency-inverse document frequency (TF-IDF) weight and location in the abstract. We identified the most important features and evaluated our model using 6 months of PubMed click-through logs. Our results suggest that, in addition to carrying high TF-IDF weight, click-words tend to be biomedical entities, to exist in article titles, and to occur repeatedly in article abstracts. Given the abstract and title of a document, we are able to accurately predict the words likely to appear in user queries that lead to document clicks.
منابع مشابه
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
Principal aim of a search engine is to provide the sorted results according to user’s requirements. To achieve this aim, it employs ranking methods to rank the web documents based on their significance and relevance to user query. The novelty of this paper is to provide user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
متن کاملAn Ensemble Click Model for Web Document Ranking
Annually, web search engine providers spend more and more money on documents ranking in search engines result pages (SERP). Click models provide advantageous information for ranking documents in SERPs through modeling interactions among users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...
متن کاملPersonalizing Web Search Results Based on Subspace Projection
Personalized search has recently attracted increasing attention. This paper focuses on utilizing click-through data to personalize the web search results, from a novel perspective based on subspace projection. Specifically, we represent a user profile as a vector subspace spanned by a basis generated from a word-correlation matrix, which is able to capture the dependencies between words in the ...
متن کاملImproving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm
Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...
متن کاملWeb pages ranking algorithm based on reinforcement learning and user feedback
The main challenge of a search engine is ranking web documents to provide the best response to a user`s query. Despite the huge number of the extracted results for user`s query, only a small number of the first results are examined by users; therefore, the insertion of the related results in the first ranks is of great importance. In this paper, a ranking algorithm based on the reinforcement le...
متن کامل