Extracting Keywords from Digital Document Collections

نویسنده

  • Anna Jonsson
چکیده

An indexing tool was built to provide for one of several information seeking tasks. In ac­ cordance with the basic principles of work held by the HUMLE laboratory at SICS, a so­ lution regarding indexing would be a semi-automatic tool. This approach is also relevant as the continuation of the indexing project is conducted in co-operation with the Swedish Parliament, where a staff of professional indexers currently is investigating the utility of automatic and semi-automatic indexing tools to raise productivity.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

Aletras, Nikolaos, Timothy Baldwin, Jey Han Lau and Mark Stevenson (to appear) Representing Topics Labels for Exploring Digital Libraries, In Proceedings of Digital Libraries 2014, London, UK

Topic models have been shown to be a useful way of representing the content of large document collections, for example via visualisation interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a set of keywords, i.e. the top-n words with highest marginal probability within the topic. However, alternativ...

متن کامل

Mining Technique Using Association Rules Extraction

automatically extracting association rules from collections of textual documents. The technique called, Extracting Association Rules from Text (EART). It depends on keyword features for discover association rules amongst keywords labeling the documents. In this work, the EART system ignores the order in which the words occur, but instead focusing on the words and their statistical distributions...

متن کامل

Mining Cross-document Relationships from Text

The paper argues that automatic link generation and typing methods are needed to find and maintain crossdocument links in large and growing textual collections. Such links are important to organise information and to support search and navigation. We present an experimental study on mining cross-document links from a collection of 5000 documents. We identify a set of link types and show that th...

متن کامل

Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm

Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999