Investigations in Unsupervised Back-of-the-Book Indexing

Authors

  • Andras Csomai
  • Rada Mihalcea
Abstract

This paper describes our experiments with unsupervised methods for back-of-the-book index construction. Through comparative evaluations on a gold-standard data set of 29 books and their corresponding indexes, we draw conclusions about which unsupervised methods are most accurate for automatic index construction. We show that, with the right sequence of methods and heuristics, the performance of an unsupervised back-of-the-book index construction system can be raised by up to 250% in relative F-measure compared with a system based on the traditional tf*idf weighting scheme.
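The abstract names two ingredients that a small sketch can make concrete: a tf*idf baseline for scoring candidate index terms, and F-measure against a gold-standard index. The code below is only an illustration under stated assumptions, not the authors' system: the function names, the whitespace tokenization, the particular idf variant (log of document count over document frequency, with the target counted as one document), and the reading of "relative increase" as (improved - baseline) / baseline are all choices made here. The toy texts and numbers are invented and are not results from the paper.

```python
import math
from collections import Counter


def tfidf_scores(target_tokens, background_docs):
    """Score each term of the target text by tf * idf.

    background_docs is a list of token sets; the target itself is counted
    as one more document, so every term has document frequency >= 1.
    """
    tf = Counter(target_tokens)
    n_docs = len(background_docs) + 1
    scores = {}
    for term, freq in tf.items():
        df = 1 + sum(1 for doc in background_docs if term in doc)
        scores[term] = freq * math.log(n_docs / df)
    return scores


def f_measure(predicted, gold):
    """Balanced F-measure of predicted index entries against a gold index."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def relative_increase(baseline, improved):
    """Relative increase of improved over baseline, in percent (assumed reading)."""
    return 100.0 * (improved - baseline) / baseline


# Toy data, invented for illustration only.
target = "the index lists the index entries of the book".split()
background = [
    set("the cat sat on the mat".split()),
    set("the book of the year".split()),
]
scores = tfidf_scores(target, background)
ranked = sorted(scores, key=scores.get, reverse=True)
top_term = ranked[0]  # the term with the highest tf*idf score
toy_f = f_measure(["index", "lists"], ["index", "entries"])
```

In this toy run a word that occurs in every document, such as "the", receives a score of zero, which is the behaviour that makes tf*idf a common baseline for term selection. Under the assumed reading of relative increase, a 250% gain means the improved F-measure is 3.5 times the baseline value.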

Similar resources

Investigating the Extent of Correspondence between the Persian Book Final Indices and ISO 999 and B.S. 3700 Standards: The Case of the Field of Library and Information Sciences

Background and Aim: The purpose of this study was to investigate the extent of observing the standards of indexing (ISO 999-1996, BS 3700) of Library and Information Sciences books. Method: The study used descriptive-analytical methodology and the population consisted of all the Persian books, written and translated, in the field of Library and Information Sciences published from 2006 to 2012 w...

A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine

Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...

Syntactic Approaches to Automatic Book Indexing

Automatic book indexing systems are based on the generation of phrase structures capable of reflecting text content. Some approaches are given for the automatic construction of back-of-book indexes using a syntactic analysis of the available texts, followed by the identification of nominal constructions, the assignment of importance weights to the term phrases, and the choice of phrases as in...

Integrating knowledge from different sources for automatic back-of-the-book indexing

The paper reports research on automatic back-of-the-book indexing. It presents a methodology which brings together knowledge from different disciplines. It is inspired by human indexing methodology and the results are more similar to manually-crafted indexes than those produced by previous automatic approaches. Issues of evaluation and applications are addressed. Résumé : Cette communication pr...

Linguistically Motivated Features for Enhanced Back-of-the-Book Indexing

In this paper we present a supervised method for back-of-the-book index construction. We introduce a novel set of features that goes beyond the typical frequency-based analysis, including features based on discourse comprehension, syntactic patterns, and information drawn from an online encyclopedia. In experiments carried out on a book collection, the method was found to lead to an improvement...

Publication date: 2007