Towards a Topic Driven Access to Full Text Documents

Authors

  • Caterina Caracciolo
  • Willem van Hage
  • Maarten de Rijke
Abstract

We address the issue of providing topic-driven access to full-text documents. The methodology we propose combines topic segmentation with information retrieval techniques. By segmenting the text into topic-driven segments, we obtain small, coherent documents that can serve both as a basis for the automatic generation of links and as a visualization aid for the reader, who is presented with a focused and restricted text snippet. In the presence of a concept hierarchy (ontology), the information retrieval step would connect the obtained segments to concepts in the ontology. In this paper we concentrate on the text segmentation phase: we describe our approach, discuss some related issues, and report on preliminary results.
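The abstract does not say which segmentation algorithm is used, so the following is only a minimal sketch of one common family of techniques: placing a boundary wherever the lexical similarity between neighbouring blocks of sentences drops. The function names (`bag_of_words`, `cosine`, `segment`) and the `window` and `threshold` parameters are illustrative assumptions, not taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(sentences):
    """Lowercased word counts for a block of sentences (no stopword removal)."""
    words = []
    for s in sentences:
        words.extend(re.findall(r"[a-z]+", s.lower()))
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Group sentences into segments (hypothetical helper, not the paper's method).

    A boundary is placed at each gap where the similarity between the
    `window` sentences before it and the `window` sentences after it
    falls below `threshold`.
    """
    segments, current = [], []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        gap = i + 1
        if gap >= len(sentences):
            break
        left = bag_of_words(sentences[max(0, gap - window):gap])
        right = bag_of_words(sentences[gap:gap + window])
        if cosine(left, right) < threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    text = [
        "Cats purr softly.",
        "Cats chase mice.",
        "Stocks fell sharply.",
        "Stocks rose later.",
    ]
    for seg in segment(text):
        print(seg)
```

Each resulting segment is a small, self-contained document; in the setting the abstract describes, a later retrieval step could then match these segments against ontology concepts.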


Similar resources

Towards Topic Driven Access to Full Text Documents

We address the issue of providing topic driven access to full text documents. The methodology we propose is a combination of topic segmentation and information retrieval techniques. By segmenting the text into topic driven segments, we obtain small and coherent documents that can be used in two ways: as a basis for automatically generating hypertext links, and as a visualization aid for the rea...


A review of text mining approaches and their function in discovering and extracting a topic

Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling.  Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...


Finding Topic-centric Identified Experts based on Full Text Analysis

This paper presents a method for finding topic-centric experts from open access metadata and full text documents. Topic-centric information, including experts, is served on OntoFrame, a Semantic Web-based academic research information service supporting R&D activities. The URI scheme-based OntoFrame provides three entity pages: topic, person, and event. ‘Persons by Topic’ in topic page lists up ...


A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...


SAVVY SEARCHING Open access to scholarly full-text documents

Purpose – The purpose of this article is to discuss open access to scholarly full-text documents. Design/methodology/approach – Discusses open access to scholarly full-text documents. Findings – The paper shows that while open access archives are good for the majority, for publishers, editors and authors, open access articles can substantially increase their impact, and the impact factor for th...




Publication date: 2004