text documents

Information Retrieval in Distributed Hypertexts

1994

Paul De Bra, Geert-Jan Houben, Yoram Kornatzky, Renier Post

Hypertext is a generalization of the conventional linear text into a non-linear text formed by adding cross-reference and structural links between different pieces of text. A hypertext can be regarded as an extension of a textual database by adding a link structure among the different text objects it stores. We present a tool for finding information in a distributed hypertext such as the World-...
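
The abstract describes a hypertext as a textual database extended with a link structure. A minimal sketch of that data structure follows. It is not the authors' search tool: the class name `Hypertext`, the breadth-first link traversal and the `max_depth` limit are illustrative assumptions.

```python
from collections import deque


class Hypertext:
    """Toy hypertext: a textual database plus a link structure (hypothetical)."""

    def __init__(self):
        self.texts = {}  # node id -> text (the "textual database")
        self.links = {}  # node id -> list of linked node ids

    def add_node(self, node_id, text):
        self.texts[node_id] = text
        self.links.setdefault(node_id, [])

    def add_link(self, src, dst):
        self.links.setdefault(src, []).append(dst)

    def search(self, start, keywords, max_depth=2):
        """Follow links breadth-first from `start` for at most `max_depth` hops and
        return the ids of visited nodes whose text contains every keyword."""
        wanted = [k.lower() for k in keywords]
        seen = {start}
        queue = deque([(start, 0)])
        hits = []
        while queue:
            node, depth = queue.popleft()
            text = self.texts.get(node, "").lower()
            if all(k in text for k in wanted):
                hits.append(node)
            if depth < max_depth:
                for nxt in self.links.get(node, []):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, depth + 1))
        return hits
```

For example, with links `home -> ir -> db`, searching from `home` for "retrieval" reaches both `ir` and `db` at the default depth, but only `ir` when `max_depth=1`.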

Background Knowledge, Indexing and Matching Interdependencies of Document Management and Ontology-Maintenance

2000

Andreas Faatz, Thomas Kamps, Ralf Steinmetz

which determines similarities between text documents. These text documents are indexed with keywords and further background knowledge-terms from an ontology. The representation of the documents and the evaluation of the algorithm are used to let an ontology learn. This is shown to be one way of improving the results of the algorithm by improving the background knowledge.
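
Indexing documents with keywords plus background terms from an ontology can be sketched as expanding each keyword set with its broader ontology terms before comparing documents. The abstract does not name its similarity measure, so the Jaccard coefficient, the toy `ONTOLOGY` mapping and the helper names below are assumptions for illustration only.

```python
# Hypothetical toy ontology: term -> set of broader terms (background knowledge).
ONTOLOGY = {
    "k-means": {"clustering"},
    "dbscan": {"clustering"},
    "clustering": {"machine learning"},
}


def expand(keywords, ontology):
    """Index terms = the document's keywords plus every broader term reachable in the ontology."""
    terms = set(keywords)
    stack = list(keywords)
    while stack:
        term = stack.pop()
        for broader in ontology.get(term, ()):
            if broader not in terms:
                terms.add(broader)
                stack.append(broader)
    return terms


def jaccard(a, b):
    """Set similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Two documents indexed only with "k-means" and "dbscan" share no keywords (similarity 0.0), but after expansion they share "clustering" and "machine learning", giving 2/4 = 0.5. Improving the ontology therefore changes the similarities directly, which matches the abstract's point about improving the background knowledge.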

Imaged Document Text Retrieval Without OCR

Journal: IEEE Trans. Pattern Anal. Mach. Intell., 2002

Chew Lim Tan, Weihua Huang, Zhaohui Yu, Yi Xu

We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects. Image features, namely, the Vertical Traverse Density (VTD) and Horizontal Traverse Density (HTD), are extracted. An n-gram based document vector is constructed for each document based on these features. Text similarity between documents is then measured by calcul...
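
An n-gram document vector counts every run of n consecutive symbols in a document. The sketch below uses plain characters as a stand-in for the paper's VTD/HTD image-feature codes, and cosine similarity as the comparison, because the abstract is cut off before naming its measure. Both choices are assumptions, not the authors' method.

```python
import math
from collections import Counter


def ngram_vector(symbols, n=3):
    """Sparse n-gram count vector: each run of n consecutive symbols is one dimension."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Because `ngram_vector` accepts any sequence of hashable symbols, a list of per-character feature codes (for example `[3, 7, 7, 1]`, a hypothetical encoding) works exactly like a string.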

Pre Processing Techniques for Arabic Documents Clustering

2017

Mohammed Alhanjouri

Clustering of text documents is an important technique for document retrieval. It aims to organize documents into meaningful groups or clusters. Text preprocessing plays a major role in improving the clustering of Arabic documents. This research examines and compares text preprocessing techniques in Arabic document clustering. It also studies the effectiveness of text preprocessing techniques: ...
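
The abstract's list of techniques is cut off, so the pipeline below is only a sketch of steps commonly applied to Arabic text before clustering: removing diacritics and tatweel, normalizing letter variants, dropping stopwords and light stemming. The three-word stopword list and the single prefix-stripping rule are illustrative assumptions, not the paper's configuration.

```python
import re

# Arabic diacritics (tashkeel, U+064B to U+0652) and the tatweel character (U+0640).
DIACRITICS = re.compile("[\u064b-\u0652\u0640]")

# Letter-variant normalization commonly applied to Arabic text.
NORMALIZE = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064a",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize(token):
    return DIACRITICS.sub("", token).translate(NORMALIZE)


# Tiny illustrative stopword list ("in", "from", "on"), normalized like the text.
STOPWORDS = {normalize(w) for w in ("\u0641\u064a", "\u0645\u0646", "\u0639\u0644\u0649")}


def light_stem(token):
    """Toy rule: strip the definite article al- when at least two letters remain."""
    if token.startswith("\u0627\u0644") and len(token) > 3:
        return token[2:]
    return token


def preprocess(text):
    tokens = (normalize(t) for t in text.split())
    return [light_stem(t) for t in tokens if t and t not in STOPWORDS]
```

Comparing clustering quality with and without each step is one way to measure its effect, which is the kind of comparison the abstract describes.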

Effective Term Based Text Clustering Algorithms

2010

P. Ponmuthuramalingam

Text clustering methods can be used to group large sets of text documents. Most text clustering methods do not address problems specific to text clustering, such as the very high dimensionality of the data and the understandability of the cluster descriptions. In this paper, a frequent-term-based clustering approach is introduced; it provides a natural way of reducing a large dimensionalit...
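
The abstract is cut off before it describes the actual procedure, so the following is a deliberately simplified sketch of the frequent-term idea, not the paper's algorithm. Only terms whose document frequency reaches a hypothetical `min_support` threshold are kept, which shrinks the feature space, and each document joins the cluster labeled by its most frequent such term, which gives every cluster a readable one-word description.

```python
from collections import Counter


def frequent_terms(docs, min_support):
    """Map each term to its support (fraction of documents containing it), keeping only
    terms with support >= min_support; this is the dimensionality reduction step."""
    df = Counter(term for doc in docs for term in set(doc.lower().split()))
    n = len(docs)
    return {t: c / n for t, c in df.items() if c / n >= min_support}


def cluster_by_frequent_terms(docs, min_support=0.3):
    """Toy clustering: each document joins the cluster labeled by the highest-support
    frequent term it contains (ties broken alphabetically); documents containing no
    frequent term are grouped under the label None."""
    support = frequent_terms(docs, min_support)
    clusters = {}
    for i, doc in enumerate(docs):
        terms = [t for t in set(doc.lower().split()) if t in support]
        label = min(terms, key=lambda t: (-support[t], t)) if terms else None
        clusters.setdefault(label, []).append(i)
    return clusters
```

With five short documents over seven distinct terms, a support threshold of 0.3 may keep only two terms, so each document is described by at most two dimensions instead of seven.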

Unsupervised Text Annotation

2017

Tanya Braun, Felix Kuhr, Ralf Möller

We introduce the unsupervised text annotation model UTA, which iteratively populates a document-specific database containing the related symbolic content description. The model identifies the most related documents using both the text of the documents and the symbolic content description. UTA extends the database of one document with data from related documents without sacrificing precision.

Semantic Text Mining and its Application in Biomedical Domain

2006

Illhoi Yoo, Xia Lin, Bahrad A. Sokhansanj, Don Goelman, TaeWhan Jung, YoungJae Jung

Semantic Text Mining and its Application in Biomedical Domain. Illhoi Yoo, Xiaohua Hu, Ph.D. A huge amount of biomedical knowledge and novel discoveries has been produced and collected in text databases or digital libraries, such as MEDLINE, because text is the most natural form for storing information. To cope with this pressing text information overload, text mining is employed. However, ...

Survey of Text Clustering

2005

Liping Jing

Clustering text documents into category groups is an important step in indexing, retrieval, management and mining of abundant text data on the Web or in corporate information systems. The text clustering task can be intuitively described as finding, given a set of vectors representing data points in a multi-dimensional space, a partition of the text data into clusters such that the points within each...
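
The partition view above can be made concrete with k-means, one standard way to split vectors into k clusters. The survey snippet does not single out this algorithm; it is used here as an assumed example, on 2-D points for readability. In practice each document would first be turned into a vector such as term counts or TF-IDF weights.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: alternately assign each point (a tuple of floats) to its nearest
    centroid and move each centroid to the mean of its members. Centroids start at the
    first k points (assumes k <= len(points)), which keeps the sketch deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each non-empty cluster's centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

For raw term-count vectors, Euclidean distance is sensitive to document length, which is why cosine-based variants (spherical k-means) are often preferred for text.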

An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System

1999

Edgar H. Sibley, David C. Blair, M. E. Maron

Document retrieval is the problem of finding stored documents that contain useful information. There exist a set of documents on a range of topics, written by different authors, at different times, and at varying levels of depth, detail, clarity, and precision, and a set of individuals who, at different times and for different reasons, search for recorded information that may be contained in so...

An Algorithm for Reducing Text Line Candidates of Incorrect Orientation

1998

Hideaki Goto, Hirotomo Aso

Japanese documents often contain both horizontally and vertically printed text lines on the same page. Document analysis systems therefore need to detect the correct orientation of text lines and to select text line candidates of the correct orientation. We designed an efficient framework for this procedure and developed several algorithms that reduce text line candidates of incorrect orientatio...