text documents

Search results for: text documents

Number of results: 222232

Text Categorization – A Review

2013

Rajni Jindal Shweta Taneja

With the growth of the Internet, the amount of digital information is growing exponentially day by day. This information may be structured or unstructured in nature. So, a need to convert unstructured text into structured text and to infer knowledge was felt. As a result, the field of text mining emerged. Text documents may be in the form of online news articles, emails, scientific documents...


Clustering multilingual documents by estimating text-to-text semantic relatedness

2010

Dani Yogatama

This thesis is about multilingual document clustering through estimating semantic relatedness between multilingual texts. Specifically, we focus on the task of clustering multilingual documents with very limited or no supervisory information. We present two approaches to address the problem: a comparable-corpora based approach and a web-searches based approach. Our first approach derives pairwi...


Classifying Text Documents by Associating Terms With Text Categories

2002

Osmar R. Zaïane Maria-Luiza Antonie


Indexing Text Documents Based on Topic Identification

2004

Manonton Butarbutar Susan McRoy

This work provides algorithms and heuristics to index text documents by determining important topics in the documents. To index text documents, the work provides algorithms to generate topic candidates, determine their importance, detect similar and synonymous topics, and eliminate incoherent topics. The indexing algorithm uses topic frequency to determine the importance and the existence of th...


Study on Various Methods for Text Clustering

2013

R. RAJANI V. SIREESHA

Clustering text documents into different category groups is an important step in indexing, retrieval, management and mining of abundant text data on the Web or in corporate information systems. The text clustering task can be intuitively described as finding, given a set of vectors of some data points in a multi-dimensional space, a partition of text data into clusters such that the points within each...


Text Mining in Biomedical Domain with Emphasis on Document Clustering

2017

Vinaitheerthan Renganathan

OBJECTIVES: With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. METHODS: This paper reviews text mining processes in detail and the software tools a...


Detecting Text Reuse with Modified and Weighted N-grams

2012

Rao Muhammad Adeel Nawab Mark Stevenson Paul D. Clough

Text reuse is common in many scenarios and documents are often based, at least in part, on existing documents. This paper reports an approach to detecting text reuse which identifies not only documents which have been reused verbatim but is also designed to identify cases of reuse when the original has been rewritten. The approach identifies reuse by comparing word n-grams in documents and modi...
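
The basic idea of comparing word n-grams between two documents can be sketched as follows. This is only a minimal, unweighted baseline under simple assumptions (lowercasing and whitespace tokenization); it is not the modified and weighted n-gram approach the paper reports, and the names `word_ngrams` and `containment` are hypothetical helpers chosen for illustration.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(candidate, source, n=3):
    """Fraction of the candidate's word n-grams that also occur in the source.

    Returns 0.0 when the candidate is too short to contain any n-gram.
    """
    cand = word_ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & word_ngrams(source, n)) / len(cand)
```

A score near 1.0 suggests the candidate largely repeats the source verbatim, while rewritten text would score lower under this plain measure, which is presumably why the paper modifies and weights the n-grams.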


Spoken document retrieval by translating recognition candidates into correct transcriptions

2008

Tomoyosi Akiba Yusuke Yokota

This paper proposes an ad hoc retrieval method for spoken documents that uses a statistical translation technique. After transcribing the spoken documents by using a Large-Vocabulary Continuous Speech Recognition (LVCSR) decoder, a text-based ad hoc retrieval method can be directly applied to the transcribed documents. However, recognition errors will significantly degrade the retrieval performa...


Optimization of Text Classification Using Supervised and Unsupervised Learning Approach

2015

Manpreet Kaur Vijay Kumar

Text classification, also known as text categorization, is the task of automatically allocating unlabeled documents into predefined categories. Text classification means allocating a document to one or more categories or classes. The ability to accurately perform a classification task depends on the representations of documents to be classified. Text representations transform the textual docum...


Towards Multilingual Information Discovery through a SOM based Text Mining approach

2000

Chung-Hong Lee Hsin-Chang Yang

Text mining has been gaining popularity in the knowledge discovery field, particularly with the increasing availability of digital documents in various languages from all around the world. However, most current text mining tools focus on processing monolingual documents (particularly English documents) only; little attention has been paid to applying the techniques to handle the document...

