text document classification

Search results for: text document classification

Number of results: 765658

Clustering with Side Information for Mining Text Data

2015

Vijayalakshmi P

Side information is available along with text documents in several text mining applications. There are different kinds of side information, such as document provenance information, links in the document, other non-textual attributes contained in the document, or user access behavior from web logs. Some attributes may contain an extremely large amount of information for clustering purp...

Short Text Document Clustering using Distributed Word Representation and Document Distance

Journal: Walailak Journal of Science and Technology (WJST) 2018

From Word Embeddings To Document Distances

2015

Matt J. Kusner, Yu Sun, Nicholas I. Kolkin, Kilian Q. Weinberger

We present the Word Mover’s Distance (WMD), a novel distance function between text documents. Our work is based on recent results in word embeddings that learn semantically meaningful representations for words from local co-occurrences in sentences. The WMD measures the dissimilarity between two text documents as the minimum amount of distance that the embedded words of one document nee...
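
The idea described in this abstract can be illustrated with a minimal sketch. It covers only a special case: both documents contain the same number of distinct words, each with equal weight, so the cheapest way to move one document onto the other is a one-to-one matching that can be found by brute force. The 2-D vectors in `TOY_EMBEDDINGS` and the function name `wmd_equal_length` are invented for illustration; real use would rely on trained word embeddings and a general optimal-transport solver.

```python
from itertools import permutations
from math import dist

# Hypothetical 2-D "embeddings", made up for this example only.
TOY_EMBEDDINGS = {
    "obama": (1.0, 0.0),
    "president": (1.0, 0.2),
    "speaks": (0.0, 1.0),
    "greets": (0.1, 1.0),
    "media": (2.0, 2.0),
    "press": (2.0, 2.2),
}


def wmd_equal_length(doc_a, doc_b, embeddings):
    """Word-mover-style distance for two equal-length lists of distinct words.

    Each word carries weight 1/n, so the cheapest transport plan is a
    one-to-one matching; every matching is tried and the cheapest is kept.
    The cost of moving one word to another is their Euclidean distance.
    """
    if len(doc_a) != len(doc_b):
        raise ValueError("this sketch needs documents of equal length")
    if len(set(doc_a)) != len(doc_a) or len(set(doc_b)) != len(doc_b):
        raise ValueError("this sketch needs distinct words in each document")
    n = len(doc_a)
    cheapest = min(
        sum(dist(embeddings[a], embeddings[b]) for a, b in zip(doc_a, matching))
        for matching in permutations(doc_b)
    )
    return cheapest / n


near = wmd_equal_length(
    ["obama", "speaks", "media"], ["president", "greets", "press"], TOY_EMBEDDINGS
)
print(round(near, 4))
```

The brute-force search grows factorially with document length, so it is only suitable for toy inputs like the one above.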

Document Vector Space Representation Model for Automatic Text Classification

2013

Ajit Danti, Bharath Bhushan

Classification of text documents presents a unique challenge to conventional classification algorithms. Because the datasets contain a large number of features, finding a suitable representation for text documents is a problem in its own right. This paper discusses a simple but effective representation model for text documents to tackle the classification problem. Two different...

Multi-Script Line Identification System for Indian Languages

2010

Prakash K. Aithal, Rajesh Gopakumar, Dinesh U. Acharya

India is a multilingual, multi-script country. In total, there are 18 official languages and 12 scripts in India. For Optical Character Recognition (OCR) of such a multilingual document, it is necessary to identify the script before feeding the text lines to the OCRs of individual scripts. In this paper, a simple and efficient technique of script identification for Kannada, Malayalam, Telugu, Tam...

Poisson naive Bayes for text classification with feature weighting

2003

Sang-Bum Kim, Hee-Cheol Seo, Hae-Chang Rim

In this paper, we investigate the use of a multivariate Poisson model and feature weighting to learn a naive Bayes text classifier. Our new naive Bayes text classification model assumes that a document is generated by a multivariate Poisson model, whereas previous work considers a document as a vector of binary term features based on the presence or absence of each term. We also explore the use of...
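
As a rough illustration of treating term counts as Poisson-distributed, the sketch below fits one Poisson rate per class and term and classifies by the highest log-posterior. It is a generic Poisson naive Bayes under stated assumptions, not the model from the paper: the additive `smoothing` constant, the function names, and the toy documents are all invented here, and the paper's feature weighting is not reproduced.

```python
from collections import Counter
from math import lgamma, log


def train_poisson_nb(docs, labels, smoothing=0.1):
    """Estimate class priors and per-class, per-term Poisson rates."""
    vocab = sorted({term for doc in docs for term in doc})
    model = {"vocab": vocab, "prior": {}, "rate": {}}
    for cls in sorted(set(labels)):
        class_docs = [doc for doc, y in zip(docs, labels) if y == cls]
        totals = Counter(term for doc in class_docs for term in doc)
        model["prior"][cls] = len(class_docs) / len(docs)
        # Mean count per document; smoothing (an assumption) keeps rates > 0.
        model["rate"][cls] = {
            term: (totals[term] + smoothing) / len(class_docs) for term in vocab
        }
    return model


def log_poisson(count, rate):
    """Log of the Poisson probability of observing `count` given `rate`."""
    return count * log(rate) - rate - lgamma(count + 1)


def predict(model, doc):
    """Return the class with the highest log-posterior for a tokenized doc."""
    counts = Counter(doc)  # terms outside the vocabulary are ignored
    scores = {}
    for cls, prior in model["prior"].items():
        rates = model["rate"][cls]
        scores[cls] = log(prior) + sum(
            log_poisson(counts[term], rates[term]) for term in model["vocab"]
        )
    return max(scores, key=scores.get)


toy_docs = [
    ["goal", "match", "goal"],
    ["match", "team"],
    ["stock", "market"],
    ["market", "stock", "stock"],
]
toy_labels = ["sport", "sport", "finance", "finance"]
toy_model = train_poisson_nb(toy_docs, toy_labels)
print(predict(toy_model, ["goal", "team"]))
```

Unlike a binary presence/absence model, the score here depends on how many times each term occurs, including the terms that do not occur at all.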

Distributed Document Sharing with Text Classification over Content-Addressable Network

2004

Tayfun Elmas, Öznur Özkasap

A content-addressable network is a scalable and robust distributed hash table that lets distributed applications store and retrieve information efficiently. We consider design and implementation issues of a document sharing system over a content-addressable overlay network. Improvements and their applicability on a document sharing system are discussed. We describe our system protot...

Text Classification with the Combination of Feature Selection and Machine Learning Algorithm

2011

N. Swarna Jyothi, M. Sailaja

Text classification refers to determining the class of an unknown text according to its content within a given classification system. In this paper, enhanced features are used to find the distribution of a word in a single document or in multiple documents. This distribution can be exploited by a TF-IDF style equation, and different features are combined using ensemble learning techniques. Features are not ...
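
For readers unfamiliar with the term, a TF-IDF style weighting can be sketched as follows. This is the common textbook form (relative term frequency times the log of inverse document frequency), not the specific equation or the enhanced features used in the paper; the function name `tf_idf` and the sample documents are assumptions made for illustration.

```python
from collections import Counter
from math import log


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc / doc length) * log(N / document frequency)
    """
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


sample = [["cat", "sat"], ["cat", "ran"]]
for row in tf_idf(sample):
    print(row)
```

A term that appears in every document, such as "cat" here, receives weight zero, while terms confined to a single document are weighted up.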

Enhancement in Data Mining Technique for Scattered Document Using Clustering

2016

Clustering is a widely studied data mining problem for text documents. The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. In this paper, we provide a detailed survey of the problem of text clustering. We study the key challenges of the clustering problem, as it applies to the...

Questionnaire Free Text Summarisation Using Hierarchical Classification

2012

Matias Garcia-Constantino, Frans Coenen, P.-J. Noble, Alan Radford

This paper presents an investigation into the summarisation of the free text element of questionnaire data using hierarchical text classification. The process makes the assumption that text summarisation can be achieved using a classification approach whereby several class labels can be associated with documents which then constitute the summarisation. A hierarchical classification approach is ...
