Search results for: text document classification

Number of results: 765658

Journal: International Journal of Information, Security and Systems Management

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents’ contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

Journal: Expert Syst. Appl. 2012
M. Ghiassi, M. Olschimke, Brian Moon, P. Arnaudo

Widespread digitization of information in today’s internet age has intensified the need for effective textual document classification algorithms. Most real-life classification problems, including text classification, genetic classification, medical classification, and others, are complex in nature and are characterized by high dimensionality. Current solution strategies include Naïve Bayes (NB)...

2015
Karthik Krishnamurthi, Vijayapal Reddy Panuganti, Vishnu Vardhan Bulusu

This paper evaluates the performance of the Latent Semantic Analysis (LSA) model in capturing word correlations within text when domain information is included in the process. The performance of the model is empirically evaluated by classification of Hindi text. The classification accuracies are compared against plain LSA. An increase of 1.25% classification accuracy is ac...

Journal: CoRR 2017
Piotr Borkowski, Krzysztof Ciesielski, Mieczyslaw A. Klopotek

In this paper we propose a new document classification method, bridging discrepancies (so-called semantic gap) between the training set and the application sets of textual data. We demonstrate its superiority over classical text classification approaches, including traditional classifier ensembles. The method consists in combining a document categorization technique with a single classifier or ...

2006
Zoltán Gyöngyi, Hector Garcia-Molina, Jan Pedersen

Document categorization is one of the foundational problems in (web) information retrieval. Even though web documents are hyperlinked, most proposed classification techniques take little advantage of the link structure and rely primarily on text features, as it is not immediately clear how to make link information intelligible to supervised machine learning algorithms. This paper introduces a l...

2017
Paul Michel, Abhilasha Ravichander, Shruti Rijhwani

We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically-principled isometric-invariant mappings from a set of vectors to a document embedding, which is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document repr...

2012
Göksel BİRİCİK, Banu DİRİ, Ahmet Coşkun SÖNMEZ

feature extraction for text classification. Göksel BİRİCİK, Banu DİRİ, Ahmet Coşkun SÖNMEZ. Department of Computer Engineering, Yıldız Technical University, Esenler, İstanbul, Turkey. E-mails: {goksel,banu,acsonmez}@ce.yildiz.edu.tr. Received: 03.02.2011. Abstract: Feature selection and extraction are frequently used solutions to overcome the curse of dimensionality in text classification problems. W...

2015
Yang Liu Zhiyuan Liu Tat-Seng Chua Maosong Sun

Most word embedding models typically represent each word using a single vector, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics for each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embedding...
