نتایج جستجو برای: text document classification
تعداد نتایج: 765658 فیلتر نتایج به سال:
text classification is an important research field in information retrieval and text mining. the main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. since word detection is a difficult and time consuming task in persian language, bayesian text classifier is an appropriate approach to deal with different...
Widespread digitization of information in today’s internet age has intensified the need for effective textual document classification algorithms. Most real life classification problems, including text classification, genetic classification, medical classification, and others, are complex in nature and are characterized by high dimensionality. Current solution strategies include Naïve Bayes (NB)...
The work presented in this paper is to evaluate the performance of Latent Semantic Analysis (LSA) model in capturing word correlations within text by including domain information in the process. The performance of the model is empirically evaluated by classification of Hindi text. The accuracies of classification are compared against plain LSA. An increase of 1.25% classification accuracy is ac...
In this paper we propose a new document classification method, bridging discrepancies (so-called semantic gap) between the training set and the application sets of textual data. We demonstrate its superiority over classical text classification approaches, including traditional classifier ensembles. The method consists in combining a document categorization technique with a single classifier or ...
Document categorization is one of the foundational problems in (web) information retrieval. Even though web documents are hyperlinked, most proposed classification techniques take little advantage of the link structure and rely primarily on text features, as it is not immediately clear how to make link information intelligible to supervised machine learning algorithms. This paper introduces a l...
We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically-principled isometric-invariant mappings from a set of vectors to a document embedding, which is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document repr...
feature extraction for text classification Göksel BİRİCİK∗, Banu DİRİ, Ahmet Coşkun SÖNMEZ Department of Computer Engineering, Yıldız Technical University, Esenler, İstanbul-TURKEY e-mails: {goksel,banu,acsonmez}@ce.yildiz.edu.tr Received: 03.02.2011 Abstract Feature selection and extraction are frequently used solutions to overcome the curse of dimensionality in text classification problems. W...
Most word embedding models typically represent each word using a single vector, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics for each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embedding...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید