Search results for: text documents classification

Number of results: 694633

2013
Anna Rozeva

The research objective is to establish an approach for supporting the classification of text documents referring to a specified domain. The focus is on the preliminary topic assignment to the documents used for training the model. The method implements domain ontology as background knowledge. The idea consists in extracting the preliminary topics for training the classifier by means of unsuperv...

2012
Mofleh Al-diabat

Text categorization is one of the known problems in classification data mining. It aims to map text documents into one or more predefined classes or categories based on their keyword content. This problem has recently attracted many scholars in the data mining and machine learning communities since the online documents that hold useful information for decision makers are numerous...

2007
Peter Scheir Philip Hofmair Michael Granitzer Stefanie N. Lindstaedt

In this contribution we present a tool for annotating documents, which are used for work-integrated learning, with concepts from an ontology. To allow for annotating directly while creating or editing an ontology, the tool was realized as a plug-in for the ontology editor Protégé. Annotating documents with semantic metadata is a laborious task; most of the time knowledge representations are crea...

2002
Jihong Guan Shuigeng Zhou

With the rapid growth of online text information, efficient text classification has become one of the key techniques for organizing and processing text repositories. In this paper, an efficient text classification approach is proposed based on pruning the training corpus. By using the proposed approach, noisy and superfluous documents in training corpora can be cut off drastically, which leads to...

2006
Seyda Ertekin C. Lee Giles

With many web sites appearing every day, it has become increasingly difficult to keep web directories up-to-date and growing. Interest in using machine learning for automatic text categorization is further stimulated by this intensive growth of the World Wide Web. We believe that Web page classification is significantly different from traditional text classification because of the ...

Journal: CoRR 2017
Pavel Král Ladislav Lenc

This paper introduces “Czech Text Document Corpus v 2.0”, a collection of text documents for automatic document classification in the Czech language. It is composed of 11,955 text documents provided by the Czech News Agency and is freely available for research purposes at http://home.zcu.cz/~pkral/sw/. This corpus was created in order to facilitate a straightforward comparison of the document clas...

2015
S. W. Mohod

In this paper, a novel approach is proposed for extracting eminent features for a classifier. Instead of the traditional feature selection techniques used for text document classification, we introduce a new model based on probability and the overall class frequency of a term. We applied this new technique to extract features from training text documents to generate a training set for machine learning. Using ...

Journal: Procesamiento del Lenguaje Natural 2006
Francesc Alías Xavi Gonzalvo Xavier Sevillano Joan Claudi Socoró José Antonio Montero David García

This paper introduces a text classification system tuned to cope with the requirements of multi-domain text-to-speech synthesis. This method, based on a previous system which represents texts by means of a weighted graph, has been developed to improve the classification efficiency for small texts and to minimize its computational cost. To that effect, the comparison space is built from the inpu...

2011
Yi-Xian Lin Been-Chian Chien

Processing high-dimensional features is key to document analysis and text classification. Traditional techniques for selecting or extracting features rely heavily on the distribution of term features in the set of documents. Finding the significant features generally incurs a high computational cost. In this paper, we propose a new feature reduction method based on the analysis of discriminant coef...

2009
Houda Benbrahim Max Bramer

Automatic categorization of text documents has become an important area of research in the last two decades, with features that make it significantly more difficult than the traditional classification tasks studied in machine learning. A more recent development is the need to classify hypertext documents, most notably web pages. These have features that add further complexity to the categorizat...
