text documents classification

Search results for: text documents classification

Number of results: 694633

Classification of text documents supervised by domain ontologies

2013

Anna Rozeva

The research objective is to establish an approach for supporting the classification of text documents referring to a specified domain. The focus is on the preliminary topic assignment to the documents used for training the model. The method implements domain ontology as background knowledge. The idea consists in extracting the preliminary topics for training the classifier by means of unsuperv...


Arabic Text Categorization Using Classification Rule Mining

2012

Mofleh Al-diabat

Text categorization is one of the known problems in classification data mining. It aims to map text documents into one or more predefined classes or categories based on their keyword content. This problem has recently attracted many scholars in the data mining and machine learning communities since the numbers of online documents that hold useful information for decision makers are numerous...


The Ontologymapper Plug-in: Supporting Semantic Annotation of Text-documents by Classification

2007

Peter Scheir Philip Hofmair Michael Granitzer Stefanie N. Lindstaedt

In this contribution we present a tool for annotating documents, which are used for work-integrated learning, with concepts from an ontology. To allow for annotating directly while creating or editing an ontology, the tool was realized as a plug-in for the ontology editor Protégé. Annotating documents with semantic metadata is a laborious task; most of the time knowledge representations are crea...


Pruning Training Corpus to Speedup Text Classification

2002

Jihong Guan Shuigeng Zhou

With the rapid growth of online text information, efficient text classification has become one of the key techniques for organizing and processing text repositories. In this paper, an efficient text classification approach is proposed based on pruning the training corpus. With the proposed approach, noisy and superfluous documents in training corpora can be cut off drastically, which leads to...


A Comparative Study on Representation of Web Pages in Automatic Text Categorization

2006

Seyda Ertekin C. Lee Giles

With many web sites appearing every day, it has become increasingly difficult to keep the web directories up-to-date and growing. Interest in using machine learning for automatic text categorization is further stimulated by this intensive growth of the World Wide Web. We believe that Web page classification is significantly different from traditional text classification because of the ...


Czech Text Document Corpus v 2.0

Journal: CoRR 2017

Pavel Král Ladislav Lenc

This paper introduces “Czech Text Document Corpus v 2.0”, a collection of text documents for automatic document classification in the Czech language. It is composed of 11,955 text documents provided by the Czech News Agency and is freely available for research purposes at http://home.zcu.cz/~pkral/sw/ . This corpus was created in order to facilitate a straightforward comparison of the document clas...


A Novel Approach in Feature Selection Method for Text Document Classification

2015

S. W. Mohod

In this paper, a novel approach is proposed for extracting eminent features for a classifier. Instead of the traditional feature selection techniques used for text document classification, we introduce a new model based on probability and the overall class frequency of a term. We applied this new technique to extract features from training text documents to generate a training set for machine learning. Using ...


Clasificación de textos adaptada para Conversión de Texto en Habla Multidominio

Journal: Procesamiento del Lenguaje Natural 2006

Francesc Alías Xavi Gonzalvo Xavier Sevillano Joan Claudi Socoró José Antonio Montero David García

This paper introduces a text classification system tuned to cope with the requirements of multi-domain text-to-speech synthesis. This method, based on a previous system which represents texts by means of a weighted graph, has been developed to improve the classification efficiency for small texts and to minimize its computational cost. To that effect, the comparison space is built from the inpu...


Feature Reduction for High-Precision Text Classification

2011

Yi-Xian Lin Been-Chian Chien

Processing high-dimensional features is key to document analysis and text classification. Traditional technologies for selecting or extracting features rely heavily on the distribution of term features in the set of documents. Finding the significant features generally incurs a high computational cost. In this paper, we propose a new feature reduction method based on the analysis of discriminant coef...


Text and Hypertext Categorization

2009

Houda Benbrahim Max Bramer

Automatic categorization of text documents has become an important area of research in the last two decades, with features that make it significantly more difficult than the traditional classification tasks studied in machine learning. A more recent development is the need to classify hypertext documents, most notably web pages. These have features that add further complexity to the categorizat...

