text documents classification

Search results for: text documents classification

Number of results: 694633

Development of a patent document classification and search platform using a back-propagation network

Journal: Expert Syst. Appl. 2006

Amy J. C. Trappey Fu-Chiang Hsu Charles V. Trappey Chia-I Lin

In order to process large numbers of explicit knowledge documents such as patents in an organized manner, automatic document categorization and search are required. In this paper, we develop a document classification and search methodology based on neural network technology that helps companies manage patent documents more effectively. The classification process begins by extracting key phrases...


Kohonen Networks with Graph-based Augmented Metrics

2005

Peter Andras Olusola Idowu

Correct and efficient text classification is a major challenge in today’s world of rapidly increasing amount of accessible electronic text data. Kohonen networks have been applied to document classification with comparable success to other document clustering methods. An important challenge is to devise text similarity metrics that can improve the performance of text classification Kohonen netw...


Arabic supervised learning method using N-gram

Journal: Interact. Techn. Smart Edu. 2008

Majed Sanan Mahmoud Rammal Khaldoun Zreik

Purpose – Recently, classification of Arabic documents is a real problem for juridical centers. In this case, some of the Lebanese official journal documents are classified, and the center has to classify new documents based on these documents. This paper aims to study and explain the useful application of supervised learning method on Arabic texts using N-gram as an indexing method (n = 3). D...


2-Way Text Classification for Harmful Web Documents

2006

Youngsoo Kim Taekyong Nam Dongho Won

The openness of the Web allows any user to access almost any type of information. However, some information, such as adult content, is not appropriate for all users, notably children. Additionally for adults, some contents included in abnormal porn sites can do ordinary people’s mental health harm. In this paper, we propose an efficient 2-way text filter for blocking harmful web documents and a...


Semi-Structured Document Classification

2009

Ludovic Denoyer

INTRODUCTION Document classification developed over the last ten years, using techniques originating from the pattern recognition and machine learning communities. All these methods do operate on flat text representations where word occurrences are considered independents. The recent paper (Sebastiani, 2002) gives a very good survey on textual document classification. With the development of st...


Text Classification from Positive and Unlabeled Documents Based on GA

2006

Tao Peng Fengling He Wanli Zuo

Automatic text classification is one of the most important tools in Information Retrieval. As the traditional methods for text classification cannot find the best feature set, the GA is applied to the feature selection because it can get the global optimal solution. This paper presents a novel text classifier from positive and unlabeled documents based on GA. Firstly, we identify reliable negat...


Concept of Status Matrix in Classification of Text Documents

2009

R. Dinesh B. S. Harish D. S. Guru S. Manjunath

In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is effectively preserved with the help of a novel datastructure called ‘Status Matrix’. Further the corresponding classification technique has been proposed for efficient classification of tex...


Representation Quality in Text Classification: An Introduction and Experiment

1990

David D. Lewis

The way in which text is represented has a strong impact on the performance of text classification (retrieval and categorization) systems. We discuss the operation of text classification systems, introduce a theoretical model of how text representation impacts their performance, and describe how the performance of text classification systems is evaluated. We then present the results of an exper...


Text Classification from Labeled and Unlabeled Documents Using

1998

William W. Cohen

This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from lab...


An Effective General Purpose Approach for Biomedical Document Classification

2006

Aaron M. Cohen

Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are high...

