Semi Supervised Document Classification Model Using Artificial Neural Networks

نویسنده

  • Dr. M. Karthikeyan
چکیده

Automatic document classification is of paramount importance to knowledge management in the information age. Document classification is a kind of text data mining and organization technique that automatically groups related documents into clusters. Most of the common techniques in document classification are based on the statistical analysis of a term, either word or phrase. Statistical analysis of a term frequency captures the importance of the term within the document only. However, two terms can have the same frequency in their documents, but one term contributes more to the meaning of its sentences than the other term. To solve this problem the proposed system concentrates on an interactive text clustering methodology, semi supervised document classification method using neural networks. There are two main phases in the proposed method: Pre-processing phase and Classification phase. In the pre-processing phase, distinct words are identified and their frequency of occurrences in the document corpus is calculated. These discovered distinct words with their frequency of occurrences, form a document vector. In the classification phase, Back propagation algorithm is used for document classification by using the feature vector of distinct words. The proposed method evaluates the system efficiency by implementing and testing the clustering results with Dbscan and Kmeans clustering algorithms. Experiment shows that the proposed document clustering method performs with an average efficiency of 92% for various document categories..

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding

This paper presents a new semi-supervised framework with convolutional neural networks (CNNs) for text categorization. Unlike the previous approaches that rely on word embeddings, our method learns embeddings of small text regions from unlabeled data for integration into a supervised CNN. The proposed scheme for embedding learning is based on the idea of two-view semi-supervised learning, which...

متن کامل

Semi-Supervised Learning with Multi-View Embedding: Theory and Application with Convolutional Neural Networks

This paper presents a theoretical analysis of multi-view embedding – feature embedding that can be learned from unlabeled data through the task of predicting one view from another. We prove its usefulness in supervised learning under certain conditions. The result explains the effectiveness of some existing methods such as word embedding. Based on this theory, we propose a new semi-supervised l...

متن کامل

Adverse Drug Event Detection in Tweets with Semi-Supervised Convolutional Neural Networks

Current Adverse Drug Events (ADE) surveillance systems are often associated with a sizable time lag before such events are published. Online social media such as Twitter could describe adverse drug events in real-time, prior to official reporting. Deep learning has significantly improved text classification performance in recent years and can potentially enhance ADE classification in tweets. Ho...

متن کامل

Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

In this paper, we propose a semi-supervised learning method where we train two neural networks in a multi-task fashion: a target network and a confidence network. The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to weight the gradient updates to the target network using the scores provided by the sec...

متن کامل

Monthly runoff forecasting by means of artificial neural networks (ANNs)

Over the last decade or so, artificial neural networks (ANNs) have become one of the most promising tools formodelling hydrological processes such as rainfall runoff processes. However, the employment of a single model doesnot seem to be an appropriate approach for modelling such a complex, nonlinear, and discontinuous process thatvaries in space and time. For this reason, this study aims at de...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016