Supervised Learning for Automatic Classification of Documents using Self-Organizing Maps

Authors

  • Dina Goren-Bar
  • Tsvi Kuflik
  • Dror Lev
Abstract

Automatic document classification that corresponds to user-predefined classes is a challenging and widely researched area. Self-Organizing Maps (SOM) are unsupervised Artificial Neural Networks (ANN) that are mathematically characterized by transforming high-dimensional data into a two-dimensional representation, enabling automatic clustering of the input while preserving higher-order topology. A closely related algorithm is Learning Vector Quantization (LVQ), which uses supervised learning to maximize correct data classification. This study presents the application of SOM and LVQ to automatic document classification based on a predefined set of clusters. A set of documents manually clustered by a domain expert was used. Experimental results show considerable success of automatic document clustering that matches the manual clustering, with a slight preference for LVQ.
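The two algorithms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train_som`, `train_lvq1`, `classify`), the linear decay schedules, the Gaussian neighborhood, and the use of the basic LVQ1 update rule are all assumptions chosen for brevity. Documents are assumed to be already encoded as fixed-length numeric vectors (for example, normalized term frequencies).

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the weight vector closest to x (Euclidean)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data, rows, cols, epochs=30, lr0=0.5, sigma0=None, seed=0):
    """Unsupervised SOM: fit a rows x cols grid of weight vectors to data."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # assumed linear decay
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        for x in rng.sample(data, len(data)):    # shuffled pass over data
            br, bc = grid[best_matching_unit(weights, x)]
            for i, (r, c) in enumerate(grid):
                # Units near the winner on the 2-D grid move more, which
                # is what preserves topology on the map.
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                weights[i] = [w + lr * h * (xi - w)
                              for w, xi in zip(weights[i], x)]
    return weights


def train_lvq1(data, labels, prototypes, proto_labels,
               epochs=30, lr0=0.3, seed=0):
    """Supervised LVQ1: pull the winning prototype toward a sample of the
    same class and push it away from a sample of a different class."""
    rng = random.Random(seed)
    protos = [list(p) for p in prototypes]
    order = list(range(len(data)))
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)        # assumed linear decay
        rng.shuffle(order)
        for k in order:
            x = data[k]
            j = best_matching_unit(protos, x)
            sign = 1.0 if proto_labels[j] == labels[k] else -1.0
            protos[j] = [w + sign * lr * (xi - w)
                         for w, xi in zip(protos[j], x)]
    return protos


def classify(protos, proto_labels, x):
    """Assign x the label of its nearest prototype."""
    return proto_labels[best_matching_unit(protos, x)]
```

In this sketch the SOM never sees class labels, so comparing it with a manual clustering requires labeling its map units afterwards, whereas LVQ uses the expert's labels directly during training. How the prototypes are initialized is left to the caller here; the abstract does not say how the study did this.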

Similar resources

Semi-Supervised Learning for Web Text Clustering

Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, for many text classification tasks, providing labeled training documents is expensive, while unlabeled documents are readily available in large quantities. Learning from both labeled and unlabeled documents in a semi-supervised framework is a promising approach to reduc...

Automating Personal Categorization Using Artificial Neural Networks

Organizations as well as personal users invest a great deal of time in assigning documents they read or write to categories. Automatic document classification that matches user subjective classification is widely used, but much challenging research still remains to be done. The self-organizing map (SOM) is an artificial neural network (ANN) that is mathematically characterized by transforming hi...

Self-Organising Maps in Document Classification: A Comparison with Six Machine Learning Methods

This paper focuses on the use of self-organising maps, also known as Kohonen maps, for the classification of text documents. The aim is to effectively and automatically classify documents into separate classes based on their topics. Classification with the self-organising map was tested on three data sets, and the results were then compared to those of six well-known baseline methods: k-mea...

SOM-based Clustering of Textual Documents Using WordNet

The classification of textual documents has been the subject of many studies. Technologies like the web and numerical libraries have facilitated the exponential growth of available documentation. The classification of textual documents is very important since it allows users to quickly skim and better understand the contents of large corpora. Most classification approaches us...

Air Quality Modelling by Kohonen’s Self-organizing Feature Maps and LVQ Neural Networks

The paper presents a design of parameters for air quality modelling and the classification of districts into classes according to their pollution. Further, it presents a model design, data pre-processing, the designs of various structures of Kohonen’s Self-organizing Feature Maps (unsupervised methods), the clustering by K-means algorithm and the classification by Learning Vector Quantization n...

Publication date: 2000