Parallel Web Text Clustering with a Modular Self-Organizing Map System

Authors

  • Lean YU
  • Shouyang WANG
  • Kin Keung LAI

Abstract

In this study, a multistage modular self-organizing map (SOM) model is proposed for parallel web text clustering. In the first stage, the large textual datasets are divided into several small, disjoint subsets (i.e., task decomposition). In the second stage, each subset is fed into a separate unitary SOM model to produce a word clustering map (i.e., modularization learning). In this stage, the SOM models are run in parallel to gain greater computational efficiency and scalability. In the third stage, based on the outputs of the SOM modules from the previous stage, another SOM model integrates the different word clustering results into a text category map (i.e., module fusion). In the proposed model, the word clustering maps are embedded into the text category map, yielding a hierarchical modular SOM model. For illustration and verification purposes, a practical text clustering experiment is performed.
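The three stages described above can be illustrated with a small Python sketch. This is a hypothetical illustration, not the authors' implementation: the abstract does not specify the text representation, the SOM hyperparameters, or exactly how module outputs are fused, and the sketch does not reproduce the paper's specific step from word clustering maps to a text category map. Here the inputs are assumed to be already-vectorised text features (e.g. term-frequency vectors), task decomposition is assumed to be a round-robin split, and module fusion is assumed to mean training a top-level SOM on the union of all module prototype vectors. The names `train_som`, `modular_som`, and `locate`, along with all grid sizes and learning-rate settings, are illustrative choices.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))  # squared Euclidean distance
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data, rows, cols, epochs=10, lr0=0.5, seed=0):
    """Online SOM: Gaussian neighbourhood, linearly decaying rate and radius."""
    rng = random.Random(seed)
    # Initialise every node with a copy of a randomly chosen training vector.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bi, bj = best_matching_unit(weights, x)
            # Pull every node toward x, weighted by its grid distance to the BMU.
            for i in range(rows):
                for j in range(cols):
                    h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * sigma ** 2))
                    w = weights[i][j]
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def modular_som(data, n_modules=3, module_shape=(3, 3), fusion_shape=(4, 4)):
    """Hypothetical three-stage pipeline; requires len(data) >= n_modules."""
    # Stage 1 (task decomposition, assumed round-robin): disjoint subsets.
    subsets = [data[m::n_modules] for m in range(n_modules)]
    # Stage 2 (modularization learning): one SOM per subset, trained concurrently.
    with ThreadPoolExecutor(max_workers=n_modules) as pool:
        modules = list(pool.map(
            lambda job: train_som(job[1], *module_shape, seed=job[0]),
            enumerate(subsets)))
    # Stage 3 (module fusion, assumed): a top-level SOM over all module prototypes.
    prototypes = [w for som in modules for row in som for w in row]
    fusion = train_som(prototypes, *fusion_shape, seed=n_modules)
    return subsets, modules, fusion


def locate(vec, module, fusion):
    """Map a vector to a fusion-map cell via its module's best-matching prototype."""
    i, j = best_matching_unit(module, vec)
    return best_matching_unit(fusion, module[i][j])


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy stand-ins for text feature vectors: two tight groups in 4-D space.
    docs = [[rng.gauss(c, 0.05) for _ in range(4)] for c in (0.0, 1.0) for _ in range(30)]
    subsets, modules, fusion = modular_s(docs) if False else modular_som(docs)
    for m, subset in enumerate(subsets):
        print(m, sorted({locate(d, modules[m], fusion) for d in subset}))
```

`ThreadPoolExecutor` shows the concurrent structure of stage two. In CPython, however, threads do not speed up this CPU-bound pure-Python loop. Real multi-core parallelism would require `ProcessPoolExecutor` (with the lambda replaced by a module-level function so it can be pickled) or separate machines.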

Similar resources

Self-Organizing-Map-Based Metamodeling for Massive Text Data Exploration

In this study, we describe the use of the self-organizing map (SOM) as a metamodeling technique to design a parallel text data exploration system. First, the large textual collections are divided into various small data subsets. Based on these subsets, different unitary SOM models, i.e., base models, are then trained to produce word clustering maps. In this phase, different SOM models are imp...

NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self-Organizing Map

Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...

Uncertainty Modeling of a Group Tourism Recommendation System Based on Pearson Similarity Criteria, Bayesian Network and Self-Organizing Map Clustering Algorithm

Group tourism is one of the most important tasks in tourist recommender systems. Despite potential contradictions among the group's tastes, these systems seek to provide joint suggestions to all members of the group and to propose recommendations that satisfy the group as a whole rather than individual users. Another issue that has received less attention i...

Gait Based Vertical Ground Reaction Force Analysis for Parkinson’s Disease Diagnosis Using Self Organizing Map

The aim of this work is to use a Self Organizing Map (SOM) to cluster the kinetic characteristics of locomotion in normal subjects and in Parkinson’s disease. The classification and analysis of the kinematic characteristics of human locomotion have been greatly advanced by the use of artificial neural networks in recent years. The proposed methodology aims at overcoming the constraints of traditional analysi...

Self-organizing maps for latent semantic analysis of free-form text in support of public policy analysis

The huge amount of free-form unstructured text in the blogosphere, its increasing rate of production, and its shrinking window of relevance, present serious challenges to the public policy analyst who seeks to take public opinion into account. Most of the tools which address this problem use XML tagging and other Web 3.0 approaches, which do not address the actual content of blog posts and the ...

Publication date: 2007