Self-Organizing Map in Data-Analysis - Notes on Overfitting and Overinterpretation
نویسندگان
چکیده
The Self-Organizing Map, SOM, is a widely used tool in exploratory data analysis. Visual inspection of the SOM can be used to list potential dependencies between variables, that are then validated with more principled statistical methods. In this paper we discuss the use of the SOM in searching for dependencies in the data. We point out that simple use of the SOM may lead to excessive number of false hypotheses. We formulate the exact probability density model for which the SOM training gives the Maximum Likelihood estimate and show how the model parameters (neighborhood and kernel width) can be chosen to avoid overfitting. The conditional distributions from the true density model offer a consistent way to quantify and test the dependencies between variables.
منابع مشابه
Self-Organizing Maps in data analysis - notes on overfitting and overinterpretation
The Self-Organizing Map, SOM, is a widely used tool in exploratory data analysis. Visual inspection of the SOM can be used to list potential dependencies between variables, that are then validated with more principled statistical methods. In this paper we discuss the use of the SOM in searc hing for dependencies in the data. We poin t out that simple use of the SOM may lead to excessive number ...
متن کاملClassification of Streaming Fuzzy DEA Using Self-Organizing Map
The classification of fuzzy data is considered as the most challenging areas of data analysis and the complexity of the procedures has been obstacle to the development of new methods for fuzzy data analysis. However, there are significant advances in modeling systems in which fuzzy data are available in the field of mathematical programming. In order to exploit the results of the researches on ...
متن کاملLandforms identification using neural network-self organizing map and SRTM data
During an 11 days mission in February 2000 the Shuttle Radar Topography Mission (SRTM) collected data over 80% of the Earth's land surface, for all areas between 60 degrees N and 56 degrees S latitude. Since SRTM data became available, many studies utilized them for application in topography and morphometric landscape analysis. Exploiting SRTM data for recognition and extraction of topographic ...
متن کاملNGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self- Organizing Map
Identifying clusters is an important aspect of data analysis. This paper proposes a noveldata clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizingmap (NGTSOM ) and neural gas (NG) are used in combination with Competitive Hebbian Learning(CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clusteringdata. Different ...
متن کاملOvertraining and model selection with the self-organizing map
We discuss the importance of finding the correct model complexity, or regularization level, in the self-organizing map (SOM) algorithm. The complexity of the SOM is determined mainly by the width of the final neighborhood, which is usually chosen ad hoc or set to zero for optimal quantization error. However, if the SOM is used for visualizing the joint probability distribution of the data, then...
متن کامل