How to improve robustness in Kohonen maps and visualization in Factorial Analysis: application to text mining

Authors

  • Nicolas Bourgeois
  • Marie Cottrell
  • Benjamin Déruelle
  • Stéphane Lamassé
  • Patrick Letrémy
Abstract

This article is an extended version of a paper presented at the WSOM’2012 conference [1]. We present a combination of factorial projections, the SOM algorithm and graph techniques applied to a text mining problem. The corpus contains 8 medieval manuscripts that were used to teach arithmetic techniques to merchants. Among Data Analysis techniques, those used in Lexicometry (such as Factorial Analysis) highlight the discrepancies between manuscripts, because they focus on the deviation from independence between words and manuscripts. We also want, however, to discover and characterize the vocabulary common to the whole corpus. Using the properties of stochastic Kohonen maps, which define neighborhoods between inputs in a non-deterministic way, we highlight the words that seem to play a special role in the vocabulary. We call these words fickle and use them to improve both the robustness of the Kohonen map and the significance of the FCA visualization. Finally, we use graph algorithms to exploit this fickleness for the classification of words.
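The idea of fickleness can be illustrated with a minimal sketch. The abstract does not give the exact criterion, so everything below is an assumption made for illustration: a small online SOM written in plain NumPy, a toy word-by-manuscript matrix, a neighborhood defined as "same unit or one of the surrounding units", and arbitrary thresholds `low` and `high` in place of whatever statistical criterion the authors use. The function names `train_som`, `neighbor_frequency` and `fickle_scores` are hypothetical. The sketch trains the map several times with different random seeds, records how often each pair of words ends up as neighbors, and counts, for each word, the pairs whose neighbor status changes from run to run.

```python
import numpy as np


def train_som(data, grid=(4, 4), n_iter=500, seed=0):
    """Train a small online (stochastic) SOM; return each row's winning unit and the grid coordinates."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid position of every unit, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise code vectors on randomly drawn observations (one source of randomness).
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac) + 0.01                    # decreasing learning rate
        sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighborhood width
        x = data[rng.integers(0, len(data))]              # random presentation order
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    dists = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(dists, axis=1), coords


def neighbor_frequency(data, n_runs=20, grid=(4, 4), radius=1.0):
    """Fraction of runs in which each pair of rows lands on the same or adjacent units."""
    n = len(data)
    counts = np.zeros((n, n))
    for seed in range(n_runs):
        bmus, coords = train_som(data, grid=grid, seed=seed)
        pos = coords[bmus]
        # Chebyshev distance on the grid: 0 = same unit, 1 = one of the 8 surrounding units.
        d = np.abs(pos[:, None, :] - pos[None, :, :]).max(axis=2)
        counts += d <= radius
    return counts / n_runs


def fickle_scores(freq, low=0.2, high=0.8):
    """Count, per row, the pairs that are neither almost always nor almost never neighbors."""
    fickle_pairs = (freq > low) & (freq < high)  # assumed thresholds, not the authors' test
    np.fill_diagonal(fickle_pairs, False)
    return fickle_pairs.sum(axis=1)
```

With a word-by-manuscript matrix `X` (rows are words), `fickle_scores(neighbor_frequency(X))` returns one count per word; under the assumptions above, the words with the highest counts would be the candidates for the fickle label.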


Similar resources

How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: Application to text mining

This article is an extended version of a paper presented at the WSOM’2012 conference [1]. We present a combination of factorial projections, the SOM algorithm and graph techniques applied to a text mining problem. The corpus contains 8 medieval manuscripts that were used to teach arithmetic techniques to merchants. Among the techniques for Data Analysis, those used for Lexicometry (such as Factori...


Lexical Recount between Factor Analysis and Kohonen Map: Mathematical Vocabulary of Arithmetic in the Vernacular Language of the Late Middle Ages

In this paper we present a combination of factorial projections and the SOM algorithm applied to a text mining problem. The corpus consists of 8 medieval texts that were used to teach arithmetic techniques to merchants. Classical Factorial Component Analysis (FCA) gives nice representations of the selected words in association with the texts, but the quality of the representation is poor in the...


Special Issue on Advances in Self-organizing Maps

It has been 17 years since the first Workshop on Self-organizing Maps (WSOM) was held in Helsinki, Finland in 1997, under the leadership of Teuvo Kohonen. The workshop brings together researchers and practitioners in the field of self-organizing systems and related areas. The 9th WSOM was held for the first time in Latin America, at the Universidad de Chile, Santiago, Chile, in December 2012. Thi...


Design and Test of the Real-time Text mining dashboard for Twitter

One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes and in a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing this huge amount of data has become one of the concerns of data science scholar...


Data Mining Using Self-Organizing Kohonen Maps: A Technique for Effective Data Clustering & Visualization

Exploratory data mining using artificial neural networks offers an alternative dimension to data mining, in particular techniques geared towards data clustering and classification. In this paper, we argue the case for using neural networks as a viable data mining tool that can provide statistical insights and models from large data-sets. We demonstrate how Self-Organizing Kohonen Maps, an unsup...




Publication date: 2015