Improving self-organization of document collections by semantic mapping
Authors
Abstract
In text management tasks, dimensionality reduction is necessary both for computational tractability and for the interpretability of the results generated by machine learning algorithms. This paper describes a feature extraction method called semantic mapping. Semantic mapping, sparse random mapping, and PCA are applied to the self-organization of document collections using the self-organizing map (SOM). The behavior of the methods when projecting binary and tf-idf document vector representations is compared. The classification error of the resulting SOMs on text categorization of the K1 collection is used to compare the performance of the methods. Semantic mapping generated better document representations than sparse random mapping. © 2006 Elsevier B.V. All rights reserved.
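The abstract names sparse random mapping as a baseline without describing it. As a rough illustration only, the sketch below assumes the common formulation in which each original term is assigned to a small fixed number of randomly chosen output dimensions, and a document's reduced vector is the sum of its term weights over those dimensions. The function names, the `ones_per_term` parameter, and the toy vector are hypothetical and are not taken from the paper.

```python
import random


def sparse_random_mapping(n_terms, n_dims, ones_per_term=2, seed=0):
    """Assign each original term to `ones_per_term` distinct output dimensions.

    This is equivalent to a sparse 0/1 projection matrix with a fixed
    number of ones per term (an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [rng.sample(range(n_dims), ones_per_term) for _ in range(n_terms)]


def project(doc_vector, mapping, n_dims):
    """Project a binary or tf-idf document vector into the reduced space."""
    reduced = [0.0] * n_dims
    for term_index, weight in enumerate(doc_vector):
        if weight == 0.0:
            continue  # skip absent terms; document vectors are mostly zeros
        for dim in mapping[term_index]:
            reduced[dim] += weight
    return reduced


# Toy usage: a binary vector over 6 terms reduced to 3 dimensions.
mapping = sparse_random_mapping(n_terms=6, n_dims=3, ones_per_term=2, seed=1)
doc = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
reduced = project(doc, mapping, n_dims=3)
```

Because the mapping is drawn at random and ignores term co-occurrence, it serves here only as a picture of the baseline; how semantic mapping builds its projection is not stated in the abstract.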
Similar resources
Self-organization of Very Large Document Collections: State of the Art
The Self-Organizing Map (SOM) forms a nonlinear projection from a high-dimensional data manifold onto a low-dimensional grid. A representative model of some subset of data is associated with each grid point. The SOM algorithm computes an optimal collection of models that approximates the data in the sense of some error criterion and also takes into account the similarity relations of the models...
Mapping discursive dynamics of the financial crisis: a structural perspective of concept roles in semantic networks
Background/purpose: Convenient access to vast and untapped collections of documents generated by organizations is a highly valuable resource for research. These documents (e.g., press releases) are a window into organizational strategies, communication patterns, and organizational behavior. However, the analysis of large document corpora requires appropriate automated methods for text mining an...
Improving CNG Fuel Use in Transportation Sector: Strategic Option Development Approach
Energy is one of the main pillars of the economic cycle. The environmental pollution caused by the consumption of gasoline and diesel fuel, the problems and limitations of supplying and supplying fuel within the country, moving towards self-sufficiency in the supply of gasoline and diesel fuel, reducing government spending against income and also allocating macro subsidies to keep their prices...
Exploration of Text Collections with Hierarchical Feature
Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight in the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classificat...
Journal title:
- Neurocomputing
Volume 70, Issue
Pages -
Publication date 2006