Self-organizing systems for knowledge discovery in large databases

Authors

  • William H. Hsu
  • Loretta S. Anvil
  • William M. Pottenger
  • David Tcheng
  • Michael Welge

Abstract

We present a framework in which self-organizing systems are used to perform change of representation on knowledge discovery problems, in order to learn from very large databases. Clustering with self-organizing maps produces multiple intermediate training targets, which are used to define a new supervised learning and mixture estimation problem. The input attributes are partitioned using a state space search over subdivisions of attributes, and a self-organizing map is applied to the input data restricted to each subset of attributes. This approach yields the variance-reducing benefits of techniques such as stacked generalization, but uses self-organizing systems to discover factorial (modular) structure among abstract learning targets. This research demonstrates the feasibility of applying such structure in very large databases to build a mixture of ANNs for data mining and KDD. Application areas include multi-attribute risk assessment using insurance policy data, text document categorization, and anomaly detection.
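
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small 1-D self-organizing map written directly in NumPy, a fixed attribute partition supplied by hand (the state space search over partitions is not shown), and hypothetical names such as train_som, assign_clusters, and intermediate_targets. Each attribute subset gets its own map, and the index of the best-matching unit serves as one intermediate training target per example.

```python
import numpy as np


def train_som(data, n_units=4, n_epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny 1-D self-organizing map and return its prototype vectors.

    All hyperparameters here are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Initialize prototypes from randomly chosen data points.
    idx = rng.choice(len(data), size=n_units, replace=False)
    weights = data[idx].astype(float)
    grid = np.arange(n_units)  # unit positions on the 1-D map
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighborhood
            # Best-matching unit: prototype closest to the sample.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighborhood around the BMU on the map grid.
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_clusters(data, weights):
    """Return the index of the nearest prototype for every row of data."""
    dists = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def intermediate_targets(data, partition, n_units=4):
    """Cluster each attribute subset separately.

    partition is a list of lists of column indices (assumed to be given).
    Returns an array of shape (n_samples, n_subsets) of cluster labels.
    """
    targets = []
    for k, cols in enumerate(partition):
        sub = data[:, cols]
        weights = train_som(sub, n_units=n_units, seed=k)
        targets.append(assign_clusters(sub, weights))
    return np.stack(targets, axis=1)
```

In this sketch, a call such as intermediate_targets(data, [[0, 1], [2, 3]]) returns one column of cluster labels per attribute subset. In the framework the abstract describes, such labels would become the targets of supervised sub-learners that are later combined in a mixture; that stage is omitted here.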

Similar resources

A Novel Technique for Pattern Extraction in Mixed Data

Knowledge discovery in databases, or data mining, is an important issue in the development of data and knowledge base systems. The Self Organizing Map (SOM) is a vector quantization method which places the prototype vectors on a regular low-dimensional grid in an ordered fashion. Clustering data and extracting patterns from the clusters are very important tasks in data mining. An attribute-oriented...

A Multistrategy Learning Approach to Flexible Knowledge Organization and Discovery

Properly organizing knowledge so that it can be managed often requires the acquisition of patterns and relations from large, distributed, heterogeneous databases. The employment of an intelligent and automated KDD (Knowledge Discovery in Databases) pro...

A Review of Data Mining Applications in the Health System

Introduction: The extensive amounts of data stored in medical databases require specialized tools for data access, data analysis, knowledge discovery, and effective use of the data. Data mining is one of the most important methods. The article outlines the data mining techniques used and illustrates their applicability to medical diagnostic and prognostic problems. ...

A Modified Self-organizing Map Neural Network to Recognize Multi-font Printed Persian Numerals (RESEARCH NOTE)

This paper proposes a new method for recognizing printed digits, regardless of font and size, using neural networks. Unlike our proposed method, existing neural-network-based techniques can recognize only the fonts they were trained on. These methods need a large database containing digits in various fonts. New fonts are often introduced to the public, which may not be truly recognized by the Opti...

Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)

Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, the decision-making process is closely tied to factors that determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated companies, where managers ne...

Publication date: 1999