Selecting Effective Features and Relations for Efficient Multi-Relational Classification
Abstract
Feature selection is an essential data processing step to remove irrelevant and redundant attributes for shorter learning time, better accuracy, and better comprehensibility. A number of algorithms have been proposed in both data mining and machine learning areas. These algorithms are usually used in a single table environment, where data are stored in one relational table or one flat file. They are not suitable for a multi-relational environment, where data are stored in multiple tables joined to one another by semantic relationships. To address this problem, in this article, we propose a novel approach called FARS to conduct both Feature And Relation Selection for efficient multi-relational classification. Through this approach, we not only extend the traditional feature selection method to select relevant features from multi-relations, but also develop a new method to reconstruct the multi-relational database schema and eliminate irrelevant tables to improve classification performance further. The results of the experiments conducted on both real and synthetic databases show that FARS can effectively choose a small set of relevant features, thereby enhancing classification efficiency and prediction accuracy significantly.
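The abstract does not spell out how FARS scores features or tables, so the following is only a minimal sketch of the general idea of selecting features and relations together, not the authors' algorithm. It assumes a plain information-gain filter, a single shared foreign key, and class labels propagated from the target table to each related table. All function names, table names, and the threshold value are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def score_relation(rows, key, label_by_key):
    """Score every non-key column of one table against the class label.

    Labels live in the target table only, so each row borrows the label
    of the target tuple it joins to through `key`.
    """
    labelled = [(row, label_by_key[row[key]]) for row in rows if row[key] in label_by_key]
    labels = [label for _, label in labelled]
    features = [column for column in rows[0] if column != key]
    return {f: information_gain([row[f] for row, _ in labelled], labels) for f in features}


def select_features_and_relations(relations, key, label_by_key, threshold):
    """Keep features scoring at least `threshold`; drop tables with none left."""
    kept = {}
    for name, rows in relations.items():
        scores = score_relation(rows, key, label_by_key)
        relevant = sorted(f for f, s in scores.items() if s >= threshold)
        if relevant:  # a table with no relevant feature is eliminated
            kept[name] = relevant
    return kept


# Hypothetical toy database: class labels keyed by customer id.
label_by_key = {1: "yes", 2: "yes", 3: "no", 4: "no"}
relations = {
    "account": [
        {"cust_id": 1, "type": "gold", "region": "north"},
        {"cust_id": 2, "type": "gold", "region": "south"},
        {"cust_id": 3, "type": "basic", "region": "north"},
        {"cust_id": 4, "type": "basic", "region": "south"},
    ],
    "survey": [
        {"cust_id": 1, "color": "red"},
        {"cust_id": 2, "color": "blue"},
        {"cust_id": 3, "color": "red"},
        {"cust_id": 4, "color": "blue"},
    ],
}

selected = select_features_and_relations(relations, "cust_id", label_by_key, threshold=0.5)
print(selected)
```

In this toy data, `type` separates the two classes perfectly while `region` and `color` carry no information about the label, so the `survey` table is dropped entirely. A real multi-relational setting would also have to handle one-to-many joins and redundancy between features, which this sketch ignores.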
Related resources
Determining the effective features in classification of heart sounds using trained intelligent network and genetic algorithm
Heart diseases are among the most important causes of mortality in the world, especially in industrial countries. Analyzing heart sounds and the features extracted from them is among the non-invasive methods for the diagnosis and prognosis of heart diseases. In this study, the time-scale, cepstral, frequency, temporal and turbulence features are extracted from the heart sounds and saved, and then they ...
Efficient Multi-relational Classification by Tuple ID Propagation
Most of today’s structured data is stored in relational databases. In contrast, most classification approaches only apply on single “flat” data relations. And it is usually difficult to convert multiple relations into a single flat relation without losing essential information. Inductive Logic Programming approaches have proven effective with high accuracy in multi-relational classification. Un...
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in the amount of information, tools and methods for searching, filtering, and managing resources are in high demand. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...
Predicting cardiac arrhythmia on ECG signal using an ensemble of optimal multicore support vector machines
The use of artificial intelligence in the process of diagnosing heart disease has been considered by researchers for many years. In this paper, an efficient method for selecting appropriate features extracted from electrocardiogram (ECG) signals, based on a genetic algorithm for use in an ensemble multi-kernel support vector machine classifiers, each of which is based on an optimized genetic al...
Efficient Heterogeneous Multi-relational Classification Using Multi-criteria Ranking Approach Based on Characteristics of Multiple Relations
Traditional data mining algorithms do not work efficiently for most real-world applications, where the data is stored in relational format. Even well-known traditional classification techniques such as J48 and Naïve Bayes often suffer from poor scalability and unsatisfactory predictive performance when working with relational data. Moreover the performance of existing relationa...
Journal title:
- Computational Intelligence
Volume 26
Publication date: 2010