selection data mining

Feature Selection via Correlation Coefficient Clustering

Journal: JSW 2010

Hui-Huang Hsu Cheng-Wei Hsieh

Feature selection is a fundamental problem in machine learning and data mining. Choosing the most problem-related features from a set of collected features is essential. In this paper, a novel method that uses correlation coefficient clustering to remove similar/redundant features is proposed. The collected features are grouped into clusters by measuring their correlation coefficient values...
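
The grouping idea in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the greedy assignment rule, the 0.9 threshold, the function names (`pearson`, `cluster_features`, `select_representatives`), and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / (norm_x * norm_y)


def cluster_features(columns, threshold=0.9):
    """Greedily group features by |r| against each cluster's first member.

    `columns` maps a feature name to its list of values. The threshold is an
    illustrative choice, not a value taken from the paper.
    """
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            representative = cluster[0]
            if abs(pearson(values, columns[representative])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no similar cluster found: start a new one
    return clusters


def select_representatives(columns, threshold=0.9):
    """Keep one feature per cluster (here simply the first one seen)."""
    return [cluster[0] for cluster in cluster_features(columns, threshold)]


# Toy data: two nearly identical height columns and one unrelated column.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "noise": [3.0, 1.0, 4.0, 1.0],
}
selected = select_representatives(columns)
```

Here the two height columns fall into one cluster, so only one of them is kept alongside the unrelated column. A real implementation would need a more careful rule for choosing each cluster's representative.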

Towards Online Concept Drift Detection with Feature Selection for Data Stream Classification

2016

Mahmood Hammoodi Frederic T. Stahl Mark Tennant

Data Streams are unbounded, sequential data instances that are generated very rapidly. The storage, querying and mining of such rapid flows of data is computationally very challenging. Data Stream Mining (DSM) is concerned with the mining of such data streams in real-time using techniques that require only one pass through the data. DSM techniques need to be adaptive to reflect changes of the p...

Consistency Based Feature Selection

2000

Manoranjan Dash Huan Liu Hiroshi Motoda

Feature selection is an effective technique in dealing with dimensionality reduction for the classification task, a main component of data mining. It searches for an optimal subset of features. The search strategies under consideration are one of the three: complete, heuristic, and probabilistic. Existing algorithms adopt various measures to evaluate the goodness of feature subsets. This work focuses on one ...
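
The abstract is cut off before it names the measure, so the following is only an assumption-laden sketch of one commonly used consistency criterion, the inconsistency rate: instances that agree on the selected features but disagree on the class count as inconsistent, except for those in the majority class of each pattern. The function name `inconsistency_rate` and the toy data are hypothetical, and the paper's exact definition may differ.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that conflict under the chosen feature subset.

    For every distinct pattern of values on `subset`, the instances outside
    the pattern's majority class are counted as inconsistent.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        groups[pattern][label] += 1
    inconsistent = sum(
        sum(counts.values()) - max(counts.values()) for counts in groups.values()
    )
    return inconsistent / len(rows)


# Toy data: feature 0 determines the class, feature 1 carries no information.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "a", "b", "b"]

rate_feature_0 = inconsistency_rate(rows, labels, [0])
rate_feature_1 = inconsistency_rate(rows, labels, [1])
```

A subset search (complete, heuristic, or probabilistic) would then prefer the smallest subset whose rate stays at or below that of the full feature set.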

Discovering Task Neighbourhoods Through Landmark Learning Performances

2000

Hilan Bensusan Christophe G. Giraud-Carrier

Arguably, model selection is one of the major obstacles, and a key once solved, to the widespread use of machine learning/data mining technology in business. Landmarking is a novel and promising metalearning approach to model selection. It uses accuracy estimates from simple and efficient learners to describe tasks and subsequently construct meta-classifiers that predict which one of a set of more...

Scalable Feature Mining for Sequential Data

Journal: IEEE Intelligent Systems 2000

Neal Lesh Mohammed J. Zaki Mitsunori Ogihara

Classification algorithms are difficult to apply to sequential examples, such as text or DNA sequences, because there is a vast number of potentially useful features for describing each example. Past work on feature selection has focused on searching the space of all subsets of the available features which is intractable for large feature sets. We adapt data mining techniques to act as a prepro...

Practice-based Evidence in Medicine: Where Information Retrieval Meets Data Mining

2014

Karin M. Verspoor

A new approach in medical practice is emerging thanks to the increasing availability of large-scale clinical data in electronic form. In practice-based evidence [5, 6], the clinical record is mined to identify patterns of health characteristics, such as diseases that co-occur, side-effects of treatments, or more subtle combinations of patient attributes that might explain a particular health ou...

REEF: Resolving Length Bias in Frequent Sequence Mining

2013

Ariella Richardson Gal A. Kaminka Sarit Kraus

Classic support-based approaches efficiently address frequent sequence mining. However, support-based mining has been shown to suffer from a bias towards short sequences. In this paper, we propose a method to resolve this bias when mining the most frequent sequences. In order to resolve the length bias we define norm-frequency, based on the statistical z-score of support, and use it to replace s...
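
The idea of scoring by a z-score of support can be illustrated with a short sketch. The abstract is truncated, so the details below are assumptions rather than the REEF definition: supports are standardized separately for each sequence length using the population standard deviation, and the function name `norm_frequency` and the toy counts are hypothetical.

```python
import math
from collections import defaultdict


def norm_frequency(support):
    """Z-score of each sequence's support among sequences of the same length.

    `support` maps a sequence (a tuple of items) to its support count.
    Grouping by length is an assumption made for this illustration.
    """
    by_length = defaultdict(list)
    for sequence, count in support.items():
        by_length[len(sequence)].append(count)

    stats = {}
    for length, counts in by_length.items():
        mean = sum(counts) / len(counts)
        variance = sum((c - mean) ** 2 for c in counts) / len(counts)
        stats[length] = (mean, math.sqrt(variance))

    scores = {}
    for sequence, count in support.items():
        mean, std = stats[len(sequence)]
        # assumption: if all supports of a length are equal, the score is 0
        scores[sequence] = 0.0 if std == 0 else (count - mean) / std
    return scores


# Toy supports: short sequences have much higher raw support than long ones.
support = {
    ("a",): 100, ("b",): 80, ("c",): 90,
    ("a", "b"): 30, ("b", "c"): 10, ("a", "c"): 11,
}
scores = norm_frequency(support)
top_by_support = max(support, key=support.get)
top_by_score = max(scores, key=scores.get)
```

Ranking by raw support favors the single-item sequence `("a",)`, while ranking by the standardized score favors `("a", "b")`, which stands out among sequences of its own length.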

Combining multiple classifiers for wrapper feature selection

Journal: IJDMMM 2008

Kyriacos Chrysostomou Sherry Y. Chen Xiaohui Liu

Wrapper feature selection methods are widely used to select relevant features. However, wrappers only use a single classifier. The downside to this approach is that each classifier will have its own biases and will therefore select very different features. In order to overcome the biases of individual classifiers, this study introduces a new data mining method called wrapper-based decision tree...

Survey on Feature Selection

Journal: CoRR 2015

Tarek Amr Abdallah Beatriz de la Iglesia

Feature selection plays an important role in the data mining process. It is needed to deal with the excessive number of features, which can become a computational burden on the learning algorithms. It is also necessary, even when computational resources are not scarce, since it improves the accuracy of the machine learning tasks, as we will see in the upcoming sections. In this review, we discu...

Protein Sequence Classification Using Feature Selection

2014

K. Radha

The sheer volume of data today and its expected growth over the next years are some of the key challenges in data mining and knowledge discovery applications. Besides the huge number of data samples that are collected and processed, the high dimensional nature of data arising in many applications causes the need to develop effective and efficient techniques that are able to deal with this massi...
