Cross‐validated permutation feature importance considering correlation between features


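Abstract

In molecular design, material design, process design, and process control, it is important not only to construct a model with high predictive ability between explanatory features x and objective features y using a dataset, but also to interpret the constructed model. An index of feature importance in x is permutation feature importance (PFI), which can be combined with any regressors and classifiers. However, PFI becomes unstable when the number of samples is low, because it is necessary to divide the dataset into training and validation data when calculating it. Additionally, when there are strongly correlated features in x, the PFI of these features is estimated to be low. Hence, a cross‐validated PFI (CVPFI) method is proposed. CVPFI can be calculated stably, even with a small number of samples, because model construction and feature evaluation are repeated based on cross‐validation. Furthermore, by considering the absolute correlation coefficients between features, the feature importance can be evaluated appropriately even when there are strongly correlated features in x. Case studies using numerical simulation data and actual compound data showed that feature importance can be evaluated appropriately using CVPFI compared to PFI. This is possible when the number of samples is low, when linear and nonlinear relationships are mixed between x and y, when there are strong correlations between features in x, and when quantised and biased features exist in x. Python codes for CVPFI are available at https://github.com/hkaneko1985/dcekit .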
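The idea summarised in the abstract can be illustrated with a minimal pure-Python sketch. It is not the dcekit implementation. The k-nearest-neighbour regressor, the function names (`knn_predict`, `abs_corr`, `cv_permutation_importance`), the toy data, and in particular the rule that a second feature is permuted together with the target feature with probability equal to their absolute correlation coefficient are all assumptions made for illustration; the paper's exact procedure may differ.

```python
import random
from statistics import mean


def knn_predict(X_train, y_train, x, k=3):
    """Predict y for one sample as the mean y of its k nearest training samples."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(X_train, y_train)
    )
    return mean(target for _, target in dists[:k])


def mse(X_train, y_train, X_val, y_val):
    """Mean squared error of the k-NN model on the validation samples."""
    return mean((knn_predict(X_train, y_train, x) - t) ** 2
                for x, t in zip(X_val, y_val))


def abs_corr(X, j, k):
    """Absolute Pearson correlation coefficient between columns j and k."""
    a = [row[j] for row in X]
    b = [row[k] for row in X]
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return abs(cov) / (var_a * var_b) ** 0.5


def cv_permutation_importance(X, y, n_folds=5, seed=0):
    """Average, over CV folds, the rise in validation MSE after permuting each feature."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    corr = [[abs_corr(X, j, k) for k in range(m)] for j in range(m)]
    order = list(range(n))
    rng.shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    importances = [0.0] * m
    for val_idx in folds:
        val_set = set(val_idx)
        X_tr = [X[i] for i in range(n) if i not in val_set]
        y_tr = [y[i] for i in range(n) if i not in val_set]
        X_val = [X[i] for i in val_idx]
        y_val = [y[i] for i in val_idx]
        base = mse(X_tr, y_tr, X_val, y_val)
        for j in range(m):
            donors = X_val[:]
            rng.shuffle(donors)  # each validation row receives a random donor row
            X_perm = []
            for row, donor in zip(X_val, donors):
                new_row = list(row)
                for k in range(m):
                    # Assumption: feature j is always replaced; feature k is
                    # replaced along with it with probability |r_jk|.
                    p = 1.0 if k == j else corr[j][k]
                    if rng.random() < p:
                        new_row[k] = donor[k]
                X_perm.append(new_row)
            importances[j] += (mse(X_tr, y_tr, X_perm, y_val) - base) / n_folds
    return importances


# Toy data (hypothetical): y depends on feature 0 only; feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(60)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.1) for row in X]
imp = cv_permutation_importance(X, y)
print(imp)
```

Because every sample serves as validation data in exactly one fold, the importance is averaged over the whole dataset rather than a single hold-out split, which is the property the abstract credits for stability with few samples.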


Similar resources

Permutation importance: a corrected feature importance measure

MOTIVATION In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in the past years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support ...


Permutation Tests for Correlation between Two Distance Matrices

Biologists frequently summarize multivariate data from n populations by computing some measure of distance between populations. The problem then arises of comparing two such pairwise distance matrices based on different characters. A permutation test for correlation between distance matrices is proposed. This test, based on Kendall's tau statistic, is compared to Pearson's product-moment and Sp...


Feature selection algorithm based on correlation between muti metric network traffic flow features

Traffic identification is a hot issue in recent years, in order to overcome shortcomings of port-based and Deep Packet Inspection (DPI), machine learning algorithm has gained wide attention, but nowadays research focus on traffic identification based on full packets dataset, which would be a great challenge to identify online traffic flow. It is a way to overcome this shortcoming by considering...


Analysis of correlation between audio and visual speech features for clean audio feature prediction in noise

The aim of this work is to examine the correlation between audio and visual speech features. The motivation is to find visual features that can provide clean audio feature estimates which can be used for speech enhancement when the original audio signal is corrupted by noise. Two audio features (MFCCs and formants) and three visual features (active appearance model, 2-D DCT and cross-DCT) are c...


Dependability Models for Iterative Software Considering Correlation between Successive Inputs

We consider the dependability of programs of an iterative nature. The dependability of software structures is usually analysed using models that are strongly limited in their realism by the assumptions made to obtain mathematically tractable models and by the lack of experimental data. The assumption of independence between the outcomes of successive executions, which is often false, may lead t...



Journal

Journal title: Analytical Science Advances

Year: 2022

ISSN: 2628-5452

DOI: https://doi.org/10.1002/ansa.202200018