Machine Learning Based Missing Value Imputation Method for Clinical Dataset
Authors
Abstract
Missing value imputation is one of the biggest tasks of data pre-processing in data mining. Most medical datasets are incomplete, and simply removing the incomplete cases from the original datasets can create more problems than it solves. A suitable imputation method can help produce good-quality datasets for better analysis of clinical trials. In this paper we explore the use of machine learning techniques as missing value imputation methods for incomplete cardiovascular data. Mean/mode imputation, fuzzy unordered rule induction algorithm imputation, decision tree imputation and other machine learning algorithms are used to impute the missing values, and the final datasets are classified using decision tree, fuzzy unordered rule induction, KNN and K-means clustering. The experiments show that final classifier performance improves when the fuzzy unordered rule induction algorithm is used to predict missing attribute values for K-means clustering, and that in most cases the machine learning techniques performed better than mean imputation.
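The contrast the abstract draws between mean/mode imputation and model-based imputation can be illustrated with a minimal Python sketch. This is not the paper's implementation: the record fields (`age`, `sbp`, `smoker`) and both function names are hypothetical, and a simple 1-nearest-neighbour rule stands in for the fuzzy rule induction and decision tree imputers, which the abstract does not specify in detail.

```python
from statistics import mean, mode


def impute_mean_mode(rows, numeric_cols, categorical_cols):
    """Baseline: replace None with the column mean (numeric) or mode (categorical)."""
    fill = {}
    for col in numeric_cols:
        fill[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    # Build new dicts so the input rows are left unchanged.
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


def impute_nearest_neighbour(rows, target, features):
    """Model-based stand-in: copy `target` from the closest complete row.

    Distance is squared Euclidean over `features`, which are assumed to be
    numeric and fully observed.
    """
    complete = [r for r in rows if r[target] is not None]
    result = []
    for r in rows:
        if r[target] is None:
            nearest = min(
                complete,
                key=lambda c: sum((c[f] - r[f]) ** 2 for f in features),
            )
            r = {**r, target: nearest[target]}
        result.append(r)
    return result


# Hypothetical toy records; the values are made up for illustration.
records = [
    {"age": 61, "sbp": 150, "smoker": "yes"},
    {"age": None, "sbp": 120, "smoker": "no"},
    {"age": 55, "sbp": 148, "smoker": None},
    {"age": 70, "sbp": 118, "smoker": "no"},
]

baseline = impute_mean_mode(records, ["age"], ["smoker"])
model_based = impute_nearest_neighbour(records, "smoker", ["sbp"])
```

On this toy data the two strategies disagree about the third record: the mode fills in the majority value, while the nearest-neighbour rule copies the value from the record with the most similar `sbp`. That difference is the kind of effect the paper evaluates through downstream classifier performance.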
Similar resources
An Ensemble approach on Missing Value Handling in Hepatitis Disease Dataset
The major work in data pre-processing is handling missing value imputation in hepatitis disease diagnosis, which is one of the primary stages in data mining. Many health datasets are typically imperfect. Just removing the cases from the original datasets can create more problems than solutions. An appropriate technique for missing value imputation can help to generate high-quality datasets fo...
Performance Evaluation of Mutation / Non-Mutation Based Classification With Missing Data
A common problem encountered by many data mining techniques is missing data. Missing data is defined as an attribute or feature in a dataset which has no associated data value. Correct treatment of these data is crucial, as they have a negative impact on the interpretation and result of data mining processes. Missing value handling techniques can be grouped into four categories, namely, c...
An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data
Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. ...
Iterative Non-Parametric Method for Manipulating Missing Values of Heterogeneous Datasets by Clustering Fatigue and Corrosion Fatigue Behavior of Nickel Alloys in Saline Solutions
Machine learning and data mining rely heavily on a large amount of data to build learning models and make predictions. The quality of that data is therefore of utmost importance. Many industrial and research databases are plagued by the problem of missing values. A variety of methods have been developed with great success on dealing with missing values in da...
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Much research has been done on the use of diffe...