Seminal Quality Prediction Using Clustering-Based Decision Forests

نویسندگان

  • Hong Wang
  • Qingsong Xu
  • Lifeng Zhou
چکیده

Prediction of seminal quality with statistical learning tools is an emerging methodology in decision support systems in biomedical engineering and is very useful in early diagnosis of seminal patients and selection of semen donors candidates. However, as is common in medical diagnosis, seminal quality prediction faces the class imbalance problem. In this paper, we propose a novel supervised ensemble learning approach, namely Clustering-Based Decision Forests, to tackle unbalanced class learning problem in seminal quality prediction. Experiment results on real fertility diagnosis dataset have shown that Clustering-Based Decision Forests outperforms decision tree, Support Vector Machines, random forests, multilayer perceptron neural networks and logistic regression by a noticeable margin. Clustering-Based Decision Forests can also be used to evaluate variables’ importance and the top five important factors that may affect semen concentration obtained in this study are age, serious trauma, sitting time, the season when the semen sample is produced, and high fevers in the last year. The findings could be helpful in explaining seminal concentration problems in infertile males or pre-screening semen donor candidates.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improvement of Adaboost Algorithm by using Random Forests as Weak Learner and using this algorithm as statistics machine learning for traffic flow prediction Research proposal for a Ph.D. Thesis

The main goal of this doctoral research is to plan and to build statistic machine learning system for “traffic flow manage control” of urban area for a prediction of traffic flow problem. I also present a new algorithm for improving the accuracy of boosting algorithms for learning binary concepts, which will be used as statistics machine learning. This new algorithm is based on ideas presented ...

متن کامل

An Efficient Predictive Model for Probability of Genetic Diseases Transmission Using a Combined Model

In this article, a new combined approach of a decision tree and clustering is presented to predict the transmission of genetic diseases. In this article, the performance of these algorithms is compared for more accurate prediction of disease transmission under the same condition and based on a series of measures like the positive predictive value, negative predictive value, accuracy, sensitivit...

متن کامل

Using Combined Descriptive and Predictive Methods of Data Mining for Coronary Artery Disease Prediction: a Case Study Approach

Heart disease is one of the major causes of morbidity in the world. Currently, large proportions of healthcare data are not processed properly, thus, failing to be effectively used for decision making purposes. The risk of heart disease may be predicted via investigation of heart disease risk factors coupled with data mining knowledge. This paper presents a model developed using combined descri...

متن کامل

Generating Optimal Timetabling for Lecturers using Hybrid Fuzzy and Clustering Algorithms

UCTTP is a NP-hard problem, which must be performed for each semester frequently. The major technique in the presented approach would be analyzing data to resolve uncertainties of lecturers’ preferences and constraints within a department in order to obtain a ranking for each lecturer based on their requirements within a department where it is attempted to increase their satisfaction and develo...

متن کامل

Image Categorization Using Scene-Context Scale Based on Random Forests

Scene-context plays an important role in scene analysis and object recognition. Among various sources of scene-context, we focus on scene-context scale, which means the effective scale of local context to classify an image pixel in a scene. This paper presents random forests based image categorization using the scene-context scale. The proposed method uses random forests, which are ensembles of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Algorithms

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2014