Iterative bicluster-based least square framework for estimation of missing values in microarray gene expression data
Abstract
DNA microarray experiments inevitably generate gene expression data with missing values. Imputing these missing values is therefore an important and necessary pre-processing step. Existing imputation methods exploit gene correlation across all experimental conditions to estimate the missing values. However, related genes coexpress only in subsets of the experimental conditions. In this paper, we propose to use biclusters, which contain similar genes under a subset of conditions, to characterize gene similarity and then estimate the missing values. To further improve the accuracy of missing value estimation, an iterative framework is developed with a stopping criterion based on minimizing uncertainty. Extensive experiments have been conducted on artificial datasets, real microarray datasets, and one non-microarray dataset. The proposed bicluster-based approach is able to reduce errors in missing value estimation. © 2011 Elsevier Ltd. All rights reserved.
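The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `impute_iterative_lls` and the parameters `k`, `keep_frac`, `max_iter`, and `tol` are hypothetical, the condition-selection step is only a crude stand-in for a real biclustering procedure, and the loop stops when the estimates stop changing rather than using the paper's uncertainty-based criterion. It assumes a genes-by-conditions NumPy matrix with `NaN` marking missing entries, more than `k` genes, and at least one observed value per gene.

```python
import numpy as np

def impute_iterative_lls(X, k=3, keep_frac=0.8, max_iter=10, tol=1e-6):
    """Fill NaNs in a genes-by-conditions matrix (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: each gene's mean over its observed conditions.
    row_means = np.nanmean(X, axis=1)
    filled = np.where(missing, row_means[:, None], X)

    for _ in range(max_iter):
        previous = filled.copy()
        for g in np.flatnonzero(missing.any(axis=1)):
            obs = np.flatnonzero(~missing[g])  # conditions observed for gene g
            mis = np.flatnonzero(missing[g])   # conditions to estimate
            # Pick the k genes closest to g on g's observed conditions.
            dist = np.linalg.norm(previous[:, obs] - previous[g, obs], axis=1)
            dist[g] = np.inf                   # never pick the gene itself
            nbrs = np.argsort(dist)[:k]
            # Stand-in for a bicluster: keep the observed conditions on which
            # the neighbours deviate least from gene g (assumption, see above).
            dev = np.abs(previous[nbrs][:, obs] - previous[g, obs]).mean(axis=0)
            n_keep = max(k, int(np.ceil(keep_frac * obs.size)))
            cols = obs[np.argsort(dev)[:n_keep]]
            # Least squares: express gene g as a weighted sum of its
            # neighbours on the kept conditions.
            A = previous[nbrs][:, cols].T      # shape (n_kept_conditions, k)
            b = previous[g, cols]
            w, *_ = np.linalg.lstsq(A, b, rcond=None)
            # Apply the same weights to the neighbours' values at the
            # missing conditions.
            filled[g, mis] = previous[nbrs][:, mis].T @ w
        # Swapped-in stopping rule: stop once the estimates stabilise.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

Observed entries are never overwritten; only positions that were `NaN` in the input are re-estimated on each pass, using the previous pass's estimates for any neighbours that have missing values of their own.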
Related articles
Use of biclustering for missing value imputation in gene expression data
DNA microarray data always contains missing values. As subsequent analysis such as biclustering can only be applied on complete data, these missing values have to be imputed before any biclusters can be detected. Existing imputation methods exploit coherence among expression values in the microarray data. In view that biclustering attempts to find correlated expression values within the data, w...
Missing value estimation for DNA microarray gene expression data: local least squares imputation
Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed since many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local simi...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data provide a wealth of information covering thousands of variables (genes). Analyzing this information can reveal the genetic causes of disease and the differences between tumors. In this study we reduce the high-dimensional data with a statistical method to select high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...
Performance Evaluation of L1-norm-based Microarray Missing Value Imputation
l1-norm minimization was utilized in the imputation of microarray missing values, which is an important procedure in bioinformatics experiments. Two l1 approaches, based on the framework of local least squares (LLS) and iterative bicluster-based least squares (bicluster-iLLS) respectively, were employed. Imputed datasets of the l1 approaches were compared with those of traditional l2 methods. Th...
CF-GeNe: Fuzzy Framework for Robust Gene Regulatory Network Inference
Most Gene Regulatory Network (GRN) studies ignore the impact of the noisy nature of gene expression data despite its significant influence upon inferred results. This paper presents an innovative Collateral-Fuzzy Gene Regulatory Network Reconstruction (CF-GeNe) framework for Gene Regulatory Network (GRN) inference. The approach uses the Collateral Missing Value Estimation (CMVE) algorithm as it...
Journal title:
- Pattern Recognition
Volume 45
Publication date 2012