Inference from Low Precision Transcriptome Data Representation
Authors
S. Tuna · M. Niranjan (corresponding author), University of Southampton, Southampton, UK
Abstract
Microarray measurements are widely used to infer gene functions, identify regulatory mechanisms and predict phenotypes. These measurements are usually made and recorded to high numerical precision (e.g. 0.24601). However, aspects of the underlying biology ought to make us sceptical about the reproducibility of these measurements, and thus about the numerical precision reported: mRNA molecules are highly unstable, they are available only in very small copy numbers, and the measurements are usually made over a heterogeneous population of cells. In this paper, we show that over a range of different procedures (classification, cluster analysis, detection of periodically expressed genes and the analysis of developmental time course data), the quality of inference from microarray data does not significantly degrade when the numerical precision is lowered by quantization. A surprising finding, with respect to classification problems, is that much of the discrimination is retained with numerical precision as low as binary (i.e. whether the gene is expressed or not). Building on this, we show preliminary results indicating that similarity metrics suitable for binary spaces, namely the Tanimoto metric used in chemoinformatics, can be successfully deployed to improve the classification accuracies of binarized transcriptome data.
Electronic supplementary material: The online version of this article (doi: 10.1007/s11265-009-0363-2) contains supplementary material, which is available to authorized users.
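The two ideas named in the abstract, quantizing expression values down to one bit and comparing samples with the Tanimoto metric, can be illustrated with a minimal sketch. The per-gene median threshold and the function names `binarize` and `tanimoto` are assumptions made for illustration; the abstract does not say how the authors choose the "expressed or not" cut-off, and this is not their implementation.

```python
import numpy as np


def binarize(expr, thresholds=None):
    """Quantize an expression matrix (samples x genes) to 0/1.

    If no thresholds are given, each gene's median over the samples is
    used. That default is an illustrative assumption, not the paper's rule.
    """
    expr = np.asarray(expr, dtype=float)
    if thresholds is None:
        thresholds = np.median(expr, axis=0)
    return (expr > thresholds).astype(np.uint8)


def tanimoto(a, b):
    """Tanimoto similarity of two binary vectors: |a AND b| / |a OR b|."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.count_nonzero(a | b)
    if union == 0:
        # Both vectors are all zeros; treating them as identical is a
        # convention chosen here, not something stated in the abstract.
        return 1.0
    return np.count_nonzero(a & b) / union


if __name__ == "__main__":
    # Toy data: 3 samples x 4 genes, values invented for illustration.
    expr = [[0.2, 1.5, 3.1, 0.9],
            [0.1, 1.7, 0.4, 2.2],
            [2.4, 0.3, 2.9, 2.0]]
    bits = binarize(expr)
    print(bits)
    print(tanimoto(bits[0], bits[1]))
```

A similarity of this kind could then be plugged into a nearest-neighbour or kernel classifier operating on the binarized profiles, which is the general setting the abstract describes.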
Similar resources
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...
Evaluation of the Efficiency of the Adaptive Neuro Fuzzy Inference System (ANFIS) in the Modeling of the Ionosphere Total Electron Content Time Series Case Study: Tehran Permanent GPS Station
Global positioning system (GPS) measurements provide accurate and continuous 3-dimensional position, velocity and time data anywhere on or above the surface of the earth, anytime, and in all weather conditions. However, the predominant ranging error source for GPS signals is an ionospheric error. The ionosphere is the region of the atmosphere from about 60 km to more than 1500 km above the eart...
Breast Cancer Risk Assessment Using adaptive neuro-fuzzy inference system (ANFIS) and Subtractive Clustering Algorithm
Introduction: The adaptive neuro-fuzzy inference system (ANFIS) is a soft computing model that combines the precision of neural networks with the advantages of fuzzy decision-making, which can greatly facilitate diagnostic modeling. In this study we used this model in breast cancer detection. Methodology: A set of 1,508 records on the risk factors of cancerous and non-cancerous participants was used. First,...
Elucidation of the sequential transcriptional activity in Escherichia coli using time-series RNA-seq data
Functional genomics and gene regulation inference have readily expanded our knowledge and understanding of gene interactions with regard to expression regulation. With the advancement of transcriptome sequencing in time-series comes the ability to study the sequential changes of the transcriptome. Here, we present a new method to augment regulation networks accumulated in literature with transc...
Journal title:
- Signal Processing Systems
Volume 58 Issue
Pages -
Publication date 2010