Re-analysis of publicly available datasets

Authors

Lucia Peixoto

Davide Risso

Shane G. Poplawski

Mathieu E. Wimmer

Terence P. Speed

Marcelo A. Wood

Ted Abel

Abstract

We retrieved the pre-processed data of several publicly available studies from GEO (see main text for details). In this section, we plot the principal component analysis (PCA) of each dataset under its original normalization. We then apply RUVs, starting from the data as normalized by the authors, or from upper-quartile (UQ) scaled data if the authors provided only raw counts. We use all genes as negative controls and choose the value of k that yields the best-looking relative log expression (RLE) plot. For each dataset, we retain only the genes expressed in at least three replicate samples. This analysis is intended to show that published normalized datasets often contain residual unwanted variation, that RUVs can remove this variation when it is present, and that RUVs does not compromise the data when scaling normalization is already working well. A more careful analysis of each dataset, e.g. one that selects a problem-specific set of negative control genes, could lead to better results.
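The pre-processing steps described above (gene filtering, UQ scaling, and the RLE values used to judge a normalization) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the expression threshold `min_count` is an assumption (the text does not say what count qualifies a gene as "expressed"), and the RUVs step itself is not reproduced here.

```python
import numpy as np


def filter_expressed(counts, min_count=5, min_samples=3):
    """Keep genes (rows) with more than `min_count` reads in at least
    `min_samples` samples (columns). `min_count` is an assumed threshold."""
    keep = (counts > min_count).sum(axis=1) >= min_samples
    return counts[keep]


def upper_quartile_scale(counts):
    """Rescale each sample so that its upper quartile (75th percentile)
    equals the mean upper quartile across samples."""
    uq = np.percentile(counts, 75, axis=0)
    return counts * (uq.mean() / uq)


def relative_log_expression(counts, pseudocount=1.0):
    """RLE: log expression of each gene minus that gene's median
    log expression across samples."""
    log_counts = np.log(counts + pseudocount)
    gene_medians = np.median(log_counts, axis=1, keepdims=True)
    return log_counts - gene_medians


# Toy genes-by-samples count matrix (made-up numbers, for illustration only).
counts = np.array([
    [10.0, 20.0, 30.0, 40.0],
    [0.0, 0.0, 0.0, 1.0],
    [100.0, 200.0, 300.0, 400.0],
    [6.0, 6.0, 6.0, 0.0],
    [50.0, 100.0, 150.0, 200.0],
])

filtered = filter_expressed(counts)
scaled = upper_quartile_scale(filtered)
rle = relative_log_expression(scaled)
```

In a well-normalized dataset, the per-sample distributions of RLE values (the columns of `rle`) should be centered near zero and have similar spread, which is the property a boxplot of these columns is used to inspect.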

Publication date: 2015