Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model

Authors

  • Jianping Hua
  • Michael L. Bittner
  • Edward R. Dougherty

Abstract

Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most existing comparison studies focus on whether GSA methods produce accurate P-values; practitioners, however, are often more concerned with whether the methods rank gene sets correctly. Ranking performance is closely tied to two critical goals of GSA: revealing biological themes and ensuring reproducibility, especially in small-sample studies. We conducted a comprehensive simulation study of the ranking performance of seven representative GSA methods. We overcome the limited availability of real data sets by creating hybrid data models from existing large data sets. To build a data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set, and multiple smaller data sets can be generated by resampling the original data set. This approach lets us generate a large batch of data sets on which to check the ranking performance of GSA methods. Our simulation study reveals that, for the proposed data model, Q2-type GSA methods generally perform better than other GSA methods, and the global test gives the most robust results. The properties of a data set play a critical role in performance: on data sets with highly connected genes, the performance of all GSA methods degrades significantly.
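The abstract describes the hybrid-data pipeline only at a high level (master gene, artificial phenotype labels, resampling, ranking comparison). The following is a minimal Python sketch under stated assumptions, not the authors' procedure. It assumes that phenotype labels come from a median split of the master gene's expression, that the ground-truth gene-set ranking orders sets by the mean absolute Pearson correlation of their members with the master gene, and that a method's ranking is scored against that ground truth with Spearman rank correlation. The function names `hybrid_labels`, `true_set_ranking`, `resample`, and `spearman` are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence

# Hypothetical layout: gene name -> expression values over the same samples.
Expr = Dict[str, List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def hybrid_labels(master: Sequence[float]) -> List[int]:
    """ASSUMPTION: phenotype 1 if the master gene is above its median, else 0."""
    med = statistics.median(master)
    return [1 if v > med else 0 for v in master]


def true_set_ranking(expr: Expr, master_gene: str,
                     gene_sets: Dict[str, List[str]]) -> List[str]:
    """ASSUMPTION: ground truth ranks sets by mean |corr| with the master gene."""
    m = expr[master_gene]
    score = {name: statistics.fmean([abs(pearson(expr[g], m)) for g in genes])
             for name, genes in gene_sets.items()}
    return sorted(score, key=score.get, reverse=True)


def resample(n_samples: int, size: int, rng: random.Random) -> List[int]:
    """Draw a smaller data set as `size` distinct sample indices."""
    return sorted(rng.sample(range(n_samples), size))


def spearman(rank_a: List[str], rank_b: List[str]) -> float:
    """Spearman correlation of two tie-free rankings of the same n >= 2 items."""
    n = len(rank_a)
    pos_b = {item: i for i, item in enumerate(rank_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Toy data: "M" plays the role of the master gene.
    expr = {"M": [1, 2, 3, 4], "A": [2, 4, 6, 8],
            "B": [4, 3, 2, 1], "C": [1, 3, 2, 4]}
    sets = {"S1": ["A", "B"], "S2": ["C"], "S3": ["A", "C"]}
    print(hybrid_labels(expr["M"]))           # [0, 0, 1, 1]
    print(true_set_ranking(expr, "M", sets))  # ['S1', 'S3', 'S2']
    print(resample(4, 2, random.Random(0)))
```

In this sketch, a GSA method under test would be run on each resampled subset together with its labels, and the gene-set ranking it produces would be compared with `true_set_ranking` via `spearman`. The median split and the correlation-based ground truth are placeholders for whatever rules the study actually uses.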

Similar resources

Presenting a Hybrid Approach based on Two-stage Data Envelopment Analysis to Evaluating Organization Productivity

Measuring the performance of a production system has been an important task in management for purposes of control, planning, etc. Lord Kelvin said: “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.” Hence, manag...

A Model for Project Selecting with Limited Resources in Data Envelopment Analysis with Input and Output Fuzzy

In evaluating performance, selecting a subset from a set of solutions with limited resources is essential. If there is more than one input and output, data envelopment analysis optimization models are evaluated, and performance is measured as the weighted output divided by the weighted input. In this research, two optimization models with limited resources are presented from data envelopment ...

Evaluating the Effect of Various Parameters of Protective Spur Dike on Scour Depth Reduction using Group Method of Data Handling (GMDH) and Gene Expression Programming (GEP)

Spur dikes are one of the common methods for protecting rivers against erosion. Scouring around a spur dike is an important factor that can impair its structural performance. Using a protective spur dike is an appropriate technique for reducing the amount of scour. In this research, GMDH and GEP models are used to evaluate and estimate the effect of various parameters of the protective spur dike o...

Interpreting Gene Expression Data by Searching for Enriched Gene Sets

This paper presents a novel method integrating gene-gene interaction information and Gene Ontology for the construction of new gene sets that are potentially enriched. Enrichment of a gene set is determined by Gene Set Enrichment Analysis, a microarray data analysis method that uses ranks of the genes, according to their differential expression values, to identify significant biologi...

Evaluation and ranking of suppliers with fuzzy DEA and PROMETHEE approach

Supplier selection is a multi-criteria problem. This study proposes a hybrid model for supporting supplier selection and ranking. This research presents a two-stage model designed to fully rank the suppliers, where each supplier has multiple inputs and outputs. First, the supplier evaluation problem is formulated by Data Envelopment Analysis (DEA), since the regarded decision deals with uncertai...

Volume 13

Publication date: 2014