A large-scale benchmark of gene prioritization methods
Authors
Abstract
To make full use of results from high-throughput experimental studies such as GWAS for the identification and diagnostics of new disease-associated genes, gene prioritization tools must be properly analyzed and benchmarked. Prospective benchmarks are underpowered to show statistically significant differences in performance between gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually provide only internal validations. The Gene Ontology (GO) groups genes around annotation terms. This intrinsic property of GO can be used to construct robust benchmarks that are objective with respect to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools using the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion (NetRank and two implementations of Random Walk with Restart) and MaxLink, which uses network neighborhood. Our benchmark suite provides a systematic and objective way to compare the many available and future gene prioritization tools, enabling researchers to select the best tool for the task at hand and helping to guide the development of more accurate methods.
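As a rough illustration of the kind of retrospective, cross-validated benchmark described in the abstract, the sketch below runs a simple Random Walk with Restart on a toy network and scores it by leave-one-out cross-validation over a single gene set. It is not the authors' benchmark suite or any of the compared tools: the network, the gene names (`g1` to `g8`), the gene set standing in for a GO term, the restart probability, and both function names are assumptions made up for this example.

```python
# Toy sketch only: invented network and gene set, not FunCoup or GO data.

def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until the change is below tol.

    adjacency maps each node to a list of neighbours (undirected, no isolated nodes).
    seeds is a non-empty set of nodes where the walker restarts.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Each node passes its probability mass equally to its neighbours.
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def leave_one_out_ranks(adjacency, gene_set, restart=0.3):
    """Hide one gene at a time, seed with the rest, and record the hidden gene's rank."""
    ranks = {}
    for held_out in sorted(gene_set):
        seeds = gene_set - {held_out}
        scores = random_walk_with_restart(adjacency, seeds, restart)
        candidates = [n for n in adjacency if n not in seeds]
        # Highest score first; ties are broken by gene name for determinism.
        ordered = sorted(candidates, key=lambda n: (-scores[n], n))
        ranks[held_out] = ordered.index(held_out) + 1
    return ranks


def build_adjacency(edges):
    """Turn an undirected edge list into a neighbour-list dictionary."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


# Two densely connected groups joined by the single edge g4-g5 (hypothetical).
TOY_NETWORK = build_adjacency([
    ("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g2", "g4"),
    ("g3", "g4"), ("g4", "g5"), ("g5", "g6"), ("g5", "g7"), ("g5", "g8"),
    ("g6", "g7"), ("g6", "g8"), ("g7", "g8"),
])
# Hypothetical stand-in for the genes annotated with one GO term.
TOY_GENE_SET = {"g1", "g2", "g3"}

if __name__ == "__main__":
    print(leave_one_out_ranks(TOY_NETWORK, TOY_GENE_SET))
```

A real benchmark along these lines would repeat this over many annotation terms and summarize the held-out ranks with performance measures such as those the abstract mentions. Leave-one-out is used here only because the toy gene set is tiny.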
Similar resources
Employing Nonlinear Response History Analysis of ASCE 7-16 on a Benchmark Tall Building
ASCE 7-16 has provided a comprehensive platform for the performance-based design of tall buildings. The core of the procedure is based on nonlinear response history analysis of the structure subjected to recorded or simulated ground motions. This study investigates consistency in the ASCE 7-16 requirements regarding the use of different types of ground motions. For this purpose performance of a...
Experimental validation of predicted cancer genes using FRET.
Huge amounts of data are generated in genome wide experiments, designed to investigate diseases with complex genetic causes. Follow up of all potential leads produced by such experiments is currently cost prohibitive and time consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently a gene pr...
Integrating Computational Biology and Forward Genetics in Drosophila
Genetic screens are powerful methods for the discovery of gene-phenotype associations. However, a systems biology approach to genetics must leverage the massive amount of "omics" data to enhance the power and speed of functional gene discovery in vivo. Thus far, few computational methods for gene function prediction have been rigorously tested for their performance on a genome-wide scale in viv...
Candidate gene prioritization with Endeavour
Genomic studies and high-throughput experiments often produce large lists of candidate genes among which only a small fraction are truly relevant to the disease, phenotype or biological process of interest. Gene prioritization tackles this problem by ranking candidate genes by profiling candidates across multiple genomic data sources and integrating this heterogeneous information into a global ...
Expression of Recombinant Factor IX Using the Transient Gene Expression Technique
Background: Pilot and large-scale production of recombinant proteins requires the presence of stable clones capable of producing large quantities of recombinant proteins. Not only the process of selecting stable clones is time consuming, but also the continuous culturing of clones in large-scale production may cause loss of incoming plasmid and recombinant genes. Thus, considering the advanceme...