BioQ: tracing experimental origins in public genomic databases using a novel data provenance model
Abstract
Public genomic databases, which are often used to guide genetic studies of human disease, are now being applied to genomic medicine through in silico integrative genomics. These databases, however, often lack tools for systematically determining the experimental origins of the data.
Results: We introduce a new data provenance model that we have implemented in a public web application, BioQ, for assessing the reliability of the data by systematically tracing its experimental origins to the original subjects and biologics. BioQ allows investigators both to visualize data provenance and to explore individual elements of experimental process flow using precise tools for detailed data exploration and documentation. It includes a number of human genetic variation databases, such as those of the HapMap and 1000 Genomes projects.
Availability and implementation: BioQ is freely available to the public at http://bioq.saclab.net.
Similar resources
Data Provenance: Some Basic Issues
The ease with which one can copy and transform data on the Web has made it increasingly difficult to determine the origins of a piece of data. We use the term data provenance to refer to the process of tracing and recording the origins of data and its movement between databases. Provenance is now an acute issue in scientific databases where it is central to the validation of data. In this pape...
SourceTrac: Tracing Data Sources within Spreadsheets
Analyzing data from multiple sources is a common task in scientific research. In particular, spreadsheet data is often aggregated from a variety of sources to identify patterns and synthesize reports. Yet, techniques are lacking for automatically capturing the provenance of such data within spreadsheet environments like Excel. We present a novel approach for fine-grained tracing of tabular data...
Research Problems in Data Provenance
Tracing the provenance (also known as lineage) of data is a ubiquitous problem, frequently encountered in databases that are the result of many transformation steps. Scientific databases and data warehouses are some examples of such databases. However, contributions from the database research community towards this problem have been somewhat limited. In this paper, we mot...
Flexible Provenance Tracing
The description of the origins of a piece of data and the transformations by which it arrived in a database is termed the data provenance. The importance of data provenance has already been widely recognized in the database community. The two major approaches to representing provenance information use annotations and inversion. While annotation is metadata pre-computed to include the derivation his...
Recording Provenance on Probabilistic Databases
Tracking data provenance (or lineage) has become increasingly important in many large-scale applications. Until now, a few methods have been proposed to record data provenance. However, most of them focus on deterministic databases, except Trio-style lineage, which targets probabilistic databases. Processing provenance on probabilistic databases is even more challenging because of the exponentia...
Journal title:
- Bioinformatics
Volume 28, Issue 8
Pages -
Publication date 2012