Pattern analysis approach reveals restriction enzyme cutting abnormalities and other cDNA library construction artifacts using raw EST data
Abstract
BACKGROUND Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery, and gene expression studies. However, some GenBank dbEST sequences have proven to be "unclean". Identifying cDNA termini and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcription ends, but also furthers our understanding of the potential sources of data abnormalities and errors in the wet-lab procedures for cDNA library construction.

RESULTS After analyzing a total of 309,976 raw Pinus taeda ESTs, we uncovered many distinct variations of cDNA termini, some of which are good indicators of wet-lab artifacts, and characterized each raw EST by its cDNA terminus structure pattern. In contrast to the expected patterns, many ESTs displayed complex and/or abnormal patterns that represent potential wet-lab errors. These include, among others: failure of one or both restriction enzymes to cut the plasmid vector; failure of the restriction enzymes to cut the vector at the correct positions; insertion of two cDNA inserts into a single vector; insertion of multiple and/or concatenated adapters/linkers; and the presence of 3'-end terminal structures in designated 5'-end sequences, or vice versa. Close examination of these artifacts showed that many problematic ESTs deposited into public databases by conventional bioinformatics pipelines or tools could be cleaned or filtered by our methodology. Using this pattern analysis approach, we developed a software tool for Abnormality Filtering and Sequence Trimming for ESTs (AFST, http://code.google.com/p/afst/). To compare AFST with the other pipelines that submitted ESTs to dbEST, we reprocessed 230,783 Pinus taeda and 38,709 Arachis hypogaea GenBank ESTs. We found that 7.4% of the Pinus taeda and 29.2% of the Arachis hypogaea GenBank ESTs were "unclean" or abnormal, and all of them could be cleaned or filtered by AFST.
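As a rough illustration of what characterizing a raw read by its terminus structure can involve, the sketch below flags a 5'-end read by counting cloning-junction restriction sites and looking for an oligo(dT)-derived run. It is not AFST's algorithm. The library design (an EcoRI site at the 5' junction and an XhoI site at the 3' junction), the 12-nt poly(T) threshold, the flag names, and the function name `flag_5prime_read` are all hypothetical assumptions chosen for the example.

```python
import re
from typing import List

# ASSUMPTION (hypothetical library design, not taken from the paper):
# a directional cDNA library cloned with EcoRI at the 5' junction and XhoI
# at the 3' junction, with reads sequenced from the 5' end.
SITE_5P = "GAATTC"               # EcoRI recognition site
SITE_3P = "CTCGAG"               # XhoI recognition site
POLY_T = re.compile(r"T{12,}")   # assumed minimum length of an oligo(dT) run


def flag_5prime_read(read: str) -> List[str]:
    """Return heuristic abnormality flags for a raw 5'-end EST read."""
    seq = read.upper()
    n5 = seq.count(SITE_5P)   # non-overlapping occurrences of the 5' site
    n3 = seq.count(SITE_3P)   # non-overlapping occurrences of the 3' site
    flags = []
    if n5 == 0:
        # No 5' junction: the enzyme may have failed to cut, or cut elsewhere.
        flags.append("missing_5p_site")
    elif n5 > 1:
        # Repeated 5' junctions suggest a double insert or concatenated adapters.
        flags.append("multiple_5p_sites")
    if n3 > 1:
        flags.append("multiple_3p_sites")
    if POLY_T.search(seq):
        # An oligo(dT) stretch is a 3'-end structure; in a 5' read it hints
        # that the clone is reversed or the read direction is mislabeled.
        flags.append("3p_structure_in_5p_read")
    return flags or ["expected"]
```

For example, a read of the form vector + `GAATTC` + insert + poly(A) + `CTCGAG` + vector returns `["expected"]`, while a read with no `GAATTC` is flagged `missing_5p_site`. A practical tool would also have to tolerate sequencing errors within the sites and distinguish junction sites from sites that occur naturally inside the cDNA.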
CONCLUSIONS cDNA terminal pattern analysis, as implemented in the AFST software tool, can be used to reveal wet-lab errors such as restriction enzyme cutting abnormalities and chimeric EST sequences, detect various data abnormalities embedded in existing Sanger EST datasets, and improve the accuracy of identifying and extracting bona fide cDNA inserts from raw ESTs, thereby greatly benefiting downstream EST-based applications.
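Extracting a bona fide cDNA insert can be sketched, in its simplest form, as trimming a read to the span between its two cloning junctions and rejecting reads whose structure looks chimeric. This minimal Python sketch is not AFST's implementation; it assumes a hypothetical directional library with an EcoRI site at the 5' junction and an XhoI site at the 3' junction, and the function name `extract_insert`, the 12-nt poly(A) threshold, and the 50-nt minimum insert length are illustrative assumptions.

```python
import re
from typing import Optional

# ASSUMPTION (hypothetical library design, not taken from the paper):
SITE_5P = "GAATTC"                      # EcoRI site at the 5' cloning junction
SITE_3P = "CTCGAG"                      # XhoI site at the 3' cloning junction
POLY_A_TAIL = re.compile(r"A{12,}$")    # assumed poly(A) run at the insert end


def extract_insert(read: str, min_len: int = 50) -> Optional[str]:
    """Return the insert between the cloning junctions, or None if rejected."""
    seq = read.upper()
    start = seq.find(SITE_5P)
    if start < 0:
        return None  # no 5' junction: the insert cannot be delimited
    start += len(SITE_5P)
    end = seq.find(SITE_3P, start)
    if end < 0:
        end = len(seq)  # read ends inside the insert; keep up to the read end
    insert = POLY_A_TAIL.sub("", seq[start:end])  # trim a terminal poly(A) run
    if SITE_5P in insert:
        return None  # a second 5' junction inside the insert: possible chimera
    return insert if len(insert) >= min_len else None
```

Returning `None` rather than a partially trimmed sequence keeps abnormal reads out of downstream analyses; a more permissive design could instead return the read together with its abnormality flags for manual review.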
Similar resources
Generation of cohesive ends on PCR products by UDG-mediated excision of dU, and application for cloning into restriction digest-linearized vectors.
We have investigated the use of dU excision by uracil N-glycosylase (UDG) to create cohesive ends on PCR fragments "mimicking" those generated by restriction enzymes. The feasibility of this approach for directional and nondirectional cloning using cohesive ends mimicking SacI or PstI ends is demonstrated by the subcloning of a 383 to 388-bp fragment of bovine basic fibroblast growth factor int...
Molecular Cloning and Mutagenesis of Rat Glucocerebrosidase Gene
Purpose: The aim of this study was to clone the Gba enzyme gene into the pUCBM21 plasmid, introduce a frame mutation into it, and sequence it. Materials and methods: mRNA was extracted from mouse spleen, and glucocerebrosidase cDNA was synthesized and amplified by PCR with specific primers. The cDNA was cloned into pUCBM21 and analyzed with restriction enzymes. A fragment of its sequence was deleted using MscI restr...
GenEST, a powerful bidirectional link between cDNA sequence data and gene expression profiles generated by cDNA-AFLP.
The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the necessity for the development of efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power of linking cDNA sequence data (including EST sequences) with transcript profiles revealed by cDNA-...
Multiplex gene removal by two-step polymerase chain reactions.
Precise DNA manipulation is critical for molecular biotechnology. Restriction enzyme-based approaches are limited by their requirement of specific enzyme sites. Restriction-free cloning has greatly improved the flexibility and speed of precise DNA assembly. Most of these approaches focus on DNA assembly rather than gene removal. Here we present a polymerase chain reaction (PCR)-based cloning me...