Genomics via Optical Mapping III: Contiging Genomic DNA
Authors
Abstract
In this paper, we describe our algorithmic approach to constructing an alignment of (contiging) a set of restriction maps created from the images of individual genomic (uncloned) DNA molecules digested by restriction enzymes. Generally, these DNA segments are sized in the range of 1-4 Mb. The goal is to devise contiging algorithms capable of producing high-quality composite maps rapidly and in a scalable manner. The resulting software is a key component of our physical mapping automation tools and has been used to create complete maps of various microorganisms (E. coli, P. falciparum and D. radiodurans). Experimental results match known sequence data.

The optical mapping approach (Cai et al. 1998; Anantharaman, Mishra and Schwartz 1997; Samad et al. 1995; Schwartz et al. 1993) can be used to determine an approximate restriction map (with ordering of fragments) from fluorescent microscopy images of individual DNA molecules. When the DNA molecules are derived from clones, an accurate restriction map can be obtained by combining the approximate restriction maps from a small number (50-200) of DNA molecules (Anantharaman and Mishra 1998a). We have previously described a Bayesian/Maximum-likelihood algorithm capable of automatically producing accurate maps for moderate size clones (e.g., BAC, Bacterial Artificial Chromosome) (Anantharaman, Mishra and Schwartz 1997; Cai et al. 1998; Anantharaman and Mishra 1998b). In this paper, we extend our approach to explore whether accurate restriction maps can be constructed using only genomic (uncloned) DNA. Each DNA molecule will be a random fragment of the genome. The approximate restriction maps of these DNA molecules generated by optical mapping need to be combined into a larger restriction map by computing the contig (alignment between them) from the overlaps despite the errors in the maps.
At the same time, errors in the single molecule restriction maps need to be eliminated by combining the information from multiple overlapped restriction maps.

Copyright © 1999, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

In particular, we can use optical mapping to generate single molecule DNA restriction maps for random DNA segments sized between 1 Mb and 4 Mb. The individual restriction map produced from the images may have false negatives (up to 30%: restriction cut sites missing as a result of partial digestion), false positives (up to 20%: false optical cut sites produced by the image processing algorithm or DNA breakage), sizing errors (variations in the estimated distance between the actual restriction sites, ranging from 5-30% with an average value of 10-15%), the inability to tell the orientation of the DNA segment, and the loss of some fraction of the small restriction fragments, etc. With the current technology developed in our laboratory, it is possible to create such "imperfect maps" for a large number of DNA segments with high throughput (Jing et al. 1999). For instance, we were able to map about 100 segments of length 700 Kb to 1.4 Mb from Deinococcus radiodurans, and the resulting maps had a digestion rate exceeding 70%, a relative sizing error of ~15%, and under 5% of all cuts observed were false positives. A key to solving this shotgun optical mapping problem is a set of efficient algorithms for contiging individual maps with significant errors. The algorithms have been implemented in a program called Gentig and tested on real and simulated data.

The paper is organized as follows: In the next section (Section 2), we present the algorithms used in Gentig, based on a Bayesian/Maximum-likelihood formulation, to contig restriction maps of genomic DNA segments subject to the constraint that the false positive overlap probability does not exceed some prespecified value.
We also discuss a set of heuristic algorithms in order to derive an efficient implementation. In Section 3 we present experimental results using Gentig. The final section discusses the worst-case complexity of the problem (it is NP-hard), a statistical analysis of the data under various error models (Mishra 1999), applications of our algorithms, and related open problems. An overview of our restriction map generation process is illustrated in Figure (1).

From: ISMB-99 Proceedings.
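The error sources listed in the excerpt (missing cuts from partial digestion, spurious cuts, sizing error, unknown orientation) can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions and is not the Gentig implementation: the function name `simulate_optical_map`, its parameters, the uniform placement of false cuts, and the Gaussian multiplicative sizing noise are all hypothetical choices made for illustration. The default rates loosely follow the D. radiodurans figures quoted above, and the loss of small fragments is not modelled.

```python
import random


def simulate_optical_map(true_sites, length, digest_rate=0.7,
                         false_cut_rate=0.05, sizing_sd=0.10, rng=None):
    """Return the fragment sizes of one noisy single-molecule map.

    Hypothetical illustration only. Assumes 0 <= false_cut_rate < 1.
    """
    rng = rng or random.Random()

    # False negatives: each true cut site is observed with
    # probability digest_rate (partial digestion).
    observed = [s for s in true_sites if rng.random() < digest_rate]

    # False positives: add spurious cuts so that roughly
    # false_cut_rate of all observed cuts are false (assumed uniform).
    n_false = round(false_cut_rate * len(observed) / (1 - false_cut_rate))
    observed += [rng.uniform(0, length) for _ in range(n_false)]
    observed.sort()

    # Unknown orientation: flip the molecule with probability 0.5.
    if rng.random() < 0.5:
        observed = [length - s for s in reversed(observed)]

    # Convert cut positions to fragment sizes.
    bounds = [0.0] + observed + [length]
    fragments = [b - a for a, b in zip(bounds, bounds[1:])]

    # Sizing error: assumed Gaussian multiplicative noise per fragment.
    return [max(0.0, f * (1 + rng.gauss(0, sizing_sd))) for f in fragments]


if __name__ == "__main__":
    sites = list(range(50, 1000, 50))  # made-up cut sites, in Kb
    print(simulate_optical_map(sites, 1000, rng=random.Random(7)))
```

Generating many such maps from overlapping windows of one reference map gives the kind of simulated input on which a contiging method can be exercised before it is applied to real data.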
Similar resources
Genomics via Optical Mapping III : Contiging Genomic DNA and Variations
In this paper, we describe our algorithmic approach to constructing an alignment of (contiging) a set of optical maps created from the images of individual genomic DNA molecules digested by restriction enzymes. Generally, these DNA segments are sized in the range of 1-4 Mb. The problem of assembling clone contig maps is a simpler special case of this contig problem and is handled by our algorit...
Experimental Investigation of Subsurface Damages Made by Cup Grinding and Lapping Process of Optical Glass BK7 in Ductile Mode
Conventional material removal of BK7 optical glass will normally result in brittle fracture at the surface, generating severe subsurface damage and poor surface finish. Subsurface damages induced by grinding strongly influence the mechanical strength and optical quality of optical glasses. However, through ductile mode grinding it is possible to reduce the surface and subsurface cracks. It is m...
Single-molecule approach to bacterial genomic comparisons via optical mapping.
Modern comparative genomics has been established, in part, by the sequencing and annotation of a broad range of microbial species. To gain further insights, new sequencing efforts are now dealing with the variety of strains or isolates that gives a species definition and range; however, this number vastly outstrips our ability to sequence them. Given the availability of a large number of microb...
Identification and Discrimination of Salmonella Enteritidis, S. Pullorum, S. Gallinarum and S. Dublin Using Salmonella Specific Genomic Regions Amplification Assay
Background: DNA amplification method has been developed for identifying and discriminating Salmonella serovars, using specific primers at the genus and serovar levels and to identify the S. Enteritidis, S. Dublin, S. Gallinarum and S. Pullorum. Objectives: This study was conducted for molecular identification and discrimination among some important Salmonella serovars. Methods: Fifty isolates o...
Journal title:
Volume, issue:
Pages: -
Publication date: 2002