Allowing mismatches in anchors for whole genome alignment: Generation and effectiveness

Authors

  • Siu-Ming Yiu
  • P. Y. Chan
  • Tak Wah Lam
  • Wing-Kin Sung
  • Hing-Fung Ting
  • Prudence W. H. Wong
Abstract

Recent work on whole genome alignment has produced efficient tools for locating (possibly) conserved regions of two genomic sequences. Most such tools start by locating a set of short, highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues to the conserved regions, and the effectiveness of the tools depends heavily on the quality of the anchors. Some popular software tools use exact-match maximal unique substrings (EM-MUM) as anchors. However, the results are unsatisfactory, especially for genomes with high mutation rates (e.g. viruses). In our experiments, we found that more than 40% of the conserved genes are not recovered. In this paper, we consider anchors with mismatches. Our contributions include the following.
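To make the idea of an anchor with mismatches concrete, here is a minimal Python sketch. It is not the algorithm from the paper, whose details are not given in this abstract. It assumes a simple seed-and-extend scheme: k-mers that occur exactly once in each sequence serve as exact seeds, and each seed is extended in both directions while the total number of mismatches stays within a budget. The names `unique_kmer_positions`, `extend`, and `find_anchors` and the parameters `k` and `max_mismatches` are hypothetical, and the sketch does not verify that the extended anchor as a whole is unique.

```python
from collections import defaultdict


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start index."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}


def extend(a, b, i, j, step, budget):
    """Extend an ungapped alignment from a[i], b[j] in direction step (+1 or -1).

    Returns (length, used): the number of columns accepted, always ending
    on a match, and the number of mismatches inside those columns.
    """
    length = used = pending = scanned = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        if a[i] == b[j]:
            scanned += 1
            length = scanned      # accept everything scanned so far
            used += pending       # pending mismatches are now inside the anchor
            pending = 0
        else:
            if used + pending == budget:
                break             # no mismatch budget left
            pending += 1
            scanned += 1
        i += step
        j += step
    return length, used


def find_anchors(a, b, k=4, max_mismatches=1):
    """Return sorted (start_a, start_b, length) anchors grown from unique shared k-mers."""
    seeds_a = unique_kmer_positions(a, k)
    seeds_b = unique_kmer_positions(b, k)
    anchors = set()
    for kmer, i in seeds_a.items():
        j = seeds_b.get(kmer)
        if j is None:
            continue
        # Assumption: spend the mismatch budget on the right side first.
        right, used = extend(a, b, i + k, j + k, +1, max_mismatches)
        left, _ = extend(a, b, i - 1, j - 1, -1, max_mismatches - used)
        anchors.add((i - left, j - left, k + left + right))
    return sorted(anchors)


a = "GATTACAGGC"
b = "GATTTCAGGC"  # differs from a at index 4 only
print(find_anchors(a, b, k=4, max_mismatches=0))  # [(0, 0, 4), (5, 5, 5)]
print(find_anchors(a, b, k=4, max_mismatches=1))  # [(0, 0, 10)]
```

With exact matching, the single substitution splits the conserved region into two short anchors. Allowing one mismatch merges them into one anchor that covers the whole region, which illustrates why mismatch-tolerant anchors may help for sequences with high mutation rates.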


Similar resources

Allowing Mismatches in Anchors for Whole Genome Alignment

Recent work on whole genome alignment has resulted in efficient tools to locate (possibly) conserved regions of two genomic sequences. Most of such tools start with locating a set of short and highly similar substrings (called anchors) that are present in both genomes. These anchors provide clues for the conserved regions, and the effectiveness of the tools is highly related to the quality of t...


BatMis: a fast algorithm for k-mismatch mapping

MOTIVATION Second-generation sequencing (SGS) generates millions of reads that need to be aligned to a reference genome allowing errors. Although current aligners can efficiently map reads allowing a small number of mismatches, they are not well suited for handling a large number of mismatches. The efficiency of aligners can be improved using various heuristics, but the sensitivity and accuracy...


ProbeMatch: rapid alignment of oligonucleotides to genome allowing both gaps and mismatches

SUMMARY We have developed a tool, called ProbeMatch, for matching a large set of oligonucleotide sequences against a genome database using gapped alignments. Unlike most of the existing tools such as ELAND which only perform ungapped alignments allowing at most two mismatches, ProbeMatch generates both ungapped and gapped alignments allowing up to three errors including insertion, deletion and ...


Fast Mapping and Precise Alignment of AB SOLiD Color Reads to Reference DNA

Applied Biosystems’ SOLiD system offers a low-cost alternative to the traditional Sanger method of DNA sequencing. We introduce two main algorithms of mapping SOLiD’s color reads onto a reference genome. The first method performs mapping by adapting a greedy alignment framework. In such an alignment, reads are mapped to approximate genome positions, allowing for a pre-specified bound on sequenc...


GapMis-OMP: Pairwise Short-Read Alignment on Multi-core Architectures

Pairwise sequence alignment has received a new motivation due to the advent of next-generation sequencing technologies, particularly so for the application of re-sequencing—the assembly of a genome directed by a reference sequence. After the fast alignment between a factor of the reference sequence and a high-quality fragment of a short read by a short-read alignment programme, an important pro...



Publication date: 2005