Improving the ostrich genome assembly using optical mapping data

نویسندگان

  • Jilin Zhang
  • Cai Li
  • Qi Zhou
  • Guojie Zhang
چکیده

BACKGROUND The ostrich (Struthio camelus) is the tallest and heaviest living bird. Ostrich meat is considered a healthy red meat, with an annual worldwide production ranging from 12,000 to 15,000 tons. As part of the avian phylogenomics project, we sequenced the ostrich genome for phylogenetic and comparative genomics analyses. The initial Illumina-based assembly of this genome had a scaffold N50 of 3.59 Mb and a total size of 1.23 Gb. Since longer scaffolds are critical for many genomic analyses, particularly for chromosome-level comparative analysis, we generated optical mapping (OM) data to obtain an improved assembly. The OM technique is a non-PCR-based method to generate genome-wide restriction enzyme maps, which improves the quality of de novo genome assembly. FINDINGS In order to generate OM data, we digested the ostrich genome with KpnI, which yielded 1.99 million DNA molecules (>250 kb) and covered the genome at least 500×. The pattern of molecules was subsequently assembled to align with the Illumina-based assembly to achieve sequence extension. This resulted in an OM assembly with a scaffold N50 of 17.71 Mb, which is 5 times as large as that of the initial assembly. The number of scaffolds covering 90% of the genome was reduced from 414 to 75, which means an average of ~3 super-scaffolds for each chromosome. Upon integrating the OM data with previously published FISH (fluorescence in situ hybridization) markers, we recovered the full PAR (pseudoatosomal region) on the ostrich Z chromosome with 4 super-scaffolds, as well as most of the degenerated regions. CONCLUSIONS The OM data significantly improved the assembled scaffolds of the ostrich genome and facilitated chromosome evolution studies in birds. Similar strategies can be applied to other genome sequencing projects to obtain better assemblies.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data.

Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes; however, at present additional scaffolding is needed to achieve chromosome-level assemblies. We generated Pacific Biosciences (PacBio) long-read data of the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs. To im...

متن کامل

De novo assembly of Dekkera bruxellensis: a multi technology approach using short and long-read sequencing and optical mapping.

BACKGROUND It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers) it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high quality draft assemblies is extremely important in the case of yeast genomes to be...

متن کامل

Misassembly detection using paired-end sequence reads and optical mapping data

MOTIVATION A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method called misSEQuel that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data. Our method also fulfills the critical need for open source computational methods fo...

متن کامل

Single-Molecule Real-Time Sequencing Combined with Optical Mapping Yields Completely Finished Fungal Genome

UNLABELLED Next-generation sequencing (NGS) technologies have increased the scalability, speed, and resolution of genomic sequencing and, thus, have revolutionized genomic studies. However, eukaryotic genome sequencing initiatives typically yield considerably fragmented genome assemblies. Here, we assessed various state-of-the-art sequencing and assembly strategies in order to produce a contigu...

متن کامل

The hidden perils of read mapping as a quality assessment tool in genome sequencing

This article provides a comparative analysis of the various methods of genome sequencing focusing on verification of the assembly quality. The results of a comparative assessment of various de novo assembly tools, as well as sequencing technologies, are presented using a recently completed sequence of the genome of Lactobacillus fermentum 3872. In particular, quality of assemblies is assessed b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 4  شماره 

صفحات  -

تاریخ انتشار 2015