Inverted repeat structure of the human genome: the X-chromosome contains a preponderance of large, highly homologous inverted repeats that contain testes genes.
Abstract
We have performed the first genome-wide analysis of the Inverted Repeat (IR) structure in the human genome, using a novel and efficient software package called Inverted Repeats Finder (IRF). After masking of known repetitive elements, IRF detected 22,624 human IRs characterized by arm size from 25 bp to >100 kb with at least 75% identity, and spacer length up to 100 kb. This analysis required 6 h on a desktop PC. In all, 166 IRs had arm lengths >8 kb. From this set, IRs were excluded if they were in unfinished/unassembled regions of the genome, or clustered with other closely related IRs, yielding a set of 96 large IRs. Of these, 24 (25%) occurred on the X-chromosome, although it represents only approximately 5% of the genome. Of the X-chromosome IRs, 83.3% were ≥99% identical, compared with 28.8% of autosomal IRs. Eleven IRs from Chromosome X, one from Chromosome 11, and seven already described from Chromosome Y contain genes predominantly expressed in testis. PCR analysis of eight of these IRs correctly amplified the corresponding region in the human genome, and six were also confirmed in gorilla or chimpanzee genomes. Similarity dot-plots revealed that 22 IRs contained further secondary homologous structures partially categorized into three distinct patterns. The prevalence of large highly homologous IRs containing testes genes on the X- and Y-chromosomes suggests a possible role in male germ-line gene expression and/or maintaining sequence integrity by gene conversion.
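To make the 75% identity criterion concrete, the following is a minimal sketch of what it means for two arms to form an inverted repeat: the right arm, read as its reverse complement, should match the left arm at a sufficient fraction of positions. This is not the IRF algorithm. The function names are hypothetical, and the sketch assumes equal-length arms compared without gaps, whereas a real detector would also have to handle insertions and deletions.

```python
# Illustrative only: a gap-free identity check between two candidate arms.
# Function names are hypothetical and do not come from the IRF package.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def arm_identity(left_arm: str, right_arm: str) -> float:
    """Fraction of positions where the left arm matches the reverse
    complement of the right arm. Assumes equal-length arms and no gaps."""
    if not left_arm or len(left_arm) != len(right_arm):
        raise ValueError("arms must be non-empty and of equal length")
    rc_right = reverse_complement(right_arm)
    matches = sum(a == b for a, b in zip(left_arm.upper(), rc_right))
    return matches / len(left_arm)


def is_inverted_repeat(left_arm: str, right_arm: str,
                       min_identity: float = 0.75) -> bool:
    """Apply the identity threshold (75% by default, as in the abstract)."""
    return arm_identity(left_arm, right_arm) >= min_identity
```

For example, `is_inverted_repeat("AAGGCT", "AGCCTT")` holds because `AGCCTT` is the exact reverse complement of `AAGGCT`, while a pair whose arms differ at two of six positions falls below the threshold.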
Related articles
Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species
Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome and is an important wild resource for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic inference in plants. Different ch...
Evolutionary comparisons of the S segments in the genomes of herpes simplex virus type 1 and varicella-zoster virus.
The genomes of herpes simplex virus type 1 (HSV-1) and varicella-zoster virus (VZV) consist of two covalently joined segments, L and S. Each segment comprises a unique sequence flanked by inverted repeats. We have reported previously the DNA sequences of the S segments in these two genomes, and have identified protein-coding regions therein. In HSV-1, the unique sequence of S contains ten enti...
The genome of lipid-containing bacteriophage PRD1, which infects gram-negative bacteria, contains long, inverted terminal repeats.
The bacteriophage PRD1 is a lipid-bearing phage that infects a wide variety of gram-negative bacteria, including Escherichia coli and Salmonella typhimurium when they contain the appropriate plasmid. It contains a linear duplex DNA molecule that is covalently bound by its 5' ends to a terminal protein. We report here that the PRD1 genome contains a 111-base-pair terminal inverted repeat which d...
The Complete Chloroplast Genome Sequence of Cephalotaxus oliveri (Cephalotaxaceae): Evolutionary Comparison of Cephalotaxus Chloroplast DNAs and Insights into the Loss of Inverted Repeat Copies in Gymnosperms
We have determined the complete chloroplast (cp) genome sequence of Cephalotaxus oliveri. The genome is 134,337 bp in length, encodes 113 genes, and lacks inverted repeat (IR) regions. Genome-wide mutational dynamics have been investigated through comparative analysis of the cp genomes of C. oliveri and C. wilsoniana. Gene order transformation analyses indicate that when distinct isomers are co...
Evidence for Active Maintenance of Inverted Repeat Structures Identified by a Comparative Genomic Approach
Inverted repeats have been found to occur in both prokaryotic and eukaryotic genomes. Usually they are short, and some have important functions in various biological processes. However, long inverted repeats are rare and can cause genome instability. Analyses of the C. elegans genome identified long, nearly-perfect inverted repeat sequences involving both divergently and convergently oriented homolo...
Journal title:
- Genome research
Volume 14, Issue 10A
Pages -
Publication date 2004