Reverse Transcription Errors and RNA–DNA Differences at Short Tandem Repeats
Abstract
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on variation in gene expression levels and isoform representation. Variation in transcript sequence, by contrast, is caused by RNA editing and transcription errors, and it produces transcript variants that are not genetically encoded, known as RNA–DNA differences (RDDs). Such variation has been understudied, in part because reverse transcription (RT) and sequencing errors obscure its detection, and to date it has been evaluated only for base-substitution differences between transcripts. Here, we investigated transcript sequence variation at short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) that infers RT error and RDD rates while accounting for next-generation sequencing error rates. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale, replicated DNA and RNA sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately one order of magnitude lower than the RT error rates. The RT error rates that the MLE estimated from the primate data set agreed with those estimated from a Caenorhabditis elegans data set by an independent method, barcoded RNA sequencing. Our results have important implications for medical genomics, as STR allelic variation is associated with more than 40 diseases, and STR nonallelic transcript variation can also contribute to disease phenotypes. The MLE and the empirical rates presented here can be used to evaluate the probability that disease-associated transcripts arise from RDDs.
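The abstract does not specify the estimator itself, so the following is only a minimal sketch of the maximum-likelihood idea under a deliberately simplified model that is an assumption, not the authors' model. DNA reads at an STR class are assumed to differ from the genotype's repeat length only through sequencing error (rate `s`), while RNA reads are assumed to differ with probability `1 - (1 - s)(1 - x)`, where `x` lumps RT error and RDD together. Separating RT error from RDD, as the paper does, depends on its replicated design, which this toy omits. The names `Counts`, `log_likelihood`, and `mle` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical read counts for one STR class."""
    mismatched: int  # reads whose repeat length differs from the genotype
    total: int       # all reads covering the STR class


def _binom_term(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood term k*log(p) + (n-k)*log(1-p), constant dropped."""
    if (k > 0 and p <= 0.0) or (n > k and p >= 1.0):
        return -math.inf
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n > k:
        out += (n - k) * math.log1p(-p)
    return out


def log_likelihood(s: float, x: float, dna: Counts, rna: Counts) -> float:
    """Joint log-likelihood of DNA and RNA counts under the toy model (assumption)."""
    p_rna = 1.0 - (1.0 - s) * (1.0 - x)  # sequencing error OR non-genomic change
    return (_binom_term(dna.mismatched, dna.total, s)
            + _binom_term(rna.mismatched, rna.total, p_rna))


def mle(dna: Counts, rna: Counts) -> tuple[float, float]:
    """Constrained MLE of (s, x) with 0 <= x, i.e. p_rna >= s."""
    if dna.mismatched >= dna.total:
        raise ValueError("every DNA read mismatched; s is not identifiable")
    s_hat = dna.mismatched / dna.total
    p_hat = rna.mismatched / rna.total
    if p_hat >= s_hat:
        # Unconstrained optimum is feasible: solve 1 - (1 - s)(1 - x) = p for x.
        return s_hat, 1.0 - (1.0 - p_hat) / (1.0 - s_hat)
    # Otherwise the boundary x = 0 binds, and both libraries share one error rate.
    pooled = (dna.mismatched + rna.mismatched) / (dna.total + rna.total)
    return pooled, 0.0


if __name__ == "__main__":
    s, x = mle(Counts(20, 10_000), Counts(320, 10_000))
    print(f"sequencing error s = {s:.4f}, non-genomic rate x = {x:.4f}")
```

Because the log-likelihood separates into two concave binomial terms in `s` and in the RNA mismatch probability, the unconstrained optimum is used whenever it satisfies `x >= 0`; otherwise the maximum lies on the boundary `x = 0`, where the two libraries pool into a single binomial.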