The PARA-suite: PAR-CLIP specific sequence read simulation and processing
Abstract
BACKGROUND Next-generation sequencing technologies have profoundly affected biology in recent years. Experimental protocols such as photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP), which identifies protein-RNA interactions on a genome-wide scale, commonly employ deep sequencing. In PAR-CLIP, the incorporation of photoactivatable nucleosides into nascent transcripts leads to high rates of specific nucleotide conversions during reverse transcription. So far, the specific properties of PAR-CLIP-derived sequencing reads have not been assessed in depth.
METHODS We compared PAR-CLIP sequencing reads to regular transcriptome sequencing reads (RNA-Seq) to identify distinctive properties that are relevant for reference-based read alignment of PAR-CLIP datasets. We developed a set of freely available tools for PAR-CLIP data analysis, called the PAR-CLIP analyzer suite (PARA-suite). The PARA-suite includes error model inference, PAR-CLIP read simulation based on PAR-CLIP-specific properties, a full read alignment pipeline with a modified Burrows-Wheeler Aligner algorithm, and CLIP read clustering for binding site detection.
RESULTS We show that differences between the error profiles of PAR-CLIP reads and RNA-Seq reads make dedicated processing advantageous. We examined the alignment accuracy of commonly applied read aligners on 10 simulated PAR-CLIP datasets using different parameter settings and identified the most accurate setup among those aligners. We demonstrate the performance of the PARA-suite in conjunction with different binding site detection algorithms on several real PAR-CLIP and HITS-CLIP datasets. Our processing pipeline improved both alignment and binding site detection accuracy.
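The hallmark of PAR-CLIP reads is an elevated rate of one specific conversion (T-to-C when 4-thiouridine is the photoactivatable nucleoside). The abstract does not describe how the PARA-suite infers its error model, so the following is only a minimal sketch under stated assumptions: per-base substitution frequencies are counted from ungapped (reference, read) pairs on the sense strand. `conversion_profile` is a hypothetical helper, not a PARA-suite function.

```python
from collections import Counter

BASES = "ACGT"

def conversion_profile(pairs):
    """Estimate P(read base | reference base) from ungapped alignments.

    Hypothetical helper (not a PARA-suite API). Assumes each pair is
    (reference, read): equal-length strings on the sense strand, no indels.
    """
    counts = Counter()
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("expected an ungapped alignment")
        for r, q in zip(ref, read):
            if r in BASES and q in BASES:  # ignore N and other ambiguity codes
                counts[(r, q)] += 1
    profile = {}
    for r in BASES:
        total = sum(counts[(r, q)] for q in BASES)
        for q in BASES:
            # Conditional frequency; 0.0 if the reference base never occurs.
            profile[(r, q)] = counts[(r, q)] / total if total else 0.0
    return profile

pairs = [("ACGTTA", "ACGCTA"), ("TTGCA", "TCGCA"), ("GATTC", "GATTC")]
prof = conversion_profile(pairs)
print(f"T->C: {prof[('T', 'C')]:.2f}")  # prints "T->C: 0.33"
```

In RNA-Seq data the off-diagonal entries would all be small; in PAR-CLIP data the T-to-C entry stands out, which is the kind of difference in error profiles the abstract refers to. A real model would likely also condition on read position and base quality.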
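Read simulation based on PAR-CLIP-specific properties can be illustrated with a toy generator. This sketch assumes a single conversion type (T-to-C) applied at a fixed, hypothetical rate `p_tc`, plus a uniform sequencing error rate `p_err`; these names and values are illustrative and are not PARA-suite parameters. Each read's origin is returned alongside its sequence, since knowing the true position is what allows an aligner's accuracy to be scored on simulated data.

```python
import random

def simulate_parclip_read(transcript, read_len, p_tc, p_err, rng):
    """Sample one read and return (start, read_sequence).

    Hypothetical parameters (assumptions, not PARA-suite defaults):
      p_tc  - probability that a reference T is reported as C
      p_err - per-base probability of a uniform substitution error
    A base that undergoes T-to-C conversion receives no further error here.
    """
    start = rng.randrange(len(transcript) - read_len + 1)
    read = []
    for base in transcript[start:start + read_len]:
        if base == "T" and rng.random() < p_tc:
            base = "C"
        elif rng.random() < p_err:
            base = rng.choice([b for b in "ACGT" if b != base])
        read.append(base)
    return start, "".join(read)

rng = random.Random(42)  # fixed seed for reproducibility
transcript = "ATGCTTAGCTTTAGGCTAACGTTTACGGATC"
reads = [simulate_parclip_read(transcript, 12, 0.3, 0.01, rng) for _ in range(5)]
for start, seq in reads:
    print(start, seq)
```

Alignment accuracy can then be measured as the fraction of simulated reads that an aligner places at their recorded `start`.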
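The abstract does not say how the Burrows-Wheeler Aligner algorithm was modified. Purely to illustrate why conversion-aware alignment can help, the sketch below assumes a reduced cost for reference-T/read-C mismatches and places a read by brute-force ungapped search. It uses no Burrows-Wheeler index and is not the PARA-suite aligner; `weighted_mismatches`, `best_position`, and `tc_penalty` are hypothetical names.

```python
def weighted_mismatches(ref, read, tc_penalty=0.5):
    """Ungapped mismatch cost: reference-T/read-C costs tc_penalty, others cost 1.

    tc_penalty is a hypothetical illustration parameter.
    """
    cost = 0.0
    for r, q in zip(ref, read):
        if r != q:
            cost += tc_penalty if (r, q) == ("T", "C") else 1.0
    return cost

def best_position(reference, read, tc_penalty=0.5):
    """Brute-force ungapped placement; returns (position, cost), leftmost on ties."""
    candidates = (
        (i, weighted_mismatches(reference[i:i + len(read)], read, tc_penalty))
        for i in range(len(reference) - len(read) + 1)
    )
    return min(candidates, key=lambda c: c[1])

print(best_position("GGGGTTTTAAAA", "TCTT"))  # prints (4, 0.5)
```

With `tc_penalty=1.0` the cost reduces to a plain mismatch count; lowering it lets reads carrying several conversions stay within a mismatch budget that would otherwise reject them.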
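CLIP read clustering groups aligned reads that pile up at the same locus into candidate binding sites. A minimal sketch, assuming half-open coordinates and a hypothetical `min_reads` threshold, merges overlapping reads on the same chromosome and strand. The PARA-suite's own clustering criteria are not described in the abstract and may differ; `cluster_reads` is an illustrative name.

```python
def cluster_reads(alignments, min_reads=2):
    """Merge overlapping aligned reads into clusters (candidate binding sites).

    alignments: iterable of (chrom, strand, start, end), half-open coordinates.
    min_reads:  hypothetical minimum read count for reporting a cluster.
    Returns a list of (chrom, strand, start, end, n_reads).
    """
    clusters = []
    current = None  # [chrom, strand, start, end, n_reads] of the open cluster
    for chrom, strand, start, end in sorted(alignments):
        same_locus = current is not None and current[:2] == [chrom, strand]
        if same_locus and start < current[3]:  # overlaps the open cluster
            current[3] = max(current[3], end)
            current[4] += 1
        else:
            if current is not None and current[4] >= min_reads:
                clusters.append(tuple(current))
            current = [chrom, strand, start, end, 1]
    if current is not None and current[4] >= min_reads:
        clusters.append(tuple(current))
    return clusters

alns = [("chr1", "+", 100, 130), ("chr1", "+", 120, 150), ("chr1", "+", 200, 230),
        ("chr1", "-", 110, 140), ("chr1", "-", 125, 160)]
print(cluster_reads(alns))  # [('chr1', '+', 100, 150, 2), ('chr1', '-', 110, 160, 2)]
```

Sorting first makes the merge a single linear pass; the singleton read at 200-230 is dropped by the `min_reads` filter. Downstream binding site detection would typically score such clusters further, for example by their T-to-C conversion counts.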
AVAILABILITY The PARA-suite toolkit and the PARA-suite aligner are available at https://github.com/akloetgen/PARA-suite and https://github.com/akloetgen/PARA-suite_aligner, respectively, under the GNU GPLv3 license.
Similar resources
Bayesian hidden Markov models to identify RNA-protein interaction sites in PAR-CLIP.
The photoactivatable ribonucleoside enhanced cross-linking immunoprecipitation (PAR-CLIP) has been increasingly used for the global mapping of RNA-protein interaction sites. There are two key features of the PAR-CLIP experiments: The sequence read tags are likely to form an enriched peak around each RNA-protein interaction site; and the cross-linking procedure is likely to introduce a specific ...
Cseq-Simulator: A Data Simulator for CLIP-Seq Experiments
CLIP-Seq protocols such as PAR-CLIP, HITS-CLIP or iCLIP allow a genome-wide analysis of protein-RNA interactions. For the processing of the resulting short read data, various tools are utilized. Some of these tools were specifically developed for CLIP-Seq data, whereas others were designed for the analysis of RNA-Seq data. To date, however, it has not been assessed which of the available t...
An analytical model based on simulation aiming to improve patient flow in a hospital surgical suite
Surgical suites account for a large share of hospital expenses; on the other hand, they constitute a huge part of hospital revenues. Patient flow optimization in a surgical suite by omitting or reducing bottlenecks which cause loss of time is one of the key solutions in minimizing the patients’ length of stay[1] (LOS) in the system, lowering the expenses, increasing efficiency, and also enhanc...
Processing of Lexical Bundles by Persian Speaking Learners of English
Formulaic sequence (FS) is a general term often used to refer to various types of recurrent clusters. One particular type of FSs common in different registers is lexical bundles (LBs). This study investigated whether LBs are stored and processed as a whole in the mind of language users and whether their functional discourse type has any effect on their processing. To serve these objectives, thr...
PARma: identification of microRNA target sites in Argonaute PAR-CLIP data
PARma is a complete data analysis software for AGO-PAR-CLIP experiments to identify target sites of microRNAs as well as the microRNA binding to these sites. It integrates specific characteristics of the experiments into a generative model. The model and a novel pattern discovery tool are iteratively applied to data to estimate seed activity probabilities, cluster confidence scores and to assig...