Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing
Abstract
High-throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig High-Throughput Sequencing Cleaner (Ig-HTS-Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig Insertion-Deletion Identifier (Ig-Indel-Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.
Related resources
High-throughput sequencing of immunoglobulin genes: Life without a template
Immunoglobulin (that is, antibody) and T cell receptor genes are created through somatic gene rearrangement from gene segment libraries. Immunoglobulin genes are further diversified by somatic hypermutation and selection during the immune response. Studying the repertoires of these genes yields valuable insights into immune system function in infections, aging, autoimmune diseases and cancers. ...
CLONING AND SEQUENCING OF A MITOCHONDRIAL AUTOANTIGEN WITH IMMUNOGLOBULIN G FROM PATIENTS WITH MULTIPLE SCLEROSIS
Multiple Sclerosis (MS) is a chronic neurological disease of the central nervous system (CNS), characterised by a cellular immune response in early stages and demyelination of the CNS later. Although the cause of MS is unknown, there is much evidence that points to MS as an autoimmune disease. To test the hypothesis that an autoantigen is involved in MS, we screened a λgt11 human foetal spinal ...
An Evolutionary and Phylogenetic Study of the BMP15 Gene
DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...
The Evolution and Development of the Antibody Repertoire
Approximately 500 million years ago (1), vertebrates developed the ability to generate a highly diverse repertoire of immunoglobulins (Igs). These highly versatile proteins serve as both effector molecules and as receptors for antigen ligands. As soluble effectors, Igs can activate and fix complement and they can bind Fc receptors on the surfaces of granulocytes, monocytes, platelets, and other...
pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires
Driven by dramatic technological improvements, large-scale characterization of lymphocyte receptor repertoires via high-throughput sequencing is now feasible. Although promising, the high germline and somatic diversity, especially of B-cell immunoglobulin repertoires, presents challenges for analysis requiring the development of specialized computational pipelines. We developed the R...