Clustering-based identification of clonally-related immunoglobulin gene sequence sets
Abstract
BACKGROUND: Clonal expansion of B lymphocytes, coupled with somatic mutation and antigen selection, allows the mammalian humoral immune system to generate highly specific immunoglobulins (IG), or antibodies, against invading bacteria, viruses and toxins. High-throughput DNA sequencing methods are providing new avenues for studying this clonal expansion and identifying the factors that guide the generation of antibodies. Identifying groups of rearranged immunoglobulin gene sequences descended from the same rearrangement (clonally-related sets) in very large sets of sequences is facilitated by immunoglobulin gene sequence alignment and partitioning software that can accurately predict the component germline genes, but it has still required painstaking visual inspection and analysis of sequences.

RESULTS: We have developed and implemented an algorithm for identifying sets of clonally-related sequences in large human immunoglobulin heavy chain gene variable region sequence sets. The program processes sequences that have been partitioned using iHMMune-align, and uses pairwise comparisons of CDR3 sequences and similarity in IGHV and IGHJ germline gene assignments to construct a distance matrix. Agglomerative hierarchical clustering is then used to identify likely groups of clonally-related sequences. The program is available for download from http://www.cse.unsw.edu.au/~ihmmune/ClonalRelate/ClonalRelate.zip.

CONCLUSIONS: The method was evaluated on several benchmark datasets and identified clonally-related immunoglobulin gene sequences more accurately, and considerably faster, than visual inspection by domain experts.
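The approach outlined in the abstract (a distance matrix built from pairwise CDR3 comparisons and IGHV/IGHJ germline assignments, followed by agglomerative hierarchical clustering) can be sketched in Python as follows. This is only a minimal illustration and not the ClonalRelate implementation: the abstract does not give the distance formula, so the normalised edit distance, the mismatch penalties of 0.5, the use of average linkage, the 0.2 cut-off, the record layout and the example sequences are all assumptions made for the sketch.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def pair_distance(s, t, v_penalty=0.5, j_penalty=0.5):
    """Assumed distance: length-normalised CDR3 edit distance plus a fixed
    penalty for each differing germline assignment (hypothetical weights)."""
    longest = max(len(s["cdr3"]), len(t["cdr3"]), 1)
    cdr3_term = edit_distance(s["cdr3"], t["cdr3"]) / longest
    return (cdr3_term
            + v_penalty * (s["ighv"] != t["ighv"])
            + j_penalty * (s["ighj"] != t["ighj"]))


def cluster(seqs, threshold=0.2):
    """Average-linkage agglomerative clustering; merging stops once the
    closest pair of clusters is further apart than `threshold`."""
    n = len(seqs)
    dist = {(i, j): pair_distance(seqs[i], seqs[j])
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        pairs = [(min(i, j), max(i, j)) for i in a for j in b]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        (x, y), d = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return [sorted(c) for c in clusters]


# Made-up records for illustration; real input would come from a partitioning
# tool such as iHMMune-align.
example = [
    {"cdr3": "TGTGCGAGAGATTACTACGGTATGGACGTCTGG", "ighv": "IGHV3-23", "ighj": "IGHJ6"},
    {"cdr3": "TGTGCGAGAGATTACTACGGTATGGACGTTTGG", "ighv": "IGHV3-23", "ighj": "IGHJ6"},
    {"cdr3": "TGTGCGAAAGGGAGCTGGTTCGACCCCTGG", "ighv": "IGHV1-69", "ighj": "IGHJ5"},
]
groups = cluster(example)  # lists of record indices, one list per putative clone
```

The pairwise dictionary makes this quadratic in the number of sequences and the naive merge loop is slower still, so the sketch is suited to small sets only; a large repertoire would need an optimised clustering routine.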
Related articles
Performance-optimized partitioning of clonotypes from high-throughput immunoglobulin repertoire sequencing data
Motivation: During adaptive immune responses, activated B cells expand and undergo somatic hypermutation of their immunoglobulin (Ig) receptor, forming a clone of diversified cells that can be related back to a common ancestor. Identification of B cell clonotypes from high-throughput Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) data relies on computational analysis. Recently, we pr...
Molecular Identification of Rare Clinical Mycobacteria by Application of 16S-23S Spacer Region Sequencing
Objective(s) In addition to several molecular methods and in particular 16S rDNA analysis, the application of a more discriminatory genetic marker, i.e., 16S-23S internal transcribed spacer gene sequence has had a great impact on identification and classification of mycobacteria. In the current study we aimed to apply this sequencing power to conclusive identification of some Iranian clinical ...
Identification of Bifidobacterium Strains Isolated from Fecal Samples of Some Iranian Subjects Using 16SrRNA Gene Sequence Analysis and PCR-based Gene Specific Primers
For the first time in Iran, 40 strains of Bifidobacterium were isolated from feces of Iranian subjects. By using phenotypic tests, 18 isolates were identified as Bifidobacterium longum, 10 as Bifidobacterium bifidum and one as Bifidobacterium catenulatum. In order to validate these results and also to identify other isolates that had not been identified by phenotypic tests, two methods of PCR wi...
Comprehensive Assessment of Potential Multiple Myeloma Immunoglobulin Heavy Chain V-D-J Intraclonal Variation Using Massively Parallel Pyrosequencing
Multiple myeloma (MM) is characterized by the accumulation of malignant plasma cells (PCs) in the bone marrow (BM). MM is viewed as a clonal disorder due to lack of verified intraclonal sequence diversity in the immunoglobulin heavy chain variable region gene (IGHV). However, this conclusion is based on analysis of a very limited number of IGHV subclones and the methodology employed did not per...
The bone marrow of multiple myeloma patients contains B cell populations at different stages of differentiation that are clonally related to the malignant plasma cell
One of the distinguishing features of multiple myeloma (MM) is the proliferation of a clonal plasma cell population in the bone marrow (BM). It is of particular interest that the tumor plasma cells appear to be restricted to the microenvironment of the BM and are rarely detected in the peripheral system, yet the disease is found widely disseminated throughout the axial skeleton. Furthermore, is...