Improving HLA typing imputation accuracy and eplet identification with local next-generation sequencing training data
Abstract
Assessing donor/recipient HLA compatibility at the eplet level requires second field DNA typings, but these are not always available. They can be estimated from lower-resolution data, either manually or with computational tools that currently rely, at best, on data containing typing ambiguities. We gathered NGS typings from 61,393 individuals in 17 French laboratories, for loci A, B, and C (100% of typings), DRB1 and DQB1 (95.5%), DQA1 (39.6%), and DRB3/4/5, DPB1, and DPA1 (10.5%). We developed HaploSFHI, a modified iterative maximum likelihood algorithm, to impute second field typings from low- and intermediate-resolution ones. Compared with the reference tools HaploStats, HLA-EMMA, and HLA-Upgrade, HaploSFHI provided more accurate predictions across all loci in two test sets and four independent European test sets. Only HaploSFHI could impute DQA1, and only HaploSFHI and HaploStats provided DRB3/4/5 imputations. The improved performance was due to our local nonambiguous training data. We provided explanations for the most common imputation errors and pinpointed the variability and low number of low-resolution haplotypes. We thus provide guidance to select individuals for whom sequencing would optimize incompatibility assessment and the cost-effectiveness of typing, considering not only well-imputed typing(s) but also eplets.
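To make the imputation idea concrete, here is a minimal sketch of generic maximum likelihood imputation from haplotype frequencies. It is not HaploSFHI: the abstract does not describe its internals, and the iterative step that estimates haplotype frequencies from training data is not shown. The frequency table `HAPLO_FREQS` (two loci, A~B) is entirely made up, and the helper names `to_low_res`, `genotype_of`, and `impute` are hypothetical. Under an assumed Hardy-Weinberg model, each pair of second field haplotypes compatible with the low-resolution typing is scored, and the best pair is returned with its posterior probability.

```python
from itertools import combinations_with_replacement

# Hypothetical second field A~B haplotype frequencies, as might be estimated
# from local NGS training data. The values are illustrative only.
HAPLO_FREQS = {
    ("A*01:01", "B*08:01"): 0.10,
    ("A*02:01", "B*07:02"): 0.06,
    ("A*02:01", "B*44:02"): 0.05,
    ("A*02:05", "B*44:02"): 0.01,
    ("A*03:01", "B*07:02"): 0.04,
}


def to_low_res(allele: str) -> str:
    """Truncate a second field allele (e.g. 'A*02:01') to its first field ('A*02')."""
    return allele.split(":")[0]


def genotype_of(h1, h2):
    """Unordered per-locus low-resolution genotype implied by a haplotype pair."""
    return tuple(
        tuple(sorted((to_low_res(a), to_low_res(b)))) for a, b in zip(h1, h2)
    )


def impute(low_res_genotype, freqs=HAPLO_FREQS):
    """Return (most likely haplotype pair, its posterior probability).

    low_res_genotype lists one (allele, allele) pair per locus, in the same
    locus order as the haplotype keys. Returns (None, 0.0) if no pair fits.
    """
    target = tuple(tuple(sorted(locus)) for locus in low_res_genotype)
    scores = {}
    for h1, h2 in combinations_with_replacement(sorted(freqs), 2):
        if genotype_of(h1, h2) != target:
            continue  # pair is incompatible with the observed typing
        # Hardy-Weinberg: a heterozygous pair can arise in two phase orders.
        scores[(h1, h2)] = freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2)
    if not scores:
        return None, 0.0
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total


if __name__ == "__main__":
    pair, posterior = impute([("A*01", "A*02"), ("B*08", "B*44")])
    print(pair, round(posterior, 3))
```

With these toy numbers, the typing A*01/A*02, B*08/B*44 is explained by two candidate pairs, and the posterior of the best one (about 0.83) gives a per-individual confidence. A threshold on such a value is one possible way to flag typings that may be worth sequencing rather than imputing.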
Journal: HLA: Immune Response Genetics
Year: 2023
ISSN: 2059-2302, 2059-2310
DOI: https://doi.org/10.1111/tan.15222