Efficient clustering of identity-by-descent between multiple individuals

Authors

  • Yu Qian
  • Brian L. Browning
  • Sharon R. Browning
Abstract

MOTIVATION: Most existing identity-by-descent (IBD) detection methods consider only haplotype pairs; less attention has been paid to considering multiple haplotypes simultaneously, even though IBD is an equivalence relation on haplotypes that partitions a set of haplotypes into IBD clusters. Multiple-haplotype IBD clusters may have advantages over pairwise IBD in some applications, such as IBD mapping. Existing methods for detecting multiple-haplotype IBD clusters are often computationally expensive and unable to handle large samples with thousands of haplotypes.

RESULTS: We present a clustering method, efficient multiple-IBD, which uses pairwise IBD segments to infer multiple-haplotype IBD clusters. It expands clusters from seed haplotypes by adding qualified neighbors and extends clusters across sliding windows in the genome. Our method is an order of magnitude faster than existing methods and performs comparably with respect to the quality of the clusters it uncovers. We further investigate the potential application of multiple-haplotype IBD clusters in association studies by testing for association between multiple-haplotype IBD clusters and low-density lipoprotein cholesterol in the Northern Finland Birth Cohort. Using our multiple-haplotype IBD cluster approach, we found an association with a genomic interval covering the PCSK9 gene in these data that is missed by standard single-marker association tests. Previously published studies confirm the association of PCSK9 with low-density lipoprotein.

AVAILABILITY AND IMPLEMENTATION: Source code is available under the GNU Public License at http://cs.au.dk/~qianyuxx/EMI/.
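
The abstract describes growing clusters from seed haplotypes by adding qualified neighbors. The Python sketch below illustrates that idea within a single window. It is not the authors' implementation: the qualification rule (a candidate must share pairwise IBD with at least a fraction `min_frac` of the current members), the highest-degree-first seed order, and the names `build_graph`, `expand_clusters`, `min_frac` and `min_size` are all hypothetical assumptions. A rule of this kind matters because pairwise IBD calls contain errors, so simply taking the transitive closure of all pairs can chain unrelated haplotypes into one cluster.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build undirected adjacency sets from pairwise IBD calls (hap_a, hap_b)."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def expand_clusters(pairs, min_frac=0.5, min_size=2):
    """Grow disjoint clusters from seed haplotypes within one window.

    Hypothetical qualification rule: a candidate joins a cluster only if it
    shares pairwise IBD with at least `min_frac` of the current members.
    """
    graph = build_graph(pairs)
    assigned = set()  # haplotypes already placed in a cluster
    clusters = []
    # Hypothetical seed order: highest-degree haplotypes first, ties by id.
    seeds = sorted(graph, key=lambda h: (-len(graph[h]), h))
    for seed in seeds:
        if seed in assigned:
            continue
        cluster = {seed}
        changed = True
        while changed:
            changed = False
            # Candidates are unassigned neighbors of any current member.
            neighbors = set().union(*(graph[m] for m in cluster))
            candidates = neighbors - cluster - assigned
            for cand in sorted(candidates):
                links = len(graph[cand] & cluster)
                if links >= min_frac * len(cluster):
                    cluster.add(cand)
                    changed = True
        # Keep only clusters with enough members; singletons are discarded.
        if len(cluster) >= min_size:
            clusters.append(cluster)
            assigned |= cluster
    return clusters


pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("E", "F")]
clusters = expand_clusters(pairs)
# clusters == [{"A", "B", "C"}, {"E", "F"}]
```

In this example, haplotype D shares IBD only with C, so it links to just one of the three members of {A, B, C} and is rejected at `min_frac=0.5`; a plain transitive closure would have merged it. Lowering `min_frac` to 0.3 admits it.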

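Extending clusters across sliding windows can likewise be sketched, again only as an illustration under assumptions not taken from the paper: clusters found in consecutive windows are linked greedily when their Jaccard similarity reaches a hypothetical threshold `min_jaccard`, and `jaccard` and `link_windows` are illustrative names rather than functions from the published software.

```python
def jaccard(a, b):
    """Jaccard similarity of two haplotype sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_windows(windows, min_jaccard=0.5):
    """Greedily chain clusters across consecutive windows (hypothetical rule).

    windows[w] is a list of clusters (sets of haplotype ids) found in window w.
    Returns (start_window, end_window, chain) tuples, where chain holds the
    linked cluster from each window in [start_window, end_window].
    """
    finished = []
    active = []  # (start_window, chain) pairs still open after the previous window
    for w, clusters in enumerate(windows):
        next_active = []
        used = set()  # indices of clusters in window w already claimed
        for start, chain in active:
            best, best_score = None, -1.0
            for i, cluster in enumerate(clusters):
                if i in used:
                    continue
                score = jaccard(chain[-1], cluster)
                if score >= min_jaccard and score > best_score:
                    best, best_score = i, score
            if best is None:
                # No sufficiently similar successor: close the chain at w - 1.
                finished.append((start, w - 1, chain))
            else:
                used.add(best)
                chain.append(clusters[best])
                next_active.append((start, chain))
        # Clusters not claimed by an existing chain start new chains here.
        for i, cluster in enumerate(clusters):
            if i not in used:
                next_active.append((w, [cluster]))
        active = next_active
    # Chains still open after the last window end there.
    for start, chain in active:
        finished.append((start, len(windows) - 1, chain))
    return sorted(finished, key=lambda t: (t[0], t[1]))


windows = [
    [{"A", "B", "C"}, {"E", "F"}],
    [{"A", "B", "C", "D"}],
    [{"A", "B", "D"}, {"G", "H"}],
]
chains = link_windows(windows)
# [(s, e) for s, e, _ in chains] == [(0, 0), (0, 2), (2, 2)]
```

Greedy linking is order-dependent; an optimal one-to-one matching between the clusters of adjacent windows (for example, Hungarian assignment on the Jaccard scores) would be a more careful alternative.
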
Journal:
  • Bioinformatics

Volume 30, Issue 7

Publication date: 2014