Entropy-driven partitioning of the hierarchical protein space

Authors

  • Nadav Rappoport
  • Amos Stern
  • Nathan Linial
  • Michal Linial

Abstract

MOTIVATION Modern protein sequencing techniques have led to the determination of more than 50 million protein sequences. ProtoNet is a clustering system that provides a continuous hierarchical agglomerative clustering tree for all proteins. While ProtoNet performs unsupervised classification of all included proteins, finding the optimal level of granularity for focusing on protein functional groups remains elusive. Here, we ask whether knowledge-based annotations of protein families can support automatic unsupervised methods in identifying high-quality protein families. We present a method that yields, within the ProtoNet hierarchy, an optimal partition of clusters relative to manual annotation schemes. The method's principle is to minimize the entropy-derived distance between annotation-based partitions and all available hierarchical partitions. We describe the best front (BF) partition of 2,478,328 proteins from UniRef50. Of the 4,929,553 ProtoNet tree clusters, the BF based on Pfam annotations contains 26,891 clusters. The high quality of the partition is validated by its close correspondence with the set of clusters that best describe thousands of Pfam keywords. The BF is shown to be superior to a naïve cut of the ProtoNet tree that yields a similar number of clusters. Finally, we used parameters intrinsic to the clustering process to enrich the BF's clusters a priori. We demonstrate the benefit of the entropy-based method in overcoming the unavoidable limitations of nested clusters in ProtoNet. We suggest that this automatic, information-based cluster selection can be useful for other large-scale annotation schemes, as well as for systematically testing and comparing putative families derived from alternative clustering methods.

AVAILABILITY AND IMPLEMENTATION A catalog of BF clusters for thousands of Pfam keywords is provided at http://protonet.cs.huji.ac.il/bestFront/.
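
The abstract defines the best front only as the set of tree clusters that minimizes an entropy-derived distance to an annotation-based partition. A minimal sketch of that idea follows, under loudly stated assumptions: the distance is taken to be the variation of information, VI(C, A) = H(A|C) + H(C|A) (the abstract does not name the exact measure), every protein carries exactly one annotation label, and `Node`, `cluster_cost` and `best_front` are hypothetical names, not ProtoNet code. Because VI splits into a sum of per-cluster terms, the minimizing front can be found bottom-up: each node is either kept whole or replaced by the best fronts of its children.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster in a hypothetical hierarchical clustering tree.

    Leaves hold one protein id; an internal node's members are the union
    of its children's members.
    """
    name: str
    children: list[Node] = field(default_factory=list)
    protein: str | None = None  # set only on leaves

    def members(self) -> list[str]:
        if not self.children:
            return [self.protein]
        return [p for child in self.children for p in child.members()]


def cluster_cost(members, labels, n_total, label_counts):
    """Contribution of one cluster c to VI(C, A), in bits.

    Assumption: VI is the entropy-derived distance. Since
    VI(C, A) = sum over c, a of p(c,a) * [log p(c) + log p(a) - 2 log p(c,a)],
    the total splits into independent per-cluster terms.
    """
    p_c = len(members) / n_total
    joint = Counter(labels[m] for m in members)  # label counts inside c
    cost = 0.0
    for a, n_ca in joint.items():
        p_ca = n_ca / n_total
        p_a = label_counts[a] / n_total
        cost += p_ca * (math.log2(p_c) + math.log2(p_a) - 2 * math.log2(p_ca))
    return cost


def best_front(node, labels, n_total, label_counts):
    """Return (cost, clusters) of the minimum-VI front below `node`.

    A front is a set of tree nodes whose member sets partition the leaves
    under `node`. With an additive cost, the optimum either keeps `node`
    whole or combines the optimal fronts of its children.
    """
    own_cost = cluster_cost(node.members(), labels, n_total, label_counts)
    if not node.children:
        return own_cost, [node]
    child_cost, child_front = 0.0, []
    for child in node.children:
        c, f = best_front(child, labels, n_total, label_counts)
        child_cost += c
        child_front += f
    if own_cost <= child_cost:
        return own_cost, [node]
    return child_cost, child_front


def leaf(protein):
    return Node(protein, protein=protein)


# Made-up toy data: five proteins with hypothetical family labels.
labels = {"p1": "PF1", "p2": "PF1", "p3": "PF2", "p4": "PF2", "p5": "PF2"}
left = Node("L", [leaf("p1"), leaf("p2")])
right = Node("R", [leaf("p3"), Node("R2", [leaf("p4"), leaf("p5")])])
root = Node("root", [left, right])

n = len(labels)
counts = Counter(labels.values())
cost, front = best_front(root, labels, n, counts)
print([c.name for c in front], round(cost, 6))  # prints ['L', 'R'] 0.0
```

On this made-up tree the sketch keeps clusters L and R, which match the two labels exactly (VI = 0), rather than the root or the individual leaves. Recomputing `members()` at every node is quadratic on deep trees; a tree with millions of clusters would likely need per-node label counts cached bottom-up instead.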

Similar resources

Stimuli-Magnitude-Adaptive Sample Selection for Data-Driven Haptic Modeling

Data-driven haptic modeling is an emerging technique where contact dynamics are simulated and interpolated based on a generic input-output matching model identified by data sensed from interaction with target physical objects. In data-driven modeling, selecting representative samples from a large set of data in a way that they can efficiently and accurately describe the whole dataset has been a...

DIAGNOSIS OF BREAST LESIONS USING THE LOCAL CHAN-VESE MODEL, HIERARCHICAL FUZZY PARTITIONING AND FUZZY DECISION TREE INDUCTION

Breast cancer is one of the leading causes of death among women. Mammography remains today the best technology to detect breast cancer, early and efficiently, to distinguish between benign and malignant diseases. Several techniques in image processing and analysis have been developed to address this problem. In this paper, we propose a new solution to the problem of computer aided detection and...

A Thermodynamic Study of the Interaction between Urease and Copper Ions

A thermodynamic study of copper ions by jack bean urease (JBU) was carried out at two temperatures of 27 and 37°C in Tris buffer (30 mM; pH=7.0) using an isothermal titration calorimetry. There is a set of twelve identical and non-interacting binding sites for copper ions. The intrinsic dissociation equilibrium constant and the molar enthalpy of binding are 285 µM and −15.2 kJ/mol at 27°C and 3...

Preference-Driven Hierarchical Hardware/Software Partitioning

In this paper, we present a hierarchical evolutionary approach to hardware/software partitioning for real-time embedded systems. In contrast to most of previous approaches, we apply a hierarchical structure and dynamically determine the granularity of tasks and hardware modules to adaptively optimize the solution while keeping the search space as small as possible. Two new search operators are ...

Irreversibility Analysis of MHD Buoyancy-Driven Variable Viscosity Liquid Film along an Inclined Heated Plate Convective Cooling

Analysis of intrinsic irreversibility and heat transfer in a buoyancy-driven changeable viscosity liquid along an incline heated wall with convective cooling taking into consideration the heated isothermal and isoflux wall is investigated. By Newton’s law of cooling, we assumed the free surface exchange heat with environment and fluid viscosity is exponentially dependent on temperature. Appropr...

Volume 30

Publication date: 2014