Pfam database

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

Journal: Progress in Biological Sciences 2017

Hamid Pezeshk, Vahid Rezaei Tabar

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA), which is the representative of a PHMM, can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
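
The abstract attributes the PHMM's weakness to its statistical assumptions. A minimal sketch of one such assumption, assuming an ungapped alignment and ignoring insert and delete states entirely: each alignment column gets its own residue distribution (like a match state's emissions), and a sequence's log-odds score is a plain sum of per-column terms, so no term can see which residue sits in a neighboring column. The function names, the pseudocount of 1, and the uniform background are illustrative assumptions, not the paper's model.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profiles(msa, pseudocount=1.0):
    """Estimate an independent residue distribution for each column of an ungapped MSA."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profiles = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa)
        total = len(msa) + pseudocount * len(AMINO_ACIDS)
        profiles.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profiles


def log_odds_score(seq, profiles, background=1.0 / len(AMINO_ACIDS)):
    """Score a sequence as a sum of per-column log-odds terms (columns treated as independent)."""
    if len(seq) != len(profiles):
        raise ValueError("sequence length must match the number of columns")
    return sum(math.log(p[aa] / background) for aa, p in zip(seq, profiles))


if __name__ == "__main__":
    msa = ["ACDE", "ACDF", "SCDE"]
    profiles = column_profiles(msa)
    print(log_odds_score("ACDE", profiles) > log_odds_score("WWWW", profiles))  # True
```

Because the score is additive over columns, the residue at one position never changes the contribution of another; dependencies of that kind are exactly what this model cannot express.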

The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources

2010

Guy Cochrane, Michael Y. Galperin

The current issue of Nucleic Acids Research includes descriptions of 58 new and 73 updated data resources. The accompanying online Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, now lists 1230 carefully selected databases covering various aspects of molecular and cell biology. While most data resource descriptions remain very brief, the issue includes several l...

Pseudofam: the pseudogene families database

2009

Hugo Y. K. Lam, Ekta Khurana, Gang Fang, Philip Cayting, Nicholas Carriero, Kei-Hoi Cheung, Mark Gerstein

Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125,000 pseudogenes identified from 10 eukaryotic genomes and aligne...

Computational prediction of SEG (single exon gene) function in humans

Journal: Frontiers in Bioscience: A Journal and Virtual Library 2005

Meena K Sakharkar, Vincent T K Chow, Kingshuk Ghosh, Iti Chaturvedi, Pern Chern Lee, Sundara Perumal Bagavathi, Paul Shapshak, Subramanian Subbiah, Pandjassarame Kangueane

Human genes are often interrupted by non-coding, intragenic sequences called introns. Hence, the gene sequence is divided into exons (coding segments) and introns (non-coding segments). Consequently, the majority of human genes are multi-exon genes (MEG). However, a considerable number of single-exon genes (SEG) are present in the human genome (approximately 12%). This number is sizeable and it is impor...

Automatic Identification and Classification of Protein Domains

2005

Elon Portugaly, Nathan Linial, Michal Linial

Motivation: Proteins are composed of one or several domains. Such domains can be classified into families according to their biological function. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational tools for large-scale determination of protein domains and their boundaries. The present paper addresses the challenge of developing computat...

PICUPP: Protein Interaction Classification by Unlikely Profile Pair

2003

Byung-Hoon Park, George Ostrouchov, Gong-Xin Yu, Al Geist, Andrey Gorin, Nagiza F. Samatova

A computational approach that infers protein-protein interactions from genome sequences is proposed in this paper. It is based on our recent observation that protein-protein interactions can be identified by a set of “unusual” protein-profile pairs in experimentally determined protein interactions. A pair of protein profiles is considered to be unusual if its occurrence in the given data is stat...
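
The abstract is cut off before it states which statistic PICUPP uses, so the sketch below is only a generic illustration of flagging over-represented pairs, not the PICUPP method. It assumes each protein is reduced to a single hypothetical profile label, estimates label frequencies from all interaction endpoints, and flags a pair whose observed count has a small one-sided binomial tail probability under independence. The names, the `alpha` threshold, and the example labels are assumptions.

```python
import math
from collections import Counter


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def unusual_pairs(interactions, alpha=0.01):
    """Flag label pairs seen more often than expected under independence.

    `interactions` is a list of (label_a, label_b) tuples, one per observed
    interaction; each protein is reduced to a single hypothetical profile label.
    No multiple-testing correction is applied.
    """
    n = len(interactions)
    endpoint_counts = Counter(label for pair in interactions for label in pair)
    freq = {label: count / (2 * n) for label, count in endpoint_counts.items()}
    pair_counts = Counter(tuple(sorted(pair)) for pair in interactions)
    flagged = {}
    for (a, b), observed in pair_counts.items():
        # Chance that one interaction pairs labels a and b if endpoints were independent.
        p = freq[a] * freq[b] * (1 if a == b else 2)
        p_value = binomial_upper_tail(observed, n, p)
        if p_value < alpha:
            flagged[(a, b)] = p_value
    return flagged


if __name__ == "__main__":
    data = [("SH3", "PRM")] * 6 + [(f"X{i}", f"Y{i}") for i in range(14)]
    print(sorted(unusual_pairs(data)))  # [('PRM', 'SH3')]
```

The fixed threshold is a simplification; with many candidate pairs, a multiple-testing correction such as Bonferroni would normally be applied.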

Pfam: multiple sequence alignments and HMM-profiles of protein domains

Journal: Nucleic Acids Research 1998

Erik L. L. Sonnhammer, Sean R. Eddy, Ewan Birney, Alex Bateman, Richard Durbin

Pfam contains multiple alignments and hidden Markov model based profiles (HMM-profiles) of complete protein domains. The definition of domain boundaries, family members and alignment is done semi-automatically based on expert knowledge, sequence similarity, other protein family databases and the ability of HMM-profiles to correctly identify and align the members. Release 2.0 of Pfam contains 52...

BioPD: a web-based information center for bioactive peptides

Journal: Regulatory Peptides 2004

Lei Shi, Qipeng Zhang, Wei Rui, Ming Lu, Xia Jing, Tong Shang, Jian Tang

Bioactive peptide database (BioPD) is a web-based knowledge base that contains more than 1100 protein sequences from human, mouse and rat, which are putative or are known to be bioactive peptides. In addition to peptide sequences and the annotation, the database also contains gene sequences with annotation, protein interaction and disease data related to the peptides. Each entry has as many ref...

Broadening Pfam Protein Sequence Annotations

Journal: Nature Precedings 2009

Testing Statistical Hypothesis on Random Trees and Applications to the Protein Classification Problem

2006

Jorge R. Busch, Pablo A. Ferrari, Ana Georgina Flesia, Ricardo Fraiman, Sebastian P. Grynberg, Florencia Leonardi

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VL...
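
The abstract models protein sequences as realizations of Variable Length Markov Chains, in which the length of the conditioning context depends on the preceding residues. As a deliberately simplified stand-in (a fixed-order Markov chain, not a VLMC, and not the authors' test on random trees), the sketch below fits one model per family, assuming order 2 and an add-0.5 pseudocount, and compares how well each family's model explains a query sequence. All names and defaults are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_markov(seqs, order=2, pseudocount=0.5):
    """Fit P(next residue | previous `order` residues) from one family's sequences."""
    context_counts = defaultdict(Counter)
    for seq in seqs:
        for i in range(order, len(seq)):
            context_counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, residue):
        # Unseen contexts fall back to the uniform distribution via the pseudocount.
        counts = context_counts.get(context, Counter())
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        return (counts[residue] + pseudocount) / total

    return prob


def log_likelihood(seq, prob, order=2):
    """Log-likelihood of `seq` given its first `order` residues."""
    return sum(math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq)))


if __name__ == "__main__":
    family_a = ["ACDEACDEACDE", "ACDEACDE"]
    family_b = ["WYWYWYWY", "WYWYWY"]
    model_a, model_b = fit_markov(family_a), fit_markov(family_b)
    query = "ACDEACD"
    print(log_likelihood(query, model_a) > log_likelihood(query, model_b))  # True
```

A VLMC would instead keep longer contexts only where the data support them, typically by pruning a context tree; the fixed order here trades that flexibility for brevity. The `order` passed to `log_likelihood` must match the one used in `fit_markov`.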
