Pseudofam: the pseudogene families database

نویسندگان

  • Hugo Y. K. Lam
  • Ekta Khurana
  • Gang Fang
  • Philip Cayting
  • Nicholas Carriero
  • Kei-Hoi Cheung
  • Mark Gerstein
چکیده

Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125,000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of the total families in PfamA). Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an extension of the PseudoPipe pipeline) to identify pseudogenes. Each identified pseudogene is assigned to its parent protein family and subsequently aligned to each other by transferring the parent domain alignments from the Pfam family. Pseudogenes are also given additional annotation based on an ontology, reflecting their mode of creation and subsequent history. In particular, our annotation highlights the association of pseudogene families with genomic features, such as segmental duplications. In addition, pseudogene families are associated with key statistics, which identify outlier families with an unusual degree of pseudogenization. The statistics also show how the number of genes and pseudogenes in families correlates across different species. Overall, they highlight the fact that housekeeping families tend to be enriched with a large number of pseudogenes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Network analysis of pseudogene-gene relationships: from pseudogene evolution to their functional potentials.

Pseudogenes are fossil relatives of genes. Pseudogenes have long been thought of as "junk DNAs", since they do not code proteins in normal tissues. Although most of the human pseudogenes do not have noticeable functions, ∼20% of them exhibit transcriptional activity. There has been evidence showing that some pseudogenes adopted functions as lncRNAs and work as regulators of gene expression. Fur...

متن کامل

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation

The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation. The definition of a pseudogene varies within the literature, resulting in significantly different approaches to the problem of identification. Consequently, it is difficult to maintain a consistent collection of pseudogenes in detail necessary for their effective use. Our database is designed to add...

متن کامل

Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome.

Pseudogenes are non-functioning copies of genes in genomic DNA, which may either result from reverse transcription from an mRNA transcript (processed pseudogenes) or from gene duplication and subsequent disablement (non-processed pseudogenes). As pseudogenes are apparently 'dead', they usually have a variety of obvious disablements (e.g., insertions, deletions, frameshifts and truncations) rela...

متن کامل

Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22.

We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e., mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into "processed" and "nonprocessed"; the...

متن کامل

Digging for Dead Genes: An Analysis of the Characteristics and Distribution of the Pseudogene Population in the Ribbon Worm Genome

Pseudogenes are non-functioning copies of genes in genomic DNA, which may either result from reverse transcription from a messenger RNA transcript (termed processed pseudogenes) or from gene duplication and subsequent disablement (non-processed pseudogenes). As pseudogenes are apparently ‘dead’, they usually have a variety of disablements (e.g. insertions, deletions, frameshifts and truncations...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 37  شماره 

صفحات  -

تاریخ انتشار 2009