An overview of the wcd EST clustering tool
Abstract
The wcd system is an open source tool for clustering expressed sequence tags (ESTs) and other DNA and RNA sequences. wcd allows efficient all-versus-all comparison of ESTs using either the d(2) distance function or edit distance, improving on existing implementations of d(2). It supports merging, refinement and reclustering of clusters. It is 'drop in' compatible with the StackPack clustering package. wcd supports parallelization under both shared memory and cluster architectures. It is distributed with an EMBOSS wrapper allowing wcd to be installed as part of an EMBOSS installation (and so provided by a web server).
Availability: wcd is distributed under a GPL licence and is available from http://code.google.com/p/wcdest.
Supplementary information: Additional experimental results. The wcd manual, a companion paper describing underlying algorithms, and all datasets used for experimentation can also be found at www.bioinf.wits.ac.za/~scott/wcdsupp.html.
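As a rough illustration of the all-versus-all comparison described in the abstract, the sketch below computes a word-count distance in the style of d(2) (the minimum, over pairs of windows, of the summed squared differences in word frequencies) and groups sequences whose distance falls under a threshold. This is not wcd's implementation: the function names, the word length `k`, the window size and the threshold are illustrative assumptions, the union-find grouping is a generic single-linkage choice, and reverse-complement comparison and all of wcd's optimisations are omitted.

```python
from collections import Counter
from itertools import combinations


def word_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_window(u, v, k):
    """Sum of squared differences between the word counts of two windows."""
    cu, cv = word_counts(u, k), word_counts(v, k)
    return sum((cu[w] - cv[w]) ** 2 for w in cu.keys() | cv.keys())


def windows(seq, size):
    """All windows of the given size; the whole sequence if it is shorter."""
    if len(seq) <= size:
        return [seq]
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def d2(x, y, k=3, window=8):
    """Naive d2-style distance: minimum window score over all window pairs.

    k and window are illustrative values chosen for short test strings.
    """
    return min(d2_window(u, v, k)
               for u in windows(x, window)
               for v in windows(y, window))


def cluster(seqs, threshold, k=3, window=8):
    """Single-linkage clustering: join sequences with d2 <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force all-versus-all comparison (quadratic in the number of sequences).
    for i, j in combinations(range(len(seqs)), 2):
        if d2(seqs[i], seqs[j], k, window) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    ests = ["ACGTACGTAC", "GTACGTACGG", "TTTTTTTTTT"]
    print(cluster(ests, threshold=5))
```

The first two example sequences share an identical 8-base window, so their distance is zero and they land in the same cluster, while the poly-T sequence stays on its own. A production tool would avoid recomputing word counts for every window pair; this sketch favours clarity over speed.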
Related resources
Algorithms for clustering expressed sequence tags: the wcd tool
Understanding which genes are active, and when and why, is an important question for molecular biology. Expressed Sequence Tags (ESTs) are a technology used to explore the transcriptome (a record of this gene activity). ESTs are short fragments of DNA created in the laboratory from mRNA extracted from a cell. The key computational step in their processing is clustering: putting all ESTs associ...
An implementation of the d2 distance function for DNA sequences: The wcd d2 EST clustering algorithm
This report gives a skeleton description of the d2 algorithm used for the clustering of expressed sequence tags (ESTs) in the wcd program. It describes how the algorithm works and why some design decisions were made. No experimental evidence is reported here; this is the subject of ongoing research.
A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family
BACKGROUND Clustering is a key step in the processing of Expressed Sequence Tags (ESTs). The primary goal of clustering is to put ESTs from the same transcript of a single gene into a unique cluster. Recent EST clustering algorithms mostly adopt the alignment-free distance measures, where they tend to yield acceptable clustering accuracies with reasonable computational time. Despite the fact th...
PEACE: Parallel Environment for Assembly and Clustering of Gene Expression
We present PEACE, a stand-alone tool for high-throughput ab initio clustering of transcript fragment sequences produced by Next Generation or Sanger Sequencing technologies. It is freely available from www.peace-tools.org. Installed and managed through a downloadable user-friendly graphical user interface (GUI), PEACE can process large data sets of transcript fragments of length 50 bases or gre...
The Wearable Cardioverter/Defibrillator - Toy Or Tool?
After the success story of implantable cardioverter/defibrillator systems, prevention of sudden cardiac death (SCD) remains one of the main duties in cardiology. For patients with an unknown or transient risk profile for SCD, a wearable cardioverter/defibrillator (WCD) has been established for temporary and effective prevention of sudden arrhythmic death. Several studies have shown safety and effic...