South African Computer Journal - Algorithms for clustering expressed sequence tags : the wcd tool : reviewed article
|Article Title||Algorithms for clustering expressed sequence tags : the wcd tool : reviewed article|
|© Publisher:||South African Computer Society (SAICSIT)|
|Journal||South African Computer Journal|
|Publication Date||Jun 2008|
|Pages||51 - 62|
|Keyword(s)||Clustering, D2, Expressed sequence tags, Heuristics, Strings and Suffix arrays|
Understanding which genes are active, and when and why, is an important question in molecular biology. Expressed Sequence Tags (ESTs) are a technology used to explore the transcriptome (a record of this gene activity). ESTs are short fragments of DNA created in the laboratory from mRNA extracted from a cell. The key computational step in their processing is clustering: grouping together all ESTs that derive from the same RNA. Accurate clustering takes time quadratic in both the average EST length and the number of ESTs, which makes naïve algorithms infeasible for real data sets. The wcd EST clustering system is an open-source tool that provides efficient implementations of key distance measures, heuristics for speeding up clustering, a pre-clustering booster based on suffix arrays, and parallelised implementations based on MPI and Pthreads. This paper presents the underlying algorithms in wcd. The code is available from http://code.google.com/p/wcdest.
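To make the quadratic cost concrete, the following is a minimal Python sketch of naïve EST clustering using a d2-style distance (D2 is listed among the keywords). It is not wcd's implementation: the function names, the word length `k`, the window size `w`, and the threshold are illustrative assumptions, and the sketch ignores reverse complements and all of the heuristics the paper describes. Here d2 is taken to be the minimum, over all pairs of windows, of the sum of squared differences in word counts.

```python
from collections import Counter
from itertools import combinations


def word_counts(window, k):
    """Count every overlapping word of length k in a window."""
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))


def d2(a, b, k=3, w=8):
    """Naive d2-style distance: minimum over all window pairs of the
    sum of squared differences in word multiplicities.
    k and w are illustrative values, not wcd's settings."""
    wins_a = [word_counts(a[i:i + w], k) for i in range(max(1, len(a) - w + 1))]
    wins_b = [word_counts(b[j:j + w], k) for j in range(max(1, len(b) - w + 1))]
    best = None
    # Every window of a is compared with every window of b:
    # this is the part that is quadratic in sequence length.
    for ca in wins_a:
        for cb in wins_b:
            score = sum((ca[x] - cb[x]) ** 2 for x in ca.keys() | cb.keys())
            if best is None or score < best:
                best = score
    return best


def cluster(seqs, threshold=0, k=3, w=8):
    """Single-linkage clustering via union-find over all sequence pairs."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair of sequences is considered:
    # this is the part that is quadratic in the number of ESTs.
    for i, j in combinations(range(len(seqs)), 2):
        if find(i) != find(j) and d2(seqs[i], seqs[j], k, w) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    ests = ["ACGTACGTTTGA", "GTACGTTTGACC", "CCCCCCCCCCCC"]
    print(cluster(ests))
```

In this toy input the first two sequences share the window GTACGTTT, so their distance is zero and they fall into one cluster, while the third sequence shares no words with either and stays on its own. The nested loops show why heuristics and pre-clustering are needed before such an approach can be applied to real data sets.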