ANALYSIS AND COMPARISON OF USEARCH AND DNACLUST SOFTWARE PACKAGES
Abstract
Over the past several years, new DNA sequencing technologies have led to a great increase in the quantity of biological sequence data that can be generated. Typically, a dataset may contain millions or even billions of short-read sequences, each a few hundred base pairs long, that are to some degree redundant: the data fall naturally into clusters of sequences that are highly similar to one another. To reduce the time required to analyze the data, it is therefore useful to compute a representative for each of these clusters, based on some definition of similarity.
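The idea of reducing redundant reads to cluster representatives can be illustrated with a generic greedy scheme. Each read is compared against the representatives chosen so far. It joins the first one it matches at or above a similarity threshold, and otherwise it becomes a new representative. This is a minimal sketch for illustration only and is not claimed to reproduce the algorithm of either USEARCH or DNACLUST. The similarity measure (one minus the edit distance divided by the length of the longer sequence), the default threshold of 0.9, the longest-first processing order, and the function names `edit_distance`, `similarity` and `greedy_cluster` are all hypothetical choices.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical similarity: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def greedy_cluster(reads: list[str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Greedy clustering sketch: map each representative to the reads assigned to it."""
    clusters: dict[str, list[str]] = {}
    # Assumption: longer reads are processed first, so they tend to become representatives.
    for read in sorted(reads, key=len, reverse=True):
        for rep in clusters:
            if similarity(read, rep) >= threshold:
                clusters[rep].append(read)  # join the first sufficiently similar cluster
                break
        else:
            clusters[read] = [read]  # no match: this read starts a new cluster
    return clusters
```

Here is a usage example. With `greedy_cluster(["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGTAC"], threshold=0.85)`, the second read differs from the first by a single substitution and the fourth read is identical to the first, so both join its cluster. The third read is too dissimilar and becomes a second representative. The quadratic edit-distance computation and the all-against-representatives comparison are kept deliberately naive here. Avoiding exactly this cost is what efficient clustering tools aim to do.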
In this thesis we examine two clustering software packages, USEARCH and DNACLUST, that aim to perform this clustering task efficiently. We provide an overview of the techniques used by the two packages, compare and evaluate them from both a methodological and an experimental perspective, and draw conclusions about their effectiveness and utility.