|Title:||ANALYSIS AND COMPARISON OF USEARCH AND DNACLUST SOFTWARE PACKAGES|
|Advisor:||Smyth, W. F.|
|Abstract:||Over the past several years, new DNA sequencing technologies have led to a great increase in the quantity of biological sequence data that can be generated. Typically there may be millions or even billions of short read sequences of a few hundred base pairs that are to some degree redundant: the data fall naturally into clusters of sequences that are highly similar to each other. To reduce the time required to analyze the data, it is therefore of interest to compute representatives of these clusters, based on some definition of similarity. In this thesis we examine two clustering software packages, USEARCH and DNACLUST, that seek to perform this clustering task efficiently. We provide an overview of the techniques used by these two packages; we compare and evaluate them from both a methodological and an experimental perspective, and draw conclusions about their effectiveness and utility.|
|Appears in Collections:||Open Access Dissertations and Theses|
Files in This Item:
|Shafqat_Raazia_2015August_MSc.pdf||1.08 MB||Adobe PDF||View/Open|
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.