Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/21222
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Smyth, William F. | - |
dc.contributor.author | Wang, Shu | - |
dc.date.accessioned | 2017-03-22T20:53:07Z | - |
dc.date.available | 2017-03-22T20:53:07Z | - |
dc.date.issued | 2005-01 | - |
dc.identifier.uri | http://hdl.handle.net/11375/21222 | - |
dc.description | Title: Pattern-Matching Algorithms on Indeterminate Strings, Author: Shu Wang, Location: Thode | en_US |
dc.description.abstract | <p>In computing, strings are sequences of simple elements (letters) drawn from some alphabet. A string is one of the most basic and important data structures in computer science, it exists everywhere in computer systems. Disk files, contents of memory, source and object code of computer programs, e-mail messages are all examples of strings. Also in bioinformatics, nucleotides in DNA can be viewed as strings drawn from an alphabet of four basic symbols- A, C, G and T. Algorithms that deal with strings are correspondingly very important in the field of computer science as well as in bioinformatics, data compression, cryptanalysis and other scientific areas.</p> <p>A pattern-matching algorithm is one of the most fundamental string algorithms. Simply speaking, it outputs the occurrences of a string (called the pattern) within another string (called the text). Exact pattern-matching algorithms have been extensively studied in the last three decades and many algorithms have been proposed. Among them are two well known algorithms, the Knuth-Morris-Pratt (KMP) algorithm [KMP77], with good worst-case performance, and the Boyer-Moore (BM) algorithm [BM77], with good average-case performance. A recent attempt to combine the virtues of the two algorithms results in the Franek-Jennings-Smyth (FJS) algorithm [Jen02, FJS05b, FJS05a], which has both excellent average-case and worst-case performance.</p> <p>Indeterminate strings are a new class of strings proposed in [HS03] driven by the increasingly common application of string algorithms on biological (DNA) sequences. Unlike letters in a normal alphabet, an indeterminate letter matches a set of letters (under certain constraints) rather than just one during pattern-matching. Perhaps the most straightforward example of an indeterminate letter is the don't care letter '*', that matches all letters in the alphabet. The main purpose of an indeterminate string is to increase the flexibility of pattern-matching. It can also be seen as a model of a DNA sequence with a polymorphism property, which is a commonplace in molecular biology.</p> <p>An indeterminate string is a relatively new idea and not too many known algorithms work on it, except for a particular version of the ShiftOr algorithm [WM92]. However as we will see, even this version can only handle a special case of indeterminate pattern-matching. Faster and more general indeterminate pattern-matching algorithms are desired. In this thesis we first give rigorous definitions of indeterminate strings and then develop our new indeterminate pattern-matching algorithms, adapted and modified from existing exact pattern-matching algorithms. There are many different constraints and also different models of patten-matching. We present our solution with all these variations in order of increasing complexity. We present our rationale of development and investigate the possibility of developing new algorithms from different categories of existing determinate algorithms. After describing our algorithms we conduct comprehensive experiments and compare the results carefully. In the last chapter we present our conclusions and point out the directions of future research.</p> | en_US |
dc.language.iso | en | en_US |
dc.title | Pattern-Matching Algorithms on Indeterminate Strings | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Statistics | en_US |
dc.description.degreetype | Thesis | en_US |
dc.description.degree | Master of Science (MS) | en_US |
Appears in Collections: | Digitized Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Wang_Shu_2005_01_master.pdf | Title: Pattern-Matching Algorithms on Indeterminate Strings, Author: Shu Wang, Location: Thode | 16.14 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.