Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/22018
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Smyth, William F | - |
dc.contributor.advisor | Golding, Brian | - |
dc.contributor.author | Islam, A S M Sohidull | - |
dc.date.accessioned | 2017-10-03T19:56:34Z | - |
dc.date.available | 2017-10-03T19:56:34Z | - |
dc.date.issued | 2017-11 | - |
dc.identifier.uri | http://hdl.handle.net/11375/22018 | - |
dc.description.abstract | A string is a sequence of symbols, usually called letters, drawn from some alphabet. It is one of the most fundamental and important structures in computing, bioinformatics and mathematics. Computer files, the contents of a computer memory, and network and satellite signals are all instances of strings. The genome of every living thing can be represented by a string drawn from the alphabet {a, c, g, t}. Algorithms that process strings have a wide range of applications, such as information retrieval, search engines, data compression, cryptography and bioinformatics. In a DNA sequence the indeterminate symbol {a, c} is used when it is unclear whether a given nucleotide is a or c. We could then say that {a, c} matches another symbol {c, g}, which in turn matches {g, t}, but {a, c} certainly does not match {g, t}. The processing of indeterminate strings is much more difficult because of this nontransitivity of matching. Thus a combinatorial understanding of indeterminate strings becomes essential to the development of efficient methods for their processing. With indeterminate strings, as with ordinary ones, the main task is the recognition/computation of patterns called regularities. We are particularly interested in regularities called repeats, whether tandem (such as acgacg) or nontandem (such as acgtacg). In this thesis we focus on newly discovered regularities in strings, especially the enhanced cover array and the Lyndon array, with attention paid to extending the computations to indeterminate strings. Much of this work is necessarily abstract in nature, because the intention is to produce results that are applicable over a wide range of application areas. We will focus on finding algorithms to construct different data structures that represent strings, such as cover arrays and Lyndon arrays. The idea of a cover comes from strings that are not truly periodic but are "almost" periodic in nature. For example, abaababa is covered by aba but is not periodic. Similarly, the Lyndon array describes the string in another unique way and is used in many fields of string algorithms. These data structures will help us in the field of string processing. As one application of these data structures we will work on "Reverse Engineering"; that is, given data structures derived from a string, how can we get the string back? Since DNA, RNA and peptide sequences are effectively "strings" with unique properties, we will adapt our algorithms for regular or indeterminate strings to these sequences. Sequence analysis can be used to assign function to genes and proteins by observing the similarities between the compared sequences. Identifying unusual repetitive patterns will aid in the identification of intrinsic features of the sequence such as active sites, gene structures and regulatory elements. As an application of periodic strings we investigate microsatellites, which are short repetitive DNA patterns in which the repeated substrings are of length 2 to 5. Microsatellites are used in a wide range of studies due to their small size and repetitive nature, and they have played an important role in the identification of numerous important genetic loci. A deeper understanding of the evolutionary and mutational properties of microsatellites is needed, not only to understand how the genome is organized, but also to correctly interpret and use microsatellite data in population genetics studies. | en_US |
dc.language.iso | en | en_US |
dc.subject | Repeats | en_US |
dc.subject | String | en_US |
dc.subject | Bioinformatics | en_US |
dc.subject | Algorithm | en_US |
dc.title | Repeats in Strings and Application in Bioinformatics | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Computational Engineering and Science | en_US |
dc.description.degreetype | Thesis | en_US |
dc.description.degree | Doctor of Philosophy (PhD) | en_US |
Appears in Collections: | Open Access Dissertations and Theses |
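Illustrative note on the abstract's examples: the Python sketch below is not taken from the thesis. It only checks the two small examples the abstract states, namely that matching of indeterminate symbols is not transitive and that abaababa is covered by aba without being periodic. The function names `matches`, `is_cover` and `smallest_period` are hypothetical, and the sketch assumes the common convention that a string counts as periodic only when its smallest period is at most half its length. The implementations are naive and are not the algorithms developed in the thesis.

```python
def matches(x, y):
    """Two indeterminate symbols match when they share at least one letter."""
    return bool(set(x) & set(y))


def is_cover(u, x):
    """Return True if the occurrences of u cover every position of x."""
    m, n = len(u), len(x)
    if m == 0 or m > n:
        return False
    covered_up_to = 0  # positions x[0:covered_up_to] are covered so far
    for i in range(n - m + 1):
        if i > covered_up_to:
            return False  # position covered_up_to can no longer be covered
        if x[i:i + m] == u:
            covered_up_to = i + m
    return covered_up_to == n


def smallest_period(x):
    """Return the smallest p >= 1 with x[i] == x[i + p] for all valid i."""
    n = len(x)
    for p in range(1, n + 1):
        if all(x[i] == x[i + p] for i in range(n - p)):
            return p
    return n


# Nontransitive matching: {a, c} ~ {c, g} and {c, g} ~ {g, t},
# yet {a, c} does not match {g, t}.
print(matches("ac", "cg"), matches("cg", "gt"), matches("ac", "gt"))

# abaababa is covered by aba, but its smallest period exceeds half its length.
x = "abaababa"
print(is_cover("aba", x), 2 * smallest_period(x) <= len(x))
```

Here an indeterminate symbol is modelled as a plain Python string of its possible letters, so `"ac"` stands for {a, c}; this is a representation chosen for the sketch only.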
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Islam_ASMSohidull_2017July_PhD.pdf | | 1.4 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.