Repeats in Strings and Application in Bioinformatics

Islam, A S M Sohidull

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/22018

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Smyth, William F	-
dc.contributor.advisor	Golding, Brian	-
dc.contributor.author	Islam, A S M Sohidull	-
dc.date.accessioned	2017-10-03T19:56:34Z	-
dc.date.available	2017-10-03T19:56:34Z	-
dc.date.issued	2017-11	-
dc.identifier.uri	http://hdl.handle.net/11375/22018	-
dc.description.abstract	A string is a sequence of symbols, usually called letters, drawn from some alphabet. It is one of the most fundamental and important structures in computing, bioinformatics and mathematics. Computer files, contents of a computer memory, network and satellite signals are all instances of strings. The genome of every living thing can be represented by a string drawn from the alphabet {a, c, g, t}. The algorithms processing strings have a wide range of applications such as information retrieval, search engines, data compression, cryptography and bioinformatics. In a DNA sequence the indeterminate symbol {a, c} is used when it is unclear whether a given nucleotide is a or c. We could then say that {a, c} matches another symbol {c, g}, which in turn matches {g, t}, but {a, c} certainly does not match {g, t}. The processing of indeterminate strings is much more difficult because of this nontransitivity of matching. Thus a combinatorial understanding of indeterminate strings becomes essential to the development of efficient methods for their processing. With indeterminate strings, as with ordinary ones, the main task is the recognition/computation of patterns called regularities. We are particularly interested in regularities called repeats, whether tandem (such as acgacg) or nontandem (such as acgtacg). In this thesis we focus on newly-discovered regularities in strings, especially the enhanced cover array and the Lyndon array, with attention paid to extending the computations to indeterminate strings. Much of this work is necessarily abstract in nature, because the intention is to produce results that are applicable over a wide range of application areas. We will focus on finding algorithms to construct different data structures to represent strings, such as cover arrays and Lyndon arrays. The idea of a cover comes from strings which are not truly periodic but "almost" periodic in nature. For example, abaababa is covered by aba but is not periodic.
Similarly, the Lyndon array describes the string in another unique way and is used in many fields of string algorithms. These data structures will help us in the field of string processing. As one application of these data structures we will work on "Reverse Engineering"; that is, given data structures derived from a string, how can we get the string back? Since DNA, RNA and peptide sequences are effectively "strings" with unique properties, we will adapt our algorithms for regular or indeterminate strings to these sequences. Sequence analysis can be used to assign function to genes and proteins by observing the similarities between the compared sequences. Identifying unusual repetitive patterns will aid in the identification of intrinsic features of the sequence such as active sites, gene structures and regulatory elements. As an application of periodic strings we investigate microsatellites, which are short repetitive DNA patterns where repeated substrings are of length 2 to 5. Microsatellites are used in a wide range of studies due to their small size and repetitive nature, and they have played an important role in the identification of numerous important genetic loci. A deeper understanding of the evolutionary and mutational properties of microsatellites is needed, not only to understand how the genome is organized, but also to correctly interpret and use microsatellite data in population genetics studies.	en_US
dc.language.iso	en	en_US
dc.subject	Repeats	en_US
dc.subject	String	en_US
dc.subject	Bioinformatics	en_US
dc.subject	Algorithm	en_US
dc.title	Repeats in Strings and Application in Bioinformatics	en_US
dc.type	Thesis	en_US
dc.contributor.department	Computational Engineering and Science	en_US
dc.description.degreetype	Thesis	en_US
dc.description.degree	Doctor of Philosophy (PhD)	en_US
Appears in Collections:	Open Access Dissertations and Theses
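
Illustrative note: the two examples given in the abstract (nontransitive matching of indeterminate symbols, and abaababa being covered by aba) can be checked with a short Python sketch. This is not taken from the thesis; the function names symbols_match, occurs_at and is_cover are hypothetical, and the representation of an indeterminate symbol as a string or set of its possible letters is an assumption made only for this illustration.

```python
def symbols_match(x, y):
    """Two symbols match if their sets of possible letters intersect.

    Assumption: a symbol is a string or set of letters, so the
    indeterminate symbol {a, c} can be written as "ac".
    """
    return bool(set(x) & set(y))


def occurs_at(pattern, text, i):
    """True if pattern matches text starting at position i, symbol by symbol."""
    if i + len(pattern) > len(text):
        return False
    return all(symbols_match(p, t) for p, t in zip(pattern, text[i:]))


def is_cover(candidate, text):
    """True if occurrences of candidate cover every position of text.

    A naive check for illustration only, not the enhanced cover array
    computation studied in the thesis.
    """
    m, n = len(candidate), len(text)
    if m == 0 or m > n:
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    for i in range(n - m + 1):
        if i > covered_up_to:
            return False  # position covered_up_to lies in an uncovered gap
        if occurs_at(candidate, text, i):
            covered_up_to = i + m
    return covered_up_to == n


# Nontransitivity: {a,c} matches {c,g}, {c,g} matches {g,t},
# but {a,c} does not match {g,t}.
print(symbols_match("ac", "cg"), symbols_match("cg", "gt"), symbols_match("ac", "gt"))

# The abstract's cover example.
print(is_cover("aba", "abaababa"))
```

For ordinary strings each symbol is a single letter, so symbols_match reduces to equality; the same is_cover function then handles both ordinary and indeterminate strings.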

Files in This Item:

File	Description	Size	Format
Islam_ASMSohidull_2017July_PhD.pdf Open Access		1.4 MB	Adobe PDF
