Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/22018
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Smyth, William F | -
dc.contributor.advisor | Golding, Brian | -
dc.contributor.author | Islam, A S M Sohidull | -
dc.date.accessioned | 2017-10-03T19:56:34Z | -
dc.date.available | 2017-10-03T19:56:34Z | -
dc.date.issued | 2017-11 | -
dc.identifier.uri | http://hdl.handle.net/11375/22018 | -
dc.description.abstract | A string is a sequence of symbols, usually called letters, drawn from some alphabet. It is one of the most fundamental and important structures in computing, bioinformatics and mathematics. Computer files, the contents of computer memory, and network and satellite signals are all instances of strings. The genome of every living thing can be represented by a string drawn from the alphabet {a, c, g, t}. Algorithms that process strings have a wide range of applications, such as information retrieval, search engines, data compression, cryptography and bioinformatics. In a DNA sequence the indeterminate symbol {a, c} is used when it is unclear whether a given nucleotide is a or c. We could then say that {a, c} matches another symbol {c, g}, which in turn matches {g, t}, but {a, c} certainly does not match {g, t}. The processing of indeterminate strings is much more difficult because of this nontransitivity of matching. Thus a combinatorial understanding of indeterminate strings becomes essential to the development of efficient methods for their processing. With indeterminate strings, as with ordinary ones, the main task is the recognition/computation of patterns called regularities. We are particularly interested in regularities called repeats, whether tandem (such as acgacg) or nontandem (such as acgtacg). In this thesis we focus on newly discovered regularities in strings, especially the enhanced cover array and the Lyndon array, with attention paid to extending the computations to indeterminate strings. Much of this work is necessarily abstract in nature, because the intention is to produce results that are applicable over a wide range of application areas. We will focus on finding algorithms to construct different data structures that represent strings, such as cover arrays and Lyndon arrays. The idea of a cover comes from strings which are not truly periodic but "almost" periodic in nature. For example, abaababa is covered by aba but is not periodic.
Similarly, the Lyndon array describes the string in another unique way and is used in many areas of string algorithms. These data structures will help us in the field of string processing. As one application of these data structures we will work on "Reverse Engineering"; that is, given data structures derived from a string, how can we get the string back? Since DNA, RNA and peptide sequences are effectively "strings" with unique properties, we will adapt our algorithms for regular or indeterminate strings to these sequences. Sequence analysis can be used to assign function to genes and proteins by observing the similarities between the compared sequences. Identifying unusual repetitive patterns will aid in the identification of intrinsic features of the sequence such as active sites, gene structures and regulatory elements. As an application of periodic strings we investigate microsatellites, which are short repetitive DNA patterns in which the repeated substrings have length 2 to 5. Microsatellites are used in a wide range of studies due to their small size and repetitive nature, and they have played an important role in the identification of numerous important genetic loci. A deeper understanding of the evolutionary and mutational properties of microsatellites is needed, not only to understand how the genome is organized, but also to correctly interpret and use microsatellite data in population genetics studies. | en_US
dc.language.iso | en | en_US
dc.subject | Repeats | en_US
dc.subject | String | en_US
dc.subject | Bioinformatics | en_US
dc.subject | Algorithm | en_US
dc.title | Repeats in Strings and Application in Bioinformatics | en_US
dc.type | Thesis | en_US
dc.contributor.department | Computational Engineering and Science | en_US
dc.description.degreetype | Thesis | en_US
dc.description.degree | Doctor of Philosophy (PhD) | en_US
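
The two concrete examples in the abstract (nontransitive matching of indeterminate symbols, and abaababa being covered by aba without being periodic) can be checked with a small Python sketch. This is an illustration only, not code from the thesis: the helper names `sym`, `matches`, `is_cover` and `smallest_period` are hypothetical, indeterminate symbols are modelled as sets of letters that match when they share a letter, and "periodic" is assumed to mean that the smallest period is at most half the string length.

```python
from typing import FrozenSet

Symbol = FrozenSet[str]


def sym(letters: str) -> Symbol:
    """Build an indeterminate symbol, e.g. sym("ac") stands for {a, c}."""
    return frozenset(letters)


def matches(x: Symbol, y: Symbol) -> bool:
    """Two symbols match when they have at least one letter in common."""
    return not x.isdisjoint(y)


def is_cover(u: str, w: str) -> bool:
    """Return True if every position of w lies inside some occurrence of u."""
    if not u or len(u) > len(w):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    for i in range(len(w) - len(u) + 1):
        if i > covered_up_to:
            return False  # position covered_up_to can no longer be covered
        if w.startswith(u, i):
            covered_up_to = i + len(u)
    return covered_up_to == len(w)


def smallest_period(w: str) -> int:
    """Naive quadratic search for the smallest p with w[i] == w[i + p]."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return n  # only reached for the empty string


if __name__ == "__main__":
    ac, cg, gt = sym("ac"), sym("cg"), sym("gt")
    # Matching is not transitive: ac~cg and cg~gt, yet ac does not match gt.
    print(matches(ac, cg), matches(cg, gt), matches(ac, gt))

    w = "abaababa"
    p = smallest_period(w)
    # Covered by aba, but the smallest period exceeds half the length.
    print(is_cover("aba", w), p, 2 * p <= len(w))
```

The cover check is a single left-to-right scan that tracks how far the string has been covered; the period search is deliberately naive, since linear-time border-array methods are the usual choice for long inputs.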
Appears in Collections: Open Access Dissertations and Theses

Files in This Item:
File | Description | Size | Format
Islam_ASMSohidull_2017July_PhD.pdf | Open Access | 1.4 MB | Adobe PDF


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.

©2022 McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8 | 905-525-9140