Welcome to the upgraded MacSphere! We're putting the finishing touches on it; if you notice anything amiss, email macsphere@mcmaster.ca

COMPUTING REPETITIONS IN STRINGS: CURRENT ALGORITHMS & THE COMBINATORICS OF FUTURE ONES.

dc.contributor.advisorSmyth, William F.en_US
dc.contributor.authorKopylov, Evgueniaen_US
dc.contributor.departmentComputing and Softwareen_US
dc.date.accessioned2014-06-18T16:45:13Z
dc.date.available2014-06-18T16:45:13Z
dc.date.created2011-05-25en_US
dc.date.issued2010en_US
dc.description.abstract<p><em>Repetition is the reality and seriousness of life.</em><br /><em> - Soren Kierkegaard</em><br />The study of repetitions exhibits roots in many modern sciences - sinusoidal waves in physics (smooth repetitive oscillations such as the electromagnetic spectrum), highly repetitive DNA in biology (tandem repeats, satellite DNA), regularities of ciphertexts in cryptography and the periodicity of sounds and sequences in music. A string on a given alphabet ∑ provides the simplest common representation of this underlying property. A <strong><em>repetition</em></strong> defined on a string consists of two or more adjacent identical substrings (e.g. abab or aaaa).<br />A particular problem regarding repetitions is to count the number of different repetitions in a string. Conventional approaches execute in ϴ(<em>n</em>log<em>n</em>) time (Crochemore, 1981; Apostolico and Preparata, 1983; Main and Lorentz, 1984) and employ computationally heavy preprocessing. An ϴ(<em>n</em>) time algorithm introduced in 2000 (Kolpakov and Kucherov, 2000) prevailed over its slower predecessors by succinctly encoding all repetitions as runs. A <em>run</em> is a maximally periodic (nonextendible) substring. For example, the string <em>abaabaabb</em> encodes 3 runs - (<em>aba</em>)<sup>2</sup>(<em>ab</em>), <em>aa</em> (twice) and <em>bb</em>. The first of these identifies three repetitions - (<em>aba</em>)<sup>2</sup>, (<em>baa</em>)<sup>2</sup> and (<em>aab</em>)<sup>2</sup> In the early part of this thesis, we survey current algorithms for computing all repetitions.</p> <p>Brute force is the essential drawback of previous attempts for detecting repetitions, despite evidence and proof that their occurrence in strings is sparse (Puglisi and Simpson, 2008). By establishing combinatorial constraints to predict the expected sparsity of runs, extant preprocessing may be reformatted to exclude redundant computations. In (Fan <em>et al.</em>, 2006), it was shown that if two runs begin at the same position <em>i</em>, consequently no runs begin at some neighbouring position<em> i</em>+<em>k</em>. This is the fundamental idea behind our combinatorial work, in which we provide well substantiated conjectures, some of which are supported by proofs, implying that three neighbouring squares in a string force a trivial breakdown of the substring beginning at position <em>i</em> into repetitions of a small period.<br /><br /></p>en_US
dc.description.degreeMaster of Science (MS)en_US
dc.identifier.otheropendissertations/4192en_US
dc.identifier.other5210en_US
dc.identifier.other2030274en_US
dc.identifier.urihttp://hdl.handle.net/11375/9033
dc.subjectComputing and Softwareen_US
dc.subjectComputer Engineeringen_US
dc.subjectComputer Sciencesen_US
dc.subjectSoftware Engineeringen_US
dc.subjectComputer Engineeringen_US
dc.titleCOMPUTING REPETITIONS IN STRINGS: CURRENT ALGORITHMS & THE COMBINATORICS OF FUTURE ONES.en_US
dc.typethesisen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
fulltext.pdf
Size:
1.45 MB
Format:
Adobe Portable Document Format