Skip navigation
  • Home
  • Browse
    • Communities
      & Collections
    • Browse Items by:
    • Publication Date
    • Author
    • Title
    • Subject
    • Department
  • Sign on to:
    • My MacSphere
    • Receive email
      updates
    • Edit Profile


McMaster University Home Page
  1. MacSphere
  2. Open Access Dissertations and Theses Community
  3. Open Access Dissertations and Theses
Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/9033
Title: COMPUTING REPETITIONS IN STRINGS: CURRENT ALGORITHMS & THE COMBINATORICS OF FUTURE ONES.
Authors: Kopylov, Evguenia
Advisor: Smyth, William F.
Department: Computing and Software
Keywords: Computing and Software;Computer Engineering;Computer Sciences;Software Engineering;Computer Engineering
Publication Date: 2010
Abstract: <p><em>Repetition is the reality and seriousness of life.</em><br /><em> - Soren Kierkegaard</em><br />The study of repetitions exhibits roots in many modern sciences - sinusoidal waves in physics (smooth repetitive oscillations such as the electromagnetic spectrum), highly repetitive DNA in biology (tandem repeats, satellite DNA), regularities of ciphertexts in cryptography and the periodicity of sounds and sequences in music. A string on a given alphabet ∑ provides the simplest common representation of this underlying property. A <strong><em>repetition</em></strong> defined on a string consists of two or more adjacent identical substrings (e.g. abab or aaaa).<br />A particular problem regarding repetitions is to count the number of different repetitions in a string. Conventional approaches execute in ϴ(<em>n</em>log<em>n</em>) time (Crochemore, 1981; Apostolico and Preparata, 1983; Main and Lorentz, 1984) and employ computationally heavy preprocessing. An ϴ(<em>n</em>) time algorithm introduced in 2000 (Kolpakov and Kucherov, 2000) prevailed over its slower predecessors by succinctly encoding all repetitions as runs. A <em>run</em> is a maximally periodic (nonextendible) substring. For example, the string <em>abaabaabb</em> encodes 3 runs - (<em>aba</em>)<sup>2</sup>(<em>ab</em>), <em>aa</em> (twice) and <em>bb</em>. The first of these identifies three repetitions - (<em>aba</em>)<sup>2</sup>, (<em>baa</em>)<sup>2</sup> and (<em>aab</em>)<sup>2</sup> In the early part of this thesis, we survey current algorithms for computing all repetitions.</p> <p>Brute force is the essential drawback of previous attempts for detecting repetitions, despite evidence and proof that their occurrence in strings is sparse (Puglisi and Simpson, 2008). By establishing combinatorial constraints to predict the expected sparsity of runs, extant preprocessing may be reformatted to exclude redundant computations. In (Fan <em>et al.</em>, 2006), it was shown that if two runs begin at the same position <em>i</em>, consequently no runs begin at some neighbouring position<em> i</em>+<em>k</em>. This is the fundamental idea behind our combinatorial work, in which we provide well substantiated conjectures, some of which are supported by proofs, implying that three neighbouring squares in a string force a trivial breakdown of the substring beginning at position <em>i</em> into repetitions of a small period.<br /><br /></p>
URI: http://hdl.handle.net/11375/9033
Identifier: opendissertations/4192
5210
2030274
Appears in Collections:Open Access Dissertations and Theses

Files in This Item:
File SizeFormat 
fulltext.pdf
Open Access
1.49 MBAdobe PDFView/Open
Show full item record Statistics


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.

Sherman Centre for Digital Scholarship     McMaster University Libraries
©2022 McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8 | 905-525-9140 | Contact Us | Terms of Use & Privacy Policy | Feedback

Report Accessibility Issue