COMPUTING REPETITIONS IN STRINGS: CURRENT ALGORITHMS & THE COMBINATORICS OF FUTURE ONES.

Kopylov, Evguenia

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/9033

Title:	COMPUTING REPETITIONS IN STRINGS: CURRENT ALGORITHMS & THE COMBINATORICS OF FUTURE ONES.
Authors:	Kopylov, Evguenia
Advisor:	Smyth, William F.
Department:	Computing and Software
Keywords:	Computing and Software;Computer Engineering;Computer Sciences;Software Engineering
Publication Date:	2010
Abstract:	<p><em>Repetition is the reality and seriousness of life.</em><br /><em> - Soren Kierkegaard</em><br />The study of repetitions has roots in many modern sciences: sinusoidal waves in physics (smooth repetitive oscillations such as electromagnetic waves), highly repetitive DNA in biology (tandem repeats, satellite DNA), regularities of ciphertexts in cryptography, and the periodicity of sounds and sequences in music. A string over a given alphabet Σ provides the simplest common representation of this underlying property. A <strong><em>repetition</em></strong> in a string consists of two or more adjacent identical substrings (e.g. abab or aaaa).<br />One particular problem is to count the number of distinct repetitions in a string. Conventional approaches run in Θ(<em>n</em> log <em>n</em>) time (Crochemore, 1981; Apostolico and Preparata, 1983; Main and Lorentz, 1984) and rely on computationally heavy preprocessing. A Θ(<em>n</em>)-time algorithm introduced in 2000 (Kolpakov and Kucherov, 2000) surpassed its slower predecessors by succinctly encoding all repetitions as runs. A <em>run</em> is a maximally periodic (nonextendible) substring. For example, the string <em>abaabaabb</em> contains three distinct runs: (<em>aba</em>)<sup>2</sup>(<em>ab</em>), <em>aa</em> (occurring twice) and <em>bb</em>. The first of these identifies three repetitions: (<em>aba</em>)<sup>2</sup>, (<em>baa</em>)<sup>2</sup> and (<em>aab</em>)<sup>2</sup>. In the early part of this thesis, we survey current algorithms for computing all repetitions.</p> <p>Brute force is the essential drawback of previous approaches to detecting repetitions, despite evidence and proof that repetitions occur sparsely in strings (Puglisi and Simpson, 2008). By establishing combinatorial constraints that predict the expected sparsity of runs, we may restructure existing preprocessing to exclude redundant computations.
In (Fan <em>et al.</em>, 2006), it was shown that if two runs begin at the same position <em>i</em>, then no run begins at some neighbouring position <em>i</em>+<em>k</em>. This is the fundamental idea behind our combinatorial work, in which we provide well-substantiated conjectures, some of them supported by proofs, implying that three neighbouring squares in a string force a trivial breakdown of the substring beginning at position <em>i</em> into repetitions of small period.</p>
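The run and repetition definitions in the abstract can be checked mechanically on its example string. What follows is a minimal sketch under stated assumptions. It is a deliberately naive brute force, not the Θ(n log n) algorithms or the linear-time Kolpakov and Kucherov algorithm cited above. A run is represented as a hypothetical 0-based `(start, end, period)` tuple with an exclusive end. The helper names `smallest_period`, `runs` and `squares_of_run` are illustrative only.

```python
from typing import List, Tuple

# Assumed (hypothetical) convention: a run is (start, end, period),
# 0-based indices, end exclusive.
Run = Tuple[int, int, int]


def smallest_period(w: str) -> int:
    """Smallest p >= 1 with w[k] == w[k + p] for every k < len(w) - p."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[k] == w[k + p] for k in range(n - p)):
            return p
    return n  # only reached for the empty string


def runs(s: str) -> List[Run]:
    """Brute-force list of all runs in s.

    For each candidate period p, find maximal stretches of positions where
    s[k] == s[k + p]; such a stretch [a, k) yields the substring s[a:k + p],
    which has period p and cannot be extended left or right with that period.
    It is reported as a run when its length is at least 2p and p is its
    smallest period (otherwise it is reported under the smaller period).
    """
    n = len(s)
    found: List[Run] = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            a = k
            while k < n - p and s[k] == s[k + p]:
                k += 1
            start, end = a, k + p
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                found.append((start, end, p))
    return sorted(found)


def squares_of_run(s: str, run: Run) -> List[str]:
    """Squares u^2 with |u| equal to the run's period, read left to right."""
    start, end, p = run
    return [s[i:i + 2 * p] for i in range(start, end - 2 * p + 1)]


if __name__ == "__main__":
    text = "abaabaabb"
    for r in runs(text):
        print(r, text[r[0]:r[1]], squares_of_run(text, r))
```

On `abaabaabb`, the sketch reports the period-3 run `(0, 8, 3)`, which is abaabaab = (aba)^2ab. Its squares are abaaba, baabaa and aabaab. The two aa runs and the bb run follow. This matches the example in the abstract.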
URI:	http://hdl.handle.net/11375/9033
Identifier:	opendissertations/4192 5210 2030274
Appears in Collections:	Open Access Dissertations and Theses

Files in This Item:

File	Size	Format
fulltext.pdf Open Access	1.49 MB	Adobe PDF
