Skip navigation
  • Home
  • Browse
    • Communities
      & Collections
    • Browse Items by:
    • Publication Date
    • Author
    • Title
    • Subject
    • Department
  • Sign on to:
    • My MacSphere
    • Receive email
      updates
    • Edit Profile


McMaster University Home Page
  1. MacSphere
  2. Open Access Dissertations and Theses Community
  3. Open Access Dissertations and Theses
Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/27555
Title: Efficient Implementation & Application of Maximal String Covering Algorithm
Other Titles: MAXIMAL COVER ALGORITHM IMPLEMENTATION
Authors: Koponen, Holly
Advisor: Smyth, William F.
Mhaskar, Neerja
Department: Computing and Software
Keywords: String;Maximal Cover;Repeats;MAXCOVER;protein sequence;pattern matching;NE repeat;MUMmer
Publication Date: 2022
Abstract: This thesis describes the development and application of the new software MAXCOVER that computes maximal covers and non-extendible repeats (a.k.a. “maximal repeats”). A string is a finite array x[1..n] of elements chosen from a set of totally ordered symbols called an alphabet. A repeat is a substring that occurs at least twice in x. A repeat is left/right extendible if every occurrence is preceded/followed by the same symbol; otherwise, it is non-left/non-right extendible (NLE/NRE). A non-extendible (NE) repeat is both NLE and NRE. A repeat covers a position i if x[i] lies within the repeat. A maximal cover (a.k.a. “optimal cover”) is a repeat that covers the most positions in x. For simplicity, we first describe a quadratic O(n2) implementation of MAXCOVER to compute all maximal covers of a given string based on the pseudocode given in [1]. Then, we consider the logarithmic O(n log n) pseudocode in [1], in which we identify several errors. We leave a complete correction and implementation for future work. Instead, we propose two improved quadratic algorithms that, shown through experiments, will execute in linear time for the average case. We perform a benchmark evaluation of MAXCOVER’s performance and demonstrate its value to biologists in the protein context [2]. To do so, we develop an extension of MAXCOVER for the closely related task of computing NE repeats. Then, we compare MAXCOVER to the repeat-match feature of the well-known MUMmer software [3] (600+ citations). We determine that MAXCOVER is an order-of-magnitude faster than MUMmer with much lower space requirements. We also show that MAXCOVER produces a more compact, exact, and user-friendly output that specifies the repeats. Availability: Open source code, binaries, and test data are available on Github at https://github.com/hollykoponen/MAXCOVER. Currently runs on Linux, untested on other OS.
URI: http://hdl.handle.net/11375/27555
Appears in Collections:Open Access Dissertations and Theses

Files in This Item:
File Description SizeFormat 
Koponen_Holly_K_2022April_MSc.pdf
Open Access
871.22 kBAdobe PDFView/Open
Show full item record Statistics


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.

Sherman Centre for Digital Scholarship     McMaster University Libraries
©2022 McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8 | 905-525-9140 | Contact Us | Terms of Use & Privacy Policy | Feedback

Report Accessibility Issue