Skip navigation
  • Home
  • Browse
    • Communities
      & Collections
    • Browse Items by:
    • Publication Date
    • Author
    • Title
    • Subject
    • Department
  • Sign on to:
    • My MacSphere
    • Receive email
      updates
    • Edit Profile


McMaster University Home Page
  1. MacSphere
  2. Open Access Dissertations and Theses Community
  3. Open Access Dissertations and Theses
Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/27555
Full metadata record
DC FieldValueLanguage
dc.contributor.advisorSmyth, William F.-
dc.contributor.advisorMhaskar, Neerja-
dc.contributor.authorKoponen, Holly-
dc.date.accessioned2022-05-12T20:20:13Z-
dc.date.available2022-05-12T20:20:13Z-
dc.date.issued2022-
dc.identifier.urihttp://hdl.handle.net/11375/27555-
dc.description.abstractThis thesis describes the development and application of the new software MAXCOVER that computes maximal covers and non-extendible repeats (a.k.a. “maximal repeats”). A string is a finite array x[1..n] of elements chosen from a set of totally ordered symbols called an alphabet. A repeat is a substring that occurs at least twice in x. A repeat is left/right extendible if every occurrence is preceded/followed by the same symbol; otherwise, it is non-left/non-right extendible (NLE/NRE). A non-extendible (NE) repeat is both NLE and NRE. A repeat covers a position i if x[i] lies within the repeat. A maximal cover (a.k.a. “optimal cover”) is a repeat that covers the most positions in x. For simplicity, we first describe a quadratic O(n2) implementation of MAXCOVER to compute all maximal covers of a given string based on the pseudocode given in [1]. Then, we consider the logarithmic O(n log n) pseudocode in [1], in which we identify several errors. We leave a complete correction and implementation for future work. Instead, we propose two improved quadratic algorithms that, shown through experiments, will execute in linear time for the average case. We perform a benchmark evaluation of MAXCOVER’s performance and demonstrate its value to biologists in the protein context [2]. To do so, we develop an extension of MAXCOVER for the closely related task of computing NE repeats. Then, we compare MAXCOVER to the repeat-match feature of the well-known MUMmer software [3] (600+ citations). We determine that MAXCOVER is an order-of-magnitude faster than MUMmer with much lower space requirements. We also show that MAXCOVER produces a more compact, exact, and user-friendly output that specifies the repeats. Availability: Open source code, binaries, and test data are available on Github at https://github.com/hollykoponen/MAXCOVER. Currently runs on Linux, untested on other OS.en_US
dc.language.isoenen_US
dc.subjectStringen_US
dc.subjectMaximal Coveren_US
dc.subjectRepeatsen_US
dc.subjectMAXCOVERen_US
dc.subjectprotein sequenceen_US
dc.subjectpattern matchingen_US
dc.subjectNE repeaten_US
dc.subjectMUMmeren_US
dc.titleEfficient Implementation & Application of Maximal String Covering Algorithmen_US
dc.title.alternativeMAXIMAL COVER ALGORITHM IMPLEMENTATIONen_US
dc.typeThesisen_US
dc.contributor.departmentComputing and Softwareen_US
dc.description.degreetypeThesisen_US
dc.description.degreeMaster of Science (MSc)en_US
dc.description.layabstractThis thesis deals with a simple yet essential data structure called a string, a sequence of symbols drawn from an alphabet. For example, a DNA sequence is a string comprised of four letters. We describe a new software called MAXCOVER that identifies maximal covers of a given string x (a repeating substring that ‘covers’ the most positions in x). This software is based on the algorithms in [1]. We propose two new algorithms that perform faster in practice. We also extended MAXCOVER for the closely related task of computing non-extendible repeats. We compare this extension to the well-known MUMmer software (600+ citations). We find that MAXCOVER is many times faster than MUMmer with much lower space requirements and produces a more compact, exact and user-friendly output.en_US
Appears in Collections:Open Access Dissertations and Theses

Files in This Item:
File Description SizeFormat 
Koponen_Holly_K_2022April_MSc.pdf
Open Access
871.22 kBAdobe PDFView/Open
Show simple item record Statistics


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.

Sherman Centre for Digital Scholarship     McMaster University Libraries
©2022 McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L8 | 905-525-9140 | Contact Us | Terms of Use & Privacy Policy | Feedback

Report Accessibility Issue