Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/32239
Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Magarvey, Nathan A. | -
dc.contributor.author | Spencer, Norman R. | -
dc.date.accessioned | 2025-08-26T17:44:51Z | -
dc.date.available | 2025-08-26T17:44:51Z | -
dc.date.issued | 2025 | -
dc.identifier.uri | http://hdl.handle.net/11375/32239 | -
dc.description.abstract | Bacterial specialized metabolite (SM) scaffolds are fundamental to many important medicines, including antibiotics. Widespread dissemination of antimicrobial resistance demands the isolation of mechanistically and structurally novel therapeutics to enable lifesaving medical interventions. The meteoric growth of genomic sequencing data has uncovered millions of biosynthetic gene clusters (BGCs) encoding SMs. However, much of this chemical space remains unexplored due to technical limitations in BGC comparison and limited strategies for BGC prioritization. In this thesis, I develop deep learning algorithms which enable high-throughput comparison, structural rationalization, bioactivity prediction, and defragmentation of BGCs to enable large-scale BGC prioritization for SM-based drug discovery efforts. Firstly, I develop Transformer-based deep learning algorithms to identify and represent BGCs using highly scalable, vectorized representations. These algorithms drastically outperform the current state of the art and enable rapid comparison, grouping, and prioritization of BGCs at an immense (>1 million BGC) scale. Secondly, I develop computational methods to biosynthetically link SMs to candidate BGCs, increasing the dataset of potential SM-BGC relationships eight-fold relative to current datasets. This method also enables prioritization of BGCs encoding structural novelty and streamlines the isolation of SMs in a rationalizable fashion, leading to the isolation of a novel lipopeptide. Thirdly, I develop computational methods to identify bioactive molecular and genetic signatures present in BGCs and use these methods to streamline the isolation of a novel antitubercular peptide. Finally, I demonstrate a method enabling BGC defragmentation with scalable BGC fragment representations, facilitating the identification and comparison of discontiguous BGCs.
Critically, the advances in this thesis leverage highly scalable vectorized representations which are capable of managing the extreme dataset sizes being created in the era of “multi-omics” data. Together, this work provides a means to leverage the immense wealth of genomic data to prioritize novel BGCs for streamlined, targeted SM-based drug discovery. | en_US
dc.language.iso | en | en_US
dc.subject | Natural Products | en_US
dc.subject | Genomics | en_US
dc.subject | Artificial Intelligence | en_US
dc.subject | Transformer | en_US
dc.subject | Graphormer | en_US
dc.subject | Specialized Metabolism | en_US
dc.subject | Metabolism | en_US
dc.subject | Knowledge Graphs | en_US
dc.subject | Biosynthesis | en_US
dc.subject | Bacteria | en_US
dc.title | Deep Learning Augmented Genome Mining in the "omics" Era | en_US
dc.title.alternative | DEEP LEARNING AUGMENTED GENOME MINING IN THE “OMICS” ERA | en_US
dc.type | Thesis | en_US
dc.contributor.department | Biochemistry and Biomedical Sciences | en_US
dc.description.degreetype | Thesis | en_US
dc.description.degree | Doctor of Philosophy (PhD) | en_US
dc.description.layabstract | Bacteria sometimes produce unique “specialized metabolites” (SMs) with potent bioactivities (e.g., antibacterial, antifungal) that humans have developed into some of the most important medicines, antibiotics, and pesticides in use today. The instructions governing SM construction are found in genome regions known as “biosynthetic gene clusters” (BGCs). Widespread application of SM therapeutics has produced pests and pathogens that are resistant to these drugs, so new SMs are desperately needed. However, it is increasingly difficult to find new SMs without targeted approaches. In this work, I use deep learning approaches to identify and compare BGCs, producing significant improvements in accuracy, speed, and scalability relative to current methods. In addition, I introduce tools that predict SM-structure components and bioactivity from SM data, enabling the identification and prioritization of new, bioactive SMs for isolation. Using this approach, we isolate two novel SMs, one of which has potent antitubercular activity. | en_US
Appears in Collections: Open Access Dissertations and Theses
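The abstract's central technical idea is that once each BGC is encoded as a fixed-length vector, comparison and grouping reduce to simple vector arithmetic. The sketch below illustrates that idea under stated assumptions: the five 4-dimensional embeddings and the `bgc_*` names are hypothetical toy values standing in for the output of a trained encoder, and cosine similarity with a fixed grouping threshold is one common choice, not the method described in the thesis, whose model, dimensionality, and metric are not given in this record.

```python
import math
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL toy embeddings: stand-ins for vectors a trained encoder
# (e.g. a Transformer) might assign to each BGC. Values are illustrative only.
EMBEDDINGS: Dict[str, List[float]] = {
    "bgc_A": [0.9, 0.1, 0.0, 0.2],
    "bgc_B": [0.8, 0.2, 0.1, 0.2],
    "bgc_C": [0.0, 0.9, 0.8, 0.1],
    "bgc_D": [0.1, 0.8, 0.9, 0.0],
    "bgc_E": [0.0, 0.0, 0.1, 1.0],
}


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def nearest(query: str, emb: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k BGCs most similar to `query`, excluding the query itself."""
    q = emb[query]
    scores = [(name, cosine(q, v)) for name, v in emb.items() if name != query]
    return sorted(scores, key=lambda t: t[1], reverse=True)[:k]


def group(emb: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Group BGCs into connected components of the similarity graph, linking
    any pair with cosine similarity >= threshold (assumed rule, via union-find)."""
    names = sorted(emb)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    # All-pairs comparison: quadratic, fine for a toy example only.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(emb[a], emb[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


unit = {name: normalize(v) for name, v in EMBEDDINGS.items()}
print(nearest("bgc_A", unit, k=2))
print(group(unit, threshold=0.9))
```

With these toy vectors, `bgc_A` and `bgc_B` (and likewise `bgc_C` and `bgc_D`) fall into the same group at a 0.9 threshold, while `bgc_E` stays on its own. The all-pairs loop in `group` is quadratic in the number of BGCs; at the million-BGC scale mentioned in the abstract, one would typically switch to batched matrix operations or an approximate nearest-neighbour index, though this record does not say which approach the thesis uses.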

Files in This Item:

File | Size | Format | Embargoed until
Spencer_Norman_R_202507_PhD.pdf | 12.08 MB | Adobe PDF | 2026-08-06
Appendix A.pdf | 10.03 MB | Adobe PDF | 2026-08-06
File_A1.txt | 111.84 kB | Text | 2026-08-06
File_A2.txt | 4.12 kB | Text | 2026-08-06
File_A3.txt | 26.23 kB | Text | 2026-08-06
Table_A1.xlsx | 286.94 kB | Microsoft Excel XML | 2026-08-06
Table_A2.xlsx | 143.75 kB | Microsoft Excel XML | 2026-08-06
Table_A3.xlsx | 255.44 kB | Microsoft Excel XML | 2026-08-06
Table_A4.xlsx | 2.62 MB | Microsoft Excel XML | 2026-08-06
Table_A5.xlsx | 21.54 kB | Microsoft Excel XML | 2026-08-06
Table_A6.xlsx | 3.28 MB | Microsoft Excel XML | 2026-08-06
Table_A7.xlsx | 353.33 kB | Microsoft Excel XML | 2026-08-06
Appendix B.pdf | 18.95 MB | Adobe PDF | 2026-08-06
Appendix C.pdf | 12.89 MB | Adobe PDF | 2026-08-06
TableC1.xlsx | 52.94 kB | Microsoft Excel XML | 2026-08-06
TableC2.xslx.xlsx | 12.07 kB | Microsoft Excel XML | 2026-08-06


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
