Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/32021
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Magarvey, Nathan | -
dc.contributor.author | Gunabalasingam, Mathusan | -
dc.date.accessioned | 2025-07-22T18:59:33Z | -
dc.date.available | 2025-07-22T18:59:33Z | -
dc.date.issued | 2025-11 | -
dc.identifier.uri | http://hdl.handle.net/11375/32021 | -
dc.description.abstract | Artificial intelligence (AI) is revolutionizing many biological domains, from protein folding to drug discovery, transforming how data are analyzed and interpreted. However, the field of natural products, particularly those derived from bacteria, has been slow to embrace these advances, hindering the discovery of novel chemistries with significant implications for medicine, agriculture, and biomaterials. This delay stems not from a lack of training data but from the fragmented and siloed nature of the data itself, which limits the ability to generate actionable insights. The core challenge is the absence of a cohesive framework for integrating diverse data types. This thesis proposes knowledge graphs as a solution, providing a robust, scalable framework for integrating genomic, molecular, and spectral data. By leveraging Graphormers, a Transformer architecture designed for graph-structured data, we build the first comprehensive knowledge graph describing bacterial metabolism. This framework connects genomic sequences, biosynthetic gene clusters (BGCs), molecular structures, and mass spectrometry profiles, streamlining targeted metabolite discovery. Data integration is achieved through three core technologies: IBIS, BLOOM, and MAPLE. IBIS generates enzyme embeddings for large-scale identification, annotation, and comparison of BGCs across bacterial genomes, surpassing current genome mining technologies in speed and accuracy. BLOOM maps biosynthetic units to molecular structures, predicting BGC–metabolite associations and uncovering uncharacterized biosynthetic logic. MAPLE generates spectral embeddings for high-throughput metabolite comparison, identification of taxonomically exclusive metabolic signatures, and organization of biosynthetic pathways. The resulting knowledge graph enables predictive modeling of metabolite–gene associations, validated experimentally through the isolation of several novel metabolites. This work demonstrates how knowledge graphs and AI-driven integration overcome the limitations of siloed data, creating hypothesis engines capable of learning patterns across multi-omic data. | en_US
dc.language.iso | en | en_US
dc.subject | Artificial Intelligence | en_US
dc.subject | Knowledge Graphs | en_US
dc.subject | Metabolism | en_US
dc.subject | Biosynthesis | en_US
dc.subject | Genomics | en_US
dc.subject | Chemoinformatics | en_US
dc.subject | Metabolomic | en_US
dc.subject | Transformer | en_US
dc.subject | Graphormer | en_US
dc.subject | Mass Spectrometry | en_US
dc.subject | Natural Products | en_US
dc.subject | Bacteria | en_US
dc.title | Artificial Intelligence in Bacterial Metabolite Discovery and Biosynthesis | en_US
dc.type | Thesis | en_US
dc.contributor.department | Biochemistry | en_US
dc.description.degreetype | Thesis | en_US
dc.description.degree | Doctor of Philosophy (PhD) | en_US
dc.description.layabstract | Bacterial metabolism could unlock the next generation of antibiotics, crop treatments, and biomaterials. Yet discovery remains slow and inefficient, often relying on random microbe selection, repetitive fractionation, and biological testing, and frequently resulting in the rediscovery of known compounds. In an era of big data and artificial intelligence (AI), this process must be reimagined. This research uses AI to connect DNA sequences, chemical structures, and lab measurements into a single, searchable framework called a knowledge graph. By building holistic views of complex biological systems, we uncover hidden relationships, including new connections between genes and the molecules they produce, enabling targeted discovery of novel chemical scaffolds. This work sets the stage for automation, in which robotics can isolate and test promising compounds, accelerating discovery in a scalable, high-throughput, and cost-effective manner. In doing so, we unlock the untapped chemical potential of bacteria, offering solutions to global challenges such as antibiotic resistance, food security, and environmental sustainability. | en_US
Appears in Collections: Open Access Dissertations and Theses
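The abstract describes a knowledge graph that links genomic sequences, BGCs, molecular structures, and mass spectrometry profiles, with learned embeddings (such as MAPLE's spectral embeddings) used to compare entities. The following is a minimal, hypothetical Python sketch of that general idea only, not the thesis's implementation: it does not implement Graphormer, IBIS, BLOOM, or MAPLE. The class name `KnowledgeGraph`, the node types, node identifiers, relation names (`encodes`, `produces`, `observed_as`), and the toy embedding vectors are all illustrative assumptions, and cosine similarity is swapped in as a common, simple way to compare embedding vectors.

```python
from collections import defaultdict, deque
from math import sqrt


class KnowledgeGraph:
    """Toy typed graph; node types and relation names are hypothetical."""

    def __init__(self):
        self.node_type = {}              # node id -> type label, e.g. "BGC"
        self.edges = defaultdict(list)   # node id -> list of (relation, target id)

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, source, relation, target):
        # Require both endpoints to exist so every edge joins two typed nodes.
        if source not in self.node_type or target not in self.node_type:
            raise KeyError("add both nodes before linking them")
        self.edges[source].append((relation, target))

    def reachable(self, start, node_type):
        """Breadth-first search returning reachable nodes of the given type."""
        seen, queue, hits = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for _relation, nxt in self.edges[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                    if self.node_type[nxt] == node_type:
                        hits.append(nxt)
        return hits


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical example: one genome linked through a BGC and a metabolite to a spectrum.
kg = KnowledgeGraph()
for node_id, label in [("genome_1", "Genome"), ("bgc_1", "BGC"),
                       ("metabolite_1", "Metabolite"), ("spectrum_1", "Spectrum")]:
    kg.add_node(node_id, label)
kg.add_edge("genome_1", "encodes", "bgc_1")
kg.add_edge("bgc_1", "produces", "metabolite_1")
kg.add_edge("metabolite_1", "observed_as", "spectrum_1")

print(kg.reachable("genome_1", "Spectrum"))  # ['spectrum_1']

# Toy vectors standing in for learned spectral embeddings (values are made up).
spectrum_a = [0.9, 0.1, 0.4]
spectrum_b = [0.8, 0.2, 0.5]
print(round(cosine_similarity(spectrum_a, spectrum_b), 3))
```

Typing every node lets a single traversal answer cross-layer questions (for example, which spectra are reachable from a given genome), while embedding similarity supports comparison within a layer; the similarity measure and graph schema used in the thesis itself may differ.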

Files in This Item:
File | Description | Embargoed until | Size | Format
Gunabalasingam_Mathusan_2025_July_PhD.pdf | This document contains the main body of the thesis. | 2026-07-01 | 23.77 MB | Adobe PDF
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-2-Appendix-External.pdf | This document contains supplementary figures relevant to Chapter 2 of the thesis, which could not be included in the main body due to file size limitations. | 2026-07-01 | 10.2 MB | Adobe PDF
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-2-Datasets.xlsx | This document contains large datasets relevant to Chapter 2 of the thesis, which could not be included in the main body due to file size constraints. | 2026-07-01 | 4.24 MB | Microsoft Excel XML
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-3-Appendix-External.pdf | This document contains supplementary figures relevant to Chapter 3 of the thesis, which could not be included in the main body due to file size limitations. | 2026-07-01 | 18.89 MB | Adobe PDF
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-4-Appendix-External.pdf | This document contains supplementary figures relevant to Chapter 4 of the thesis, which could not be included in the main body due to file size limitations. | 2026-07-01 | 14.41 MB | Adobe PDF
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-4-Datasets.xlsx | This document contains large datasets relevant to Chapter 4 of the thesis, which could not be included in the main body due to file size constraints. | 2026-07-01 | 190.6 kB | Microsoft Excel XML


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
