Artificial Intelligence in Bacterial Metabolite Discovery and Biosynthesis

dc.contributor.advisor: Magarvey, Nathan
dc.contributor.author: Gunabalasingam, Mathusan
dc.contributor.department: Biochemistry [en_US]
dc.date.accessioned: 2025-07-22T18:59:33Z
dc.date.available: 2025-07-22T18:59:33Z
dc.date.issued: 2025-11
dc.description.abstract: Artificial intelligence (AI) is revolutionizing various biological domains, from protein folding to drug discovery, transforming data analysis and interpretation. However, the field of natural products, particularly those derived from bacteria, has been slow to embrace these advances, hindering the discovery of novel chemistries with significant implications for medicine, agriculture, and biomaterials. This delay is not due to a lack of training data but the fragmented and siloed nature of the data itself, limiting the ability to generate actionable insights. The challenge lies in the absence of a cohesive framework to effectively integrate diverse data types. This thesis proposes knowledge graphs as a solution to this challenge, providing a robust, scalable framework for integrating genomic, molecular, and spectral data. By leveraging Graphormers, a cutting-edge AI architecture, we build the first comprehensive knowledge graph describing bacterial metabolism. This framework connects genomic sequences, biosynthetic gene clusters (BGCs), molecular structures, and mass spectrometry profiles, streamlining targeted metabolite discovery. Data integration is achieved through three core technologies: IBIS, BLOOM, and MAPLE. IBIS generates enzyme embeddings for large-scale identification, annotation, and comparison of BGCs across bacterial genomes, surpassing current genome mining technologies in speed and accuracy. BLOOM maps biosynthetic units to molecular structures, predicting BGC–metabolite associations and uncovering uncharacterized biosynthetic logic. MAPLE generates spectral embeddings for high-throughput metabolite comparison, identification of taxonomically exclusive metabolic signatures, and biosynthetic pathway organization. The resulting knowledge graph enables predictive modeling of metabolite–gene associations, experimentally validated with the isolation of several novel metabolites. This work demonstrates how knowledge graphs and AI-driven integration overcome the limitations of siloed data, creating hypothesis engines capable of learning patterns across multi-omic data. [en_US]
dc.description.degree: Doctor of Philosophy (PhD) [en_US]
dc.description.degreetype: Thesis [en_US]
dc.description.layabstract: Bacterial metabolism could unlock the next generation of antibiotics, crop treatments, and biomaterials. Yet discovery remains slow and inefficient, often relying on random microbe selection, repetitive fractionation, and biological testing, and frequently resulting in the rediscovery of known compounds. In an era of big data and artificial intelligence (AI), this process must be reimagined. This research uses AI to connect DNA sequences, chemical structures, and lab measurements into a single, searchable framework called a knowledge graph. By building holistic views of complex biological systems, we uncover hidden relationships, including new connections between genes and the molecules they produce, enabling targeted discovery of novel chemical scaffolds. This work sets the stage for automation, where robotics can isolate and test promising compounds, accelerating discovery in a scalable, high-throughput, and cost-effective manner. In doing so, we unlock the untapped chemical potential of bacteria, offering solutions to global challenges such as antibiotic resistance, food security, and environmental sustainability. [en_US]
dc.identifier.uri: http://hdl.handle.net/11375/32021
dc.language.iso: en [en_US]
dc.subject: Artificial Intelligence [en_US]
dc.subject: Knowledge Graphs [en_US]
dc.subject: Metabolism [en_US]
dc.subject: Biosynthesis [en_US]
dc.subject: Genomics [en_US]
dc.subject: Chemoinformatics [en_US]
dc.subject: Metabolomic [en_US]
dc.subject: Transformer [en_US]
dc.subject: Graphormer [en_US]
dc.subject: Mass Spectrometry [en_US]
dc.subject: Natural Products [en_US]
dc.subject: Bacteria [en_US]
dc.title: Artificial Intelligence in Bacterial Metabolite Discovery and Biosynthesis [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Now showing 1 - 5 of 6
Name: Gunabalasingam_Mathusan_2025_July_PhD.pdf
Size: 23.22 MB
Format: Adobe Portable Document Format
Description: This document contains the main body of the thesis.

Name: Gunabalasingam_Mathusan_2025_July_PhD_Chapter-2-Appendix-External.pdf
Size: 9.96 MB
Format: Adobe Portable Document Format
Description: This document contains supplementary figures relevant to Chapter 2 of the thesis, which could not be included in the main body due to file size limitations.

Name: Gunabalasingam_Mathusan_2025_July_PhD_Chapter-2-Datasets.xlsx
Size: 4.14 MB
Format: Microsoft Excel XML
Description: This document contains large datasets relevant to Chapter 2 of the thesis, which could not be included in the main body due to file size constraints.

Name: Gunabalasingam_Mathusan_2025_July_PhD_Chapter-3-Appendix-External.pdf
Size: 18.44 MB
Format: Adobe Portable Document Format
Description: This document contains supplementary figures relevant to Chapter 3 of the thesis, which could not be included in the main body due to file size limitations.

Name: Gunabalasingam_Mathusan_2025_July_PhD_Chapter-4-Appendix-External.pdf
Size: 14.07 MB
Format: Adobe Portable Document Format
Description: This document contains supplementary figures relevant to Chapter 4 of the thesis, which could not be included in the main body due to file size limitations.

License bundle

Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed upon to submission