Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/32021
Title: | Artificial Intelligence in Bacterial Metabolite Discovery and Biosynthesis |
Authors: | Gunabalasingam, Mathusan |
Advisor: | Magarvey, Nathan |
Department: | Biochemistry |
Keywords: | Artificial Intelligence; Knowledge Graphs; Metabolism; Biosynthesis; Genomics; Chemoinformatics; Metabolomic; Transformer; Graphormer; Mass Spectrometry; Natural Products; Bacteria |
Publication Date: | Nov-2025 |
Abstract: | Artificial intelligence (AI) is revolutionizing many biological domains, from protein folding to drug discovery, transforming how data are analyzed and interpreted. However, the field of natural products, particularly those derived from bacteria, has been slow to adopt these advances, hindering the discovery of novel chemistries with significant implications for medicine, agriculture, and biomaterials. This delay stems not from a lack of training data but from the fragmented, siloed nature of the data themselves, which limits the ability to generate actionable insights; what is missing is a cohesive framework for integrating diverse data types. This thesis proposes knowledge graphs as a solution, providing a robust, scalable framework for integrating genomic, molecular, and spectral data. Using Graphormers, a Transformer architecture adapted to graph-structured data, we build the first comprehensive knowledge graph describing bacterial metabolism. This framework connects genomic sequences, biosynthetic gene clusters (BGCs), molecular structures, and mass spectrometry profiles, streamlining targeted metabolite discovery. Data integration is achieved through three core technologies: IBIS, BLOOM, and MAPLE. IBIS generates enzyme embeddings for large-scale identification, annotation, and comparison of BGCs across bacterial genomes, surpassing current genome mining tools in speed and accuracy. BLOOM maps biosynthetic units to molecular structures, predicting BGC–metabolite associations and uncovering previously uncharacterized biosynthetic logic. MAPLE generates spectral embeddings for high-throughput metabolite comparison, identification of taxonomically exclusive metabolic signatures, and organization of biosynthetic pathways. The resulting knowledge graph enables predictive modeling of metabolite–gene associations, which was experimentally validated through the isolation of several novel metabolites.
This work demonstrates how knowledge graphs and AI-driven integration overcome the limitations of siloed data, creating hypothesis engines capable of learning patterns across multi-omic data. |
URI: | http://hdl.handle.net/11375/32021 |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Gunabalasingam_Mathusan_2025_July_PhD.pdf | This document contains the main body of the thesis. | 23.77 MB | Adobe PDF |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-2-Appendix-External.pdf | This document contains supplementary figures relevant to Chapter 2 of the thesis, which could not be included in the main body due to file size limitations. | 10.2 MB | Adobe PDF |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-2-Datasets.xlsx | This document contains large datasets relevant to Chapter 2 of the thesis, which could not be included in the main body due to file size constraints. | 4.24 MB | Microsoft Excel XML |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-3-Appendix-External.pdf | This document contains supplementary figures relevant to Chapter 3 of the thesis, which could not be included in the main body due to file size limitations. | 18.89 MB | Adobe PDF |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-4-Appendix-External.pdf | This document contains supplementary figures relevant to Chapter 4 of the thesis, which could not be included in the main body due to file size limitations. | 14.41 MB | Adobe PDF |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-4-Datasets.xlsx | This document contains large datasets relevant to Chapter 4 of the thesis, which could not be included in the main body due to file size constraints. | 190.6 kB | Microsoft Excel XML |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.