Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/32021
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Magarvey, Nathan | - |
dc.contributor.author | Gunabalasingam, Mathusan | - |
dc.date.accessioned | 2025-07-22T18:59:33Z | - |
dc.date.available | 2025-07-22T18:59:33Z | - |
dc.date.issued | 2025-11 | - |
dc.identifier.uri | http://hdl.handle.net/11375/32021 | - |
dc.description.abstract | Artificial intelligence (AI) is revolutionizing various biological domains, from protein folding to drug discovery, transforming data analysis and interpretation. However, the field of natural products, particularly those derived from bacteria, has been slow to embrace these advances, hindering the discovery of novel chemistries with significant implications for medicine, agriculture, and biomaterials. This delay is not due to a lack of training data but the fragmented and siloed nature of the data itself, limiting the ability to generate actionable insights. The challenge lies in the absence of a cohesive framework to effectively integrate diverse data types. This thesis proposes knowledge graphs as a solution to this challenge, providing a robust, scalable framework for integrating genomic, molecular, and spectral data. By leveraging Graphormers, a cutting-edge AI architecture, we build the first comprehensive knowledge graph describing bacterial metabolism. This framework connects genomic sequences, biosynthetic gene clusters (BGCs), molecular structures, and mass spectrometry profiles, streamlining targeted metabolite discovery. Data integration is achieved through three core technologies: IBIS, BLOOM, and MAPLE. IBIS generates enzyme embeddings for large-scale identification, annotation, and comparison of BGCs across bacterial genomes, surpassing current genome mining technologies in speed and accuracy. BLOOM maps biosynthetic units to molecular structures, predicting BGC–metabolite associations and uncovering uncharacterized biosynthetic logic. MAPLE generates spectral embeddings for high-throughput metabolite comparison, identification of taxonomically exclusive metabolic signatures, and biosynthetic pathway organization. The resulting knowledge graph enables predictive modeling of metabolite–gene associations, experimentally validated with the isolation of several novel metabolites. This work demonstrates how knowledge graphs and AI-driven integration overcome the limitations of siloed data, creating hypothesis engines capable of learning patterns across multi-omic data. | en_US |
dc.language.iso | en | en_US |
dc.subject | Artificial Intelligence | en_US |
dc.subject | Knowledge Graphs | en_US |
dc.subject | Metabolism | en_US |
dc.subject | Biosynthesis | en_US |
dc.subject | Genomics | en_US |
dc.subject | Chemoinformatics | en_US |
dc.subject | Metabolomics | en_US |
dc.subject | Transformer | en_US |
dc.subject | Graphormer | en_US |
dc.subject | Mass Spectrometry | en_US |
dc.subject | Natural Products | en_US |
dc.subject | Bacteria | en_US |
dc.title | Artificial Intelligence in Bacterial Metabolite Discovery and Biosynthesis | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Biochemistry | en_US |
dc.description.degreetype | Thesis | en_US |
dc.description.degree | Doctor of Philosophy (PhD) | en_US |
dc.description.layabstract | Bacterial metabolism could unlock the next generation of antibiotics, crop treatments, and biomaterials. Yet discovery remains slow and inefficient, often relying on random microbe selection, repetitive fractionation, and biological testing—frequently resulting in the rediscovery of known compounds. In an era of big data and artificial intelligence (AI), this process must be reimagined. This research uses AI to connect DNA sequences, chemical structures, and lab measurements into a single, searchable framework called a knowledge graph. By building holistic views of complex biological systems, we uncover hidden relationships—including new connections between genes and the molecules they produce—enabling targeted discovery of novel chemical scaffolds. This work sets the stage for automation, where robotics can isolate and test promising compounds, accelerating discovery in a scalable, high-throughput, and cost-effective manner. In doing so, we unlock the untapped chemical potential of bacteria, offering solutions to global challenges such as antibiotic resistance, food security, and environmental sustainability. | en_US |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Gunabalasingam_Mathusan_2025_July_PhD.pdf | This document contains the main body of the thesis. | 23.77 MB | Adobe PDF |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-2-Appendix-External.pdf | This document contains supplementary figures relevant to Chapter 2 of the thesis, which could not be included in the main body due to file size limitations. | 10.2 MB | Adobe PDF |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-2-Datasets.xlsx | This document contains large datasets relevant to Chapter 2 of the thesis, which could not be included in the main body due to file size constraints. | 4.24 MB | Microsoft Excel XML |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-3-Appendix-External.pdf | This document contains supplementary figures relevant to Chapter 3 of the thesis, which could not be included in the main body due to file size limitations. | 18.89 MB | Adobe PDF |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-4-Appendix-External.pdf | This document contains supplementary figures relevant to Chapter 4 of the thesis, which could not be included in the main body due to file size limitations. | 14.41 MB | Adobe PDF |
Gunabalasingam_Mathusan_2025_July_PhD_Chapter-4-Datasets.xlsx | This document contains large datasets relevant to Chapter 4 of the thesis, which could not be included in the main body due to file size constraints. | 190.6 kB | Microsoft Excel XML |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.