Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/32256
Title: DOMAIN-SPECIFIC ADAPTATION AND MULTI-HOP REASONING IN CHEMISTRY AND BIOMEDICINE
Authors: Khodadad, Mohammad
Advisor: Mahyar, Hamidreza
Department: Computational Engineering and Science
Keywords: Large Language Models; Chemistry; Biomedicine; Medicine
Publication Date: 2025
Abstract: Large language models (LLMs) and embedding techniques have transformed general-purpose NLP, but their performance degrades on specialized scientific texts. In this thesis, we make three contributions to bridge this gap. First, we introduce two large-scale benchmark suites: ChemTEB, comprising 35 tasks on chemical corpora drawn from PubChem, CoconutDB, Safety Data Sheets, and Wikipedia; and MedTEB, comprising 51 medical tasks spanning EHR notes, PubMed abstracts, and clinical question–answer sets. Both cover classification, clustering, pair classification, retrieval, and bitext mining. Second, we propose MedTE, a 768-dimensional embedding model fine-tuned via self-supervised contrastive learning on an extensive biomedical corpus, which achieves state-of-the-art performance on MedTEB. Third, we develop GraphRAG, an automated pipeline that constructs chemical knowledge graphs from ChemRxiv preprints and generates multi-hop questions to assess compositional reasoning. Through rigorous evaluation, we show that ChemTEB reveals critical weaknesses in current chemical embeddings and that even with perfect context, LLMs achieve under 50% accuracy on multi-hop chemistry question answering. We release all benchmarks, code, and models to foster further research in domain adaptation and compositional reasoning for specialized NLP applications.
URI: http://hdl.handle.net/11375/32256
Appears in Collections: Open Access Dissertations and Theses
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| Mohammad_s_thesis (9).pdf | | 5.99 MB | Adobe PDF |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.