Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/31596
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Thia, Kirubarajan | -
dc.contributor.author | Vasantharajan, Charangan | -
dc.date.accessioned | 2025-05-01T10:44:39Z | -
dc.date.available | 2025-05-01T10:44:39Z | -
dc.date.issued | 2025 | -
dc.identifier.uri | http://hdl.handle.net/11375/31596 | -
dc.description.abstract | Large Language Models (LLMs) have achieved remarkable success in general-purpose natural language understanding and generation. However, their effectiveness diminishes in scientific and technical domains, where documents contain dense mathematical notation, complex layouts, and specialized terminology. These characteristics pose significant challenges for traditional LLM pipelines, often resulting in hallucinated outputs, misinterpretation of formulas, and failures in retrieving relevant context. This thesis introduces SciRAG, a Retrieval-Focused Fine-Tuning Strategy designed specifically for scientific documents. SciRAG combines structure-preserving document parsing, context-aware chunking, and domain-adapted fine-tuning using Low-Rank Adaptation (LoRA) to enhance an LLM's ability to understand and generate scientifically accurate content. The system incorporates a custom Retrieval-Augmented Generation (RAG) framework that supports semantic alignment of mathematical expressions and technical language across large corpora. Experimental evaluations demonstrate that SciRAG achieves strong performance in scientific question answering and mathematical reasoning. Notably, the model attains 70% accuracy on the GSM8k benchmark, alongside high retrieval and generation quality, achieving a Context Recall score of 0.85, Factual Correctness of 0.45, Faithfulness of 0.45, and Semantic Similarity of 0.94. These results underscore SciRAG’s effectiveness in bridging the gap between general-purpose LLMs and domain-specific, mathematically grounded language understanding. | en_US
dc.language.iso | en | en_US
dc.subject | Retrieval-Augmented Generation (RAG) | en_US
dc.subject | Scientific Document Processing | en_US
dc.subject | Large Language Models (LLMs) | en_US
dc.subject | Domain Adaptation | en_US
dc.subject | Scientific Text Understanding | en_US
dc.subject | LaTeX Handling | en_US
dc.title | SciRAG: A Retrieval-Focused Fine-Tuning Strategy for Scientific Documents | en_US
dc.type | Thesis | en_US
dc.contributor.department | Electrical and Computer Engineering | en_US
dc.description.degreetype | Thesis | en_US
dc.description.degree | Master of Applied Science (MASc) | en_US
Appears in Collections: Open Access Dissertations and Theses
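
The abstract mentions structure-preserving parsing and context-aware chunking for documents with dense mathematical notation. This record does not describe how the thesis implements either step, so the following is only a minimal illustrative sketch of one possible approach: splitting text into retrieval chunks without ever cutting through a display-math block. The function names (`split_units`, `chunk`), the `$$ ... $$` delimiter convention, and the `max_chars` budget are all assumptions made for illustration and are not taken from SciRAG.

```python
import re

# Hypothetical sketch, not the thesis implementation.
# Assumption: display equations are delimited by $$ ... $$.
_MATH_BLOCK = re.compile(r"(\$\$.*?\$\$)", re.DOTALL)


def split_units(text):
    """Split text into atomic units: whole display-math blocks and prose paragraphs."""
    units = []
    # With one capturing group, re.split returns prose at even indices
    # and the matched math blocks at odd indices.
    for i, piece in enumerate(_MATH_BLOCK.split(text)):
        if i % 2 == 1:
            units.append(piece)  # keep the equation intact as a single unit
        else:
            units.extend(p.strip() for p in piece.split("\n\n") if p.strip())
    return units


def chunk(text, max_chars=200):
    """Greedily pack units into chunks; a unit is never split across chunks."""
    chunks, current = [], ""
    for unit in split_units(text):
        candidate = unit if not current else current + "\n\n" + unit
        if current and len(candidate) > max_chars:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Intro paragraph.\n\n$$\nE = mc^2\n$$\n\nClosing paragraph."
    for c in chunk(sample, max_chars=20):
        print(repr(c))
```

A single unit longer than `max_chars` is emitted as its own oversized chunk rather than being cut, which trades a strict size limit for keeping each equation whole.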

Files in This Item:
File | Description | Size | Format
Vasantharajan_Charangan_202504_MASc.pdf | Open Access | 6.05 MB | Adobe PDF


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
