
USE OF LARGE LANGUAGE AND REASONING MODELS IN THE GENERATION AND EVALUATION OF EXPERT- AND PATIENT-ORIENTED SUMMARIES OF RADIOLOGY REPORT FINDINGS

dc.contributor.advisor: Saha, Ashirbani
dc.contributor.author: Tasneem, Nanziba
dc.contributor.department: eHealth
dc.date.accessioned: 2025-09-24T18:42:36Z
dc.date.available: 2025-09-24T18:42:36Z
dc.date.issued: 2025
dc.description.abstract: Background: Expert- and patient-facing clinical summaries are important for patient care journeys, but creating them is time-consuming for healthcare providers. Large Language Models (LLMs) can be used for clinical text summarization; however, comprehensive evaluations are necessary prior to implementation. The objectives of this thesis were to (1) evaluate five LLMs [GPT-4, GPT-4o mini, Gemini 1.5 - Pro, Gemini 1.5 - Flash, and Llama 3.1] for impression generation (expert-facing), (2) evaluate the same LLMs for lay summary generation (patient-facing) using a mixed-method evaluation framework comparing laypersons and a Large Reasoning Model (LRM), and (3) assess the reliability of the LRM (Gemini 2.5 - Pro) as an evaluator. Methods: 100 radiology reports were sampled from the "BioNLP 2023 report summarization" dataset. Each LLM generated impressions (Chapter 2) and lay summaries (Chapters 3 and 4) using optimized prompts and hyperparameters. Impressions were evaluated by experts, the LRM, and similarity metrics; lay summaries were evaluated by experts, laypersons, the LRM, and readability metrics. Performance rankings were based on agreement percentages, with statistical analyses including the Friedman test, post-hoc Nemenyi tests, the Kruskal-Wallis test, and the Mann-Whitney U test. Results: For impression generation, Gemini 1.5 - Pro outperformed GPT-4 in coherence, comprehensiveness, and reduced medical harmfulness, despite lower cost. LRM and human evaluations had 2.15% complete disagreement. For lay summaries, Gemini 1.5 - Flash and Pro were top-rated for actionable and readable summaries requiring minimal supervision (P < 9.03×10⁻²¹). GPT-4 had the highest expert-rated accuracy (98%), while Gemini 1.5 - Pro had the best readability score. Laypersons had the highest understanding of GPT-4o mini and Gemini 1.5 - Pro summaries. LRM-layperson agreement varied by category and model.
Conclusion: Gemini 1.5 - Flash and Pro consistently ranked among the top performers for impression and lay summary generation. GPT-4o mini also showed strong patient-facing characteristics. These findings highlight LLMs' potential to improve clinical communication and the value of LRM-based evaluation frameworks.
dc.description.degree: Master of Science (MSc)
dc.description.degreetype: Thesis
dc.description.layabstract: Creating clear medical summaries for both doctors and patients is important, but it adds extra work for healthcare providers. This study explores how Artificial Intelligence (AI) models can help generate and evaluate summaries from radiology reports. Five models (Gemini 1.5 - Flash, Gemini 1.5 - Pro, GPT-4o mini, GPT-4, and Llama 3.1) were prompted to generate two types of summaries: expert summaries for clinicians and lay summaries for patients. These summaries were then evaluated by experts, laypersons (lay summaries only), Gemini 2.5 - Pro (an AI model), and quantitative metrics. The results show that Gemini 1.5 - Pro and GPT-4 generated coherent and accurate impressions, while Gemini 1.5 - Flash and Gemini 1.5 - Pro produced lay summaries without inaccuracies and with increased readability. Laypersons reported higher understanding and confidence with GPT-4o mini and Gemini 1.5 - Pro summaries. These findings show the potential of using AI to support and evaluate clinical text summarization.
dc.identifier.uri: http://hdl.handle.net/11375/32396
dc.language.iso: en
dc.subject: Artificial Intelligence
dc.subject: Large Language Models
dc.subject: Radiology
dc.subject: Natural Language Processing
dc.subject: Health Communication
dc.title: USE OF LARGE LANGUAGE AND REASONING MODELS IN THE GENERATION AND EVALUATION OF EXPERT- AND PATIENT-ORIENTED SUMMARIES OF RADIOLOGY REPORT FINDINGS
dc.type: Thesis
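
The abstract's Methods name several non-parametric tests (Friedman, Kruskal-Wallis, Mann-Whitney U). As a minimal illustrative sketch only (the thesis's actual rating data are not part of this record, so the arrays below are synthetic and the model labels hypothetical), such comparisons can be run with SciPy:

```python
# Illustrative sketch: non-parametric comparison of paired model ratings,
# as described in the thesis Methods. All data here are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical 1-5 evaluator ratings for 100 reports from three models.
model_a = rng.integers(3, 6, size=100)
model_b = rng.integers(2, 5, size=100)
model_c = rng.integers(1, 4, size=100)

# Friedman test: do paired ratings differ across the three models?
stat, p = stats.friedmanchisquare(model_a, model_b, model_c)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3g}")

# Mann-Whitney U test: pairwise comparison between two rating groups.
u, p_u = stats.mannwhitneyu(model_a, model_b, alternative="two-sided")
print(f"Mann-Whitney U = {u:.1f}, p = {p_u:.3g}")
```

A significant Friedman result is typically followed by a post-hoc test (the thesis uses Nemenyi) to identify which model pairs differ.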

Files

Original bundle

Name: Tasneem_Nanziba_finalsubmission2025September_eHealth.pdf
Size: 3.07 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed upon to submission