GENERATIVE LARGE LANGUAGE MODELS FOR TRANSPARENT ARTIFICIAL INTELLIGENCE IN CLINICAL RESEARCH: ENHANCING INTERPRETABILITY THROUGH APPRAISAL AND EXPLANATION

dc.contributor.advisor: Lokker, Cynthia
dc.contributor.author: Zhou, Fangwen
dc.contributor.department: eHealth [en_US]
dc.date.accessioned: 2025-06-26T17:34:52Z
dc.date.available: 2025-06-26T17:34:52Z
dc.date.issued: 2025
dc.description.abstract [en_US]:
Background: The rapid growth of medical literature necessitates effective, transparent automation tools for classification. Generative large language models (LLMs), including the Generative Pre-trained Transformer (GPT), have the potential to provide transparent classification and to explain other black-box models.
Objective: This sandwich thesis evaluates the performance of GPT in 1) classifying biomedical literature, compared with a fine-tuned BioLinkBERT model, and 2) explaining the decisions of encoder-only models with feature attributions, compared with traditional eXplainable AI (XAI) frameworks such as SHapley Additive exPlanations (SHAP) and integrated gradients (IG).
Methods: Randomly sampled, manually annotated clinical research articles from the Health Information Research Unit (HIRU) were used, along with a top-performing BioLinkBERT classifier. In Chapter 2, GPT-4o and o3-mini were used, either alone or with BioLinkBERT's predictions included in the prompt, to classify articles' methodological rigour against HIRU's criteria; GPT received either the title and abstract or the full text. Performance was compared with the BioLinkBERT model and assessed primarily using the Matthews correlation coefficient (MCC). In Chapter 3, GPT-4o was used to generate feature attributions for the BioLinkBERT model through masking perturbations and was compared with SHAP and IG using a modified area over the perturbation curve (AOPC) metric, which quantifies how faithfully the attributions reflect the model's behaviour (a code sketch of this metric follows the record below).
Results: GPT-4o alone, using full text (MCC 0.429), achieved classification performance comparable to BioLinkBERT (MCC 0.466); performance was worse with other models and inputs. As a perturbation explainer, GPT-4o performed poorly (AOPC 0.029), significantly underperforming SHAP (AOPC 0.222) and IG (AOPC 0.225), and the important tokens it identified did not align with the manual appraisal criteria.
Conclusion: GPT shows potential for appraising biomedical literature, even without explicit training, and its textual explanations improve transparency and interpretability. Its poor performance in generating faithful feature attributions warrants future research. The inherent variability and stochasticity of GPT outputs necessitate careful prompting and reproducibility measures.
dc.description.degree: Master of Science (MSc) [en_US]
dc.description.degreetype: Thesis [en_US]
dc.description.layabstract [en_US]: Artificial intelligence (AI) is increasingly used in clinical research to help automate the classification and evaluation of scientific studies. However, understanding how complex AI models make decisions, known as interpretability, is important for ethical use and remains a major challenge. This thesis explores how generative language models, particularly GPT from OpenAI, can enhance interpretability. Two approaches were tested: 1) using GPT to classify medical research articles while explaining its reasoning, and 2) using GPT to interpret the decisions of another advanced model by assigning a numerical importance value, called a feature attribution, to each word. Results showed that GPT was effective at classifying articles and explaining its own decisions, but it could not effectively explain other models using feature attributions. These results support the use of GPT to improve the transparency and accessibility of automated medical text classification and highlight directions for future research in this field.
dc.identifier.uri: http://hdl.handle.net/11375/31871
dc.language.iso: en [en_US]
dc.subject: Artificial intelligence [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Explainable AI [en_US]
dc.subject: Large language models [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Transformers [en_US]
dc.subject: GPT [en_US]
dc.subject: Evidence-based Medicine [en_US]
dc.subject: Knowledge Translation [en_US]
dc.title: GENERATIVE LARGE LANGUAGE MODELS FOR TRANSPARENT ARTIFICIAL INTELLIGENCE IN CLINICAL RESEARCH: ENHANCING INTERPRETABILITY THROUGH APPRAISAL AND EXPLANATION [en_US]
dc.title.alternative: GENERATIVE LARGE LANGUAGE MODELS FOR TRANSPARENT ARTIFICIAL INTELLIGENCE IN CLINICAL RESEARCH [en_US]
dc.type: Thesis [en_US]
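
Below is a minimal sketch of the AOPC-style faithfulness check described in the abstract, assuming a BERT-style "[MASK]" token and a caller-supplied predict_proba function (both hypothetical; this is not the thesis's actual code). Tokens are masked cumulatively in order of decreasing attribution, and the average drop in the classifier's predicted probability is reported; larger values indicate more faithful attributions. For reference, the primary classification metric, the Matthews correlation coefficient, is MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    # AOPC-style faithfulness sketch (simplified, single-document version).
    from typing import Callable, List

    MASK = "[MASK]"  # assumed mask token, as in BERT-style models

    def aopc(tokens: List[str],
             attributions: List[float],
             predict_proba: Callable[[List[str]], float],
             k: int = 10) -> float:
        """Average drop in predicted probability after cumulatively
        masking the k highest-attributed tokens (higher = more faithful)."""
        base = predict_proba(tokens)  # probability before any masking
        # Token indices sorted by attribution score, most important first.
        order = sorted(range(len(tokens)),
                       key=lambda i: attributions[i], reverse=True)
        masked = list(tokens)
        drops = []
        for idx in order[:k]:
            masked[idx] = MASK  # cumulatively mask the next token
            drops.append(base - predict_proba(masked))
        return sum(drops) / len(drops) if drops else 0.0

Under this metric, attributions from SHAP, IG, or a GPT-based perturbation explainer can be compared directly by passing each method's scores as attributions against the same underlying classifier.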

Files

Original bundle

Name: Zhou_Fangwen_2025Jun_MSc.pdf
Size: 1.95 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed upon to submission