Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/32532
Title: Enabling Human–Autonomy Teaming in Medical Imaging Through Transformers
Authors: Zhu, Calvin
Advisor: Noseworthy, Michael; Doyle, Thomas
Department: Biomedical Engineering
Keywords: Deep Learning; Human Autonomy Teaming; Transformers; Convolutional Neural Networks; Machine Learning
Publication Date: 2025
Abstract: Significant progress in deep learning has put highly capable, fully automated medical imaging tools on the near horizon. However, translation from research into clinical practice has been slower than the rapid pace of technological advancement. A more feasible alternative leverages the strengths of both the human operator and machine learning based tools, with a decision support system augmenting human ability. Termed human-autonomy teaming (HAT), this approach has human operators and autonomous machines work together on required tasks. Adoption of HATs has unfortunately also been slow. Transformer-based deep learning architectures bring two unique properties, the attention mechanism and generative pre-training, that apply across a range of medical imaging tasks, and investigating them may provide insights for better adoption of HATs.

Conceptually, the attention mechanism tells the model which portions of the input to focus on. Using attention as a means to examine model decisions, the attention heads of a Vision Transformer (ViT) and the important pixels of a ResNet50 model were compared against radiologist annotations to determine how appropriately each model focused its attention when classifying two datasets. Both models showed high agreement with the radiologist annotations: 88.07% and 94.85% for the ResNet50 across the two datasets, and 94.72% and 96.96% for the ViT. The Vision Transformer performed better in both test accuracy and agreement.

Generative pre-training is a method for training model weights without additional labelling: during training, portions of each sample are masked before it is given to an autoencoder, which tries to recover the original input. This training method has applications in areas of medical imaging such as reconstruction and classification on imbalanced data. Adapting generative pre-training to reconstruct undersampled k-space data, a 25% masking ratio produced the most faithful reconstructions, with an average MSE of 6.763e-4 and an SSIM of 0.917. Interestingly, this result aligns more closely with the optimal masking ratio reported for language models, between 15% and 25%, than with the 75% masking ratio demonstrated in an imaging application, reinforcing the idea that further study is required to properly adapt broader deep learning advances from computer vision to medical imaging. Generative pre-training through masked autoencoders (MAEs) was also investigated as a method of training on imbalanced data, achieving a high single-modality neuroimaging classification performance of 95.24%.

Just as convolutional neural networks (CNNs) revolutionized the discipline, transformers have established themselves as a defining breakthrough. They have pushed the state-of-the-art and opened many avenues for exploration, and deep learning models now see use all around the world. Beyond the performance gains that transformers brought, there is another lesson: whatever the next ground-breaking architecture may be, researchers should consider delving into all of its aspects and applications rather than blindly hoping for performance improvements and fully automated tools.
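
To make the attention mechanism described in the abstract concrete, the following is a minimal sketch of scaled dot-product attention, the operation underlying the ViT attention heads examined in the thesis. It is illustrative only and is not the thesis code; the array shapes and names are assumptions.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q, K, V: (num_patches, d) query, key, and value matrices.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # patch-to-patch similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        # Each row of `weights` is the attention one patch pays to all patches;
        # maps like these are what get compared against radiologist annotations.
        return weights @ V, weights

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    out, attn = scaled_dot_product_attention(Q, K, V)
    print(attn.shape)  # (4, 4)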
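Similarly, below is a hedged sketch of the masking step in MAE-style generative pre-training, assuming patchified inputs; the masking ratio is the hyperparameter the abstract compares at 25% versus 75%. Function and variable names are illustrative, not taken from the thesis.

    import numpy as np

    def mask_patches(patches, mask_ratio, rng):
        # patches: (num_patches, patch_dim). Returns a corrupted copy plus
        # the indices of the masked patches.
        n = patches.shape[0]
        masked_idx = rng.choice(n, size=int(round(mask_ratio * n)), replace=False)
        corrupted = patches.copy()
        corrupted[masked_idx] = 0.0   # zero out the masked patches
        return corrupted, masked_idx

    rng = np.random.default_rng(0)
    patches = rng.normal(size=(16, 32))   # e.g. 16 patches of an image or k-space sample
    corrupted, idx = mask_patches(patches, mask_ratio=0.25, rng=rng)
    # An autoencoder would take `corrupted` and be trained to recover `patches`,
    # typically with an MSE loss computed on the masked positions only.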
URI: http://hdl.handle.net/11375/32532
Appears in Collections:Open Access Dissertations and Theses

Files in This Item:
File: Zhu_Calvin_finalsubmission2025Oct_PhD.pdf
Embargoed until: 2026-10-10
Size: 3.2 MB
Format: Adobe PDF