Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/27517
Title: | Use of Machine Learning for Outlier Detection in Healthy Human Brain Magnetic Resonance Imaging (MRI) Diffusion Tensor (DT) Datasets |
Other Titles: | Outlier Detection in Brain MRI Diffusion Datasets |
Authors: | MacPhee, Neil |
Advisor: | Noseworthy, Michael |
Department: | Biomedical Engineering |
Keywords: | Outlier Detection;MRI;Machine learning;DTI |
Publication Date: | 2022 |
Abstract: | Machine learning (ML) and deep learning (DL) are powerful techniques that allow for analysis and classification of large MRI datasets. With the growing accessibility of high-powered computing and large data storage, there has been an explosion of interest in their use for assisting clinical analysis and interpretation. Though these methods can provide insights into the data that are not possible through human analysis alone, they require very large datasets for training, which can be difficult for any individual researcher or clinician to obtain on their own. The growing use of publicly available, multi-site databases helps solve this problem. However, these databases can inadvertently contain outliers or incorrectly labeled data, as subjects may have subclinical or underlying pathology unknown to them or to those who collected the data. Because ML and DL techniques are sensitive to outliers, inclusion of such data can lead to poor classification rates and, in turn, low specificity and sensitivity. Thus, the focus of this work was to evaluate large brain MRI datasets, specifically diffusion tensor imaging (DTI), for the presence of anomalies and to validate and compare different methods of anomaly detection. Data from a total of 1029 male and female subjects aged 22 to 35 were downloaded from a global imaging repository and divided into 6 cohorts according to age and sex. Care was taken to minimize variance due to hardware, and hence only data from a specific vendor (General Electric Healthcare) and MRI B0 field strength (3 Tesla) were obtained. The raw DTI data (in this case DICOM images) was first preprocessed into scalar metrics (fractional anisotropy, FA; radial diffusivity, RD; axial diffusivity, AD; mean diffusivity, MD) and warped to MNI152 T1 1mm standardized space using the FMRIB software library (FSL). Subsequently, the data was segmented into regions of interest (ROI) using the JHU DTI-based white-matter atlas, and a mean was calculated for each ROI defined by that atlas. 
The ROI data was standardized and a Z-score, for each ROI over all subjects, was calculated. Four different algorithms were used for anomaly detection, including Z-score outlier detection, maximum likelihood estimator (MLE) and minimum covariance determinant (MCD) based Mahalanobis distance outlier detection, one-class support vector machine (OCSVM) outlier detection, and OCSVM novelty detection trained on MCD-based Mahalanobis distance data. The best outlier detector was found to be the MCD-based Mahalanobis distance, with the OCSVM novelty detector performing exceptionally well on the MCD-based Mahalanobis distance data. The results of this study show that these global databases contain outliers within their healthy control datasets, reinforcing the need to include outlier or novelty detection in the preprocessing pipeline for ML and DL studies. |
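The two simplest detectors named in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the synthetic data, the function names `zscore_flags` and `mahalanobis_sq`, and the threshold of 3 are assumptions made here for illustration. It shows per-ROI Z-score flagging and the squared Mahalanobis distance computed from the MLE mean and covariance. A robust MCD variant would substitute a robust location and covariance estimate (for example from scikit-learn's `MinCovDet`) for the MLE estimates used below.

```python
import numpy as np


def zscore_flags(X, threshold=3.0):
    """Flag rows (subjects) where any column (ROI) has |z| above threshold."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)  # population standard deviation per ROI
    z = (X - mu) / sigma
    return np.any(np.abs(z) > threshold, axis=1)


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the MLE mean,
    using the MLE (biased, divide-by-n) covariance estimate."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)
    inv_cov = np.linalg.pinv(cov)
    d = X - mu
    # Row-wise quadratic form d_i^T * inv_cov * d_i
    return np.einsum("ij,jk,ik->i", d, inv_cov, d)


# Hypothetical stand-in for a subjects-by-ROI matrix of mean DTI metrics.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[0] += 8.0  # plant one obvious outlier in subject 0

flags = zscore_flags(X)
d2 = mahalanobis_sq(X)
most_extreme = int(np.argmax(d2))  # index of the most distant subject
```

Because the MLE mean and covariance are themselves pulled toward any outliers in the sample, distances computed this way can understate how extreme a subject is, which is the usual motivation for a robust estimator such as MCD.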
URI: | http://hdl.handle.net/11375/27517 |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Thesis_NeilMacPhee.pdf | | 1.39 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.