Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/27385
Title: | Variable Selection Methods for Model-based Clustering and Application to High-dimensional Data |
Authors: | Xu, Jini |
Advisor: | McNicholas, Sharon; Jeganathan, Pratheepa |
Department: | Mathematics and Statistics |
Keywords: | Clustering;Statistics |
Publication Date: | 2022 |
Abstract: | Clustering helps in understanding the natural grouping and internal structure of data. Model-based clustering treats each cluster as a component of a mixture model. As data dimensionality and complexity increase, model-based clustering tends to produce over-parameterized models. It is therefore important to select a subset of critical variables rather than use all variables for clustering. This study considers two variable selection methods for model-based clustering on real-world high-dimensional data: variable selection for clustering and classification (VSCC) and variable selection for model-based clustering (clustvarsel). For simplicity, Gaussian mixture models are applied. Three criteria are used to compare clustering accuracy and efficiency: the adjusted Rand index (ARI), the mis-clustering error, and the running time (in seconds). |
URI: | http://hdl.handle.net/11375/27385 |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Xu_Jini_finalsubmission202202_MSc.pdf | | 8.45 MB | Adobe PDF |
Jini Xu_final_submission_sheet.pdf | Final Thesis Submission Sheet | 183.81 kB | Adobe PDF |
Jini Xu_License to McMaster Form.pdf | McMaster University Licence | 90.45 kB | Adobe PDF |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.