Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/27385

| Field | Value |
|---|---|
| Title: | Variable Selection Methods for Model-based Clustering and Application to High-dimensional Data |
| Authors: | Xu, Jini |
| Advisor: | McNicholas, Sharon; Jeganathan, Pratheepa |
| Department: | Mathematics and Statistics |
| Keywords: | Clustering;Statistics |
| Publication Date: | 2022 |
| Abstract: | Clustering helps in understanding the natural grouping and internal structure of data. Model-based clustering treats each cluster as a component of a mixture model. As data dimensionality and complexity increase, model-based clustering tends to become over-parameterized. It is therefore important to select a subset of critical variables rather than using all variables for clustering. This study considers two variable selection methods for model-based clustering on real-world high-dimensional data: variable selection for clustering and classification (VSCC) and variable selection for model-based clustering (clustvarsel). For simplicity, Gaussian mixture models are used. Three criteria are used to compare clustering accuracy and efficiency: the adjusted Rand index (ARI), the misclustering error, and run time (in seconds). |
| URI: | http://hdl.handle.net/11375/27385 |
| Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| Xu_Jini_finalsubmission202202_MSc.pdf | | 8.45 MB | Adobe PDF |
| Jini Xu_final_submission_sheet.pdf | Final Thesis Submission Sheet | 183.81 kB | Adobe PDF |
| Jini Xu_License to McMaster Form.pdf | McMaster University Licence | 90.45 kB | Adobe PDF |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
