Welcome to the upgraded MacSphere! We're putting the finishing touches on it; if you notice anything amiss, email macsphere@mcmaster.ca

Variable Selection Methods for Model-based Clustering and Application to High-dimensional Data

Abstract

Clustering helps in understanding the natural grouping and internal structure of data. Model-based clustering considers each cluster as a component in a mixture model. As the data dimensionality and complexity increase, model-based clustering tends to over-parametrize results. Thus, it is important to select a subset of critical variables instead of using all the variables for clustering. This study considers two variable selection methods for model-based clustering on real world high-dimensional data; variable selection for clustering and classification (VSCC) and variable selection for model-based clustering (clustvarsel). For simplicity, Gaussian mixture models were applied. Three criteria are used to compare the clustering accuracy and efficiency, which are the adjusted rand index (ARI), mis-clustering error, and performance time (in seconds).

Description

Citation

Endorsement

Review

Supplemented By

Referenced By