Dimensionality Reduction with Non-Gaussian Mixtures

Tang, Yang

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/21982

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	McNicholas, Paul	-
dc.contributor.author	Tang, Yang	-
dc.date.accessioned	2017-10-03T18:46:07Z	-
dc.date.available	2017-10-03T18:46:07Z	-
dc.date.issued	2017-11	-
dc.identifier.uri	http://hdl.handle.net/11375/21982	-
dc.description.abstract	Broadly speaking, cluster analysis is the organization of a data set into meaningful groups, and mixture model-based clustering has recently received wide interest in statistics. Historically, the Gaussian mixture model has dominated the model-based clustering literature. When model-based clustering is performed on a large number of observed variables, it is well known that Gaussian mixture models can represent an over-parameterized solution. To address this, this thesis focuses on the development of novel non-Gaussian mixture models for high-dimensional continuous and categorical data. We developed a mixture of joint generalized hyperbolic models (JGHM), which exhibits different marginal amounts of tail weight. Moreover, it takes into account the cluster-specific subspace and, therefore, limits the number of parameters to estimate. This novel approach is applicable to high-dimensional, and potentially very high-dimensional, spaces with arbitrary correlation between dimensions. Three different mixture models are developed using forms of the mixture of latent trait models to realize model-based clustering of high-dimensional binary data. A family of mixtures of latent trait models with common slope parameters is developed to reduce the number of parameters to be estimated. This approach facilitates a low-dimensional visual representation of the clusters. We further developed penalized latent trait models to accommodate ultra-high-dimensional binary data; these models also perform automatic variable selection. For all models and families of models developed in this thesis, the algorithms used for model-fitting and parameter estimation are presented. Real and simulated data sets are used to assess the clustering ability of the models.	en_US
dc.language.iso	en	en_US
dc.subject	clustering	en_US
dc.subject	non-Gaussian	en_US
dc.subject	latent variables	en_US
dc.subject	mixture models	en_US
dc.subject	categorical data	en_US
dc.subject	variational method	en_US
dc.title	Dimensionality Reduction with Non-Gaussian Mixtures	en_US
dc.type	Thesis	en_US
dc.contributor.department	Mathematics and Statistics	en_US
dc.description.degreetype	Thesis	en_US
dc.description.degree	Doctor of Philosophy (PhD)	en_US
Appears in Collections:	Open Access Dissertations and Theses

Files in This Item:

File	Description	Size	Format
Tang_Yang_2017April_PhD.pdf (access is allowed from 2018-04-27)	PhDThesis	2.27 MB	Adobe PDF
