Sparse Principal Component Analysis for High-Dimensional Data: A Comparative Study

Bonner, Ashley J.

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/12246

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Beyene, Joseph	en_US
dc.contributor.advisor	Hamid, Jemila; Viveros-Aguilera, Roman	en_US
dc.contributor.author	Bonner, Ashley J.	en_US
dc.date.accessioned	2014-06-18T16:58:49Z	-
dc.date.available	2014-06-18T16:58:49Z	-
dc.date.created	2012-06-25	en_US
dc.date.issued	2012-10	en_US
dc.identifier.other	opendissertations/7146	en_US
dc.identifier.other	8155	en_US
dc.identifier.other	3022127	en_US
dc.identifier.uri	http://hdl.handle.net/11375/12246	-
dc.description.abstract	Background: Through unprecedented advances in technology, high-dimensional datasets have become common in many fields of observational research. For example, it is now common to expect thousands or millions of genetic variables (p) with only a limited number of study participants (n). Determining the important features proves statistically difficult, as multivariate analysis techniques become overwhelmed and mathematically insufficient when n < p. Principal Component Analysis (PCA) is a commonly used multivariate method for dimension reduction and data visualization but suffers from these issues. A collection of Sparse PCA methods has been proposed to counter these flaws, but the methods have not been compared in detail. Methods: The performance of three Sparse PCA methods was evaluated through simulations. Data were generated for 56 different data structures, varying p, the number of underlying groups, and the variance structure within them. Estimation and interpretability of the principal components (PCs) were rigorously tested. Sparse PCA methods were also applied to a real gene expression dataset. Results: All Sparse PCA methods showed improvements upon classical PCA. Some methods were best at obtaining an accurate leading PC only, whereas others were better for subsequent PCs. The optimal choice of Sparse PCA method differs as within-group correlation and across-group variances vary; however, one method repeatedly worked well under the most difficult scenarios. When the methods were applied to real data, concise groups of gene expressions were detected with the sparsest methods. Conclusions: Sparse PCA methods provide a new, insightful way to detect important features amidst complex high-dimensional data.	en_US
dc.subject	Principal Component Analysis (PCA)	en_US
dc.subject	High Dimensional Data	en_US
dc.subject	Sparse Principal Component Analysis (Sparse PCA)	en_US
dc.subject	Simulations	en_US
dc.subject	Loading Vectors	en_US
dc.subject	Tuning Parameters	en_US
dc.subject	Applied Statistics	en_US
dc.subject	Biostatistics	en_US
dc.subject	Multivariate Analysis	en_US
dc.subject	Statistical Methodology	en_US
dc.title	Sparse Principal Component Analysis for High-Dimensional Data: A Comparative Study	en_US
dc.type	thesis	en_US
dc.contributor.department	Mathematics and Statistics	en_US
dc.description.degree	Master of Science (MSc)	en_US
Appears in Collections:	Open Access Dissertations and Theses

Files in This Item:

File	Size	Format
fulltext.pdf Open Access	3.71 MB	Adobe PDF
