Title: | Sparse Principal Component Analysis for High-Dimensional Data: A Comparative Study |
Authors: | Bonner, Ashley J. |
Advisor: | Beyene, Joseph; Hamid, Jemila; Viveros-Aguilera, Roman |
Department: | Mathematics and Statistics |
Keywords: | Principal Component Analysis (PCA);High Dimensional Data;Sparse Principal Component Analysis (Sparse PCA);Simulations;Loading Vectors;Tuning Parameters;Applied Statistics;Biostatistics;Multivariate Analysis;Statistical Methodology |
Publication Date: | Oct-2012 |
Abstract: | <p><strong>Background:</strong> Unprecedented advances in technology have made high-dimensional datasets common across many fields of observational research. For example, it is now common to measure thousands or millions of genetic variables (p) on only a limited number of study participants (n). Identifying the important features is statistically difficult because many multivariate analysis techniques break down when n < p; for instance, the sample covariance matrix then has rank at most n - 1 and is singular. Principal Component Analysis (PCA), a commonly used multivariate method for dimension reduction and data visualization, suffers from these issues. A collection of Sparse PCA methods has been proposed to address these flaws, but the methods have not been compared in detail. <strong>Methods:</strong> The performance of three Sparse PCA methods was evaluated through simulation. Data were generated under 56 different data structures, varying p, the number of underlying groups, and the variance structure within the groups. Both the estimation accuracy and the interpretability of the principal components (PCs) were assessed. The Sparse PCA methods were also applied to a real gene expression dataset. <strong>Results:</strong> All Sparse PCA methods improved upon classical PCA. Some methods were best at accurately estimating only the leading PC, whereas others performed better for subsequent PCs. The optimal choice of Sparse PCA method varied with the within-group correlation and the across-group variances; however, one method consistently performed well under the most difficult scenarios. When the methods were applied to the real data, the sparsest methods detected concise groups of gene expression measurements. <strong>Conclusions:</strong> Sparse PCA methods provide an insightful new way to detect important features in complex high-dimensional data.</p> |
URI: | http://hdl.handle.net/11375/12246 |
Identifier: | opendissertations/7146 8155 3022127 |
Appears in Collections: | Open Access Dissertations and Theses |
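The abstract makes two technical points: many multivariate methods break down when n < p, and Sparse PCA counters this by shrinking unimportant loadings to zero. The Python sketch below illustrates both under stated assumptions. It is not code from the thesis: the data matrix `X`, the covariance matrix `S_toy`, the threshold `lam=0.5`, and the helper names are hypothetical, and soft-thresholded power iteration is one generic way to obtain a sparse leading loading vector, not necessarily any of the three Sparse PCA methods that were compared.

```python
import math
from fractions import Fraction


def sample_covariance(X):
    """Exact sample covariance S = Xc^T Xc / (n - 1) of an n x p integer data matrix (list of rows)."""
    n, p = len(X), len(X[0])
    means = [Fraction(sum(row[j] for row in X), n) for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # column-centred data
    return [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]


def exact_rank(M):
    """Rank of a rational matrix by forward Gaussian elimination (exact, no tolerance)."""
    A = [list(row) for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no nonzero entry left in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r


def soft_threshold(x, lam):
    """sign(x) * max(|x| - lam, 0)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def leading_loading(S, lam=0.0, tol=1e-12, max_iter=1000):
    """Illustrative power iteration for the leading loading vector of S (hypothetical helper).

    lam = 0 gives the classical (dense) PCA loading; lam > 0 soft-thresholds
    S v at every step, which sets small loadings exactly to zero.
    """
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(max_iter):
        Sv = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        u = [soft_threshold(x, lam) for x in Sv]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            raise ValueError("lam is too large: every loading was thresholded to zero")
        u = [x / norm for x in u]
        if max(abs(a - b) for a, b in zip(u, v)) < tol:
            return u
        v = u
    return v


# Made-up integer data with n = 4 observations and p = 6 variables (n < p).
X = [[1, 2, 0, 3, 1, 4],
     [2, 0, 1, 1, 3, 2],
     [0, 1, 3, 2, 2, 1],
     [3, 1, 2, 0, 1, 3]]
print(exact_rank(sample_covariance(X)))  # at most n - 1 = 3, so the 6 x 6 matrix is singular

# Hypothetical covariance: variables 0-2 form one correlated group; variables 3-5
# are noise, weakly correlated (0.1) with each group member.
S_toy = [
    [2.0, 1.5, 1.5, 0.1, 0.1, 0.1],
    [1.5, 2.0, 1.5, 0.1, 0.1, 0.1],
    [1.5, 1.5, 2.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.0, 0.0],
    [0.1, 0.1, 0.1, 0.0, 1.0, 0.0],
    [0.1, 0.1, 0.1, 0.0, 0.0, 1.0],
]
dense = leading_loading(S_toy)            # classical PCA loading
sparse = leading_loading(S_toy, lam=0.5)  # soft-thresholded loading
print([round(x, 3) for x in dense])
print([round(x, 3) for x in sparse])
```

With `lam=0` the routine is ordinary power iteration, so the classical loading vector keeps small but nonzero weights (roughly 0.04) on the three noise variables; with `lam=0.5` those weights become exactly zero and only the correlated group remains. The threshold acts as a tuning parameter and, in practice, would have to be chosen by some data-driven criterion.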
Files in This Item:
File | Size | Format
---|---|---
fulltext.pdf | 3.71 MB | Adobe PDF
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.