Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/12246
Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | Beyene, Joseph | en_US |
| dc.contributor.advisor | Jemila Hamid, Roman Viveros-Aguilera | en_US |
| dc.contributor.author | Bonner, Ashley J. | en_US |
| dc.date.accessioned | 2014-06-18T16:58:49Z | - |
| dc.date.available | 2014-06-18T16:58:49Z | - |
| dc.date.created | 2012-06-25 | en_US |
| dc.date.issued | 2012-10 | en_US |
| dc.identifier.other | opendissertations/7146 | en_US |
| dc.identifier.other | 8155 | en_US |
| dc.identifier.other | 3022127 | en_US |
| dc.identifier.uri | http://hdl.handle.net/11375/12246 | - |
| dc.description.abstract | Background: Through unprecedented advances in technology, high-dimensional datasets have exploded into many fields of observational research. For example, it is now common to expect thousands or millions of genetic variables (p) with only a limited number of study participants (n). Determining the important features proves statistically difficult, as multivariate analysis techniques become flooded and mathematically insufficient when n < p. Principal Component Analysis (PCA) is a commonly used multivariate method for dimension reduction and data visualization but suffers from these issues. A collection of Sparse PCA methods has been proposed to counter these flaws, but the methods have not been compared in detail. Methods: The performance of three Sparse PCA methods was evaluated through simulations. Data were generated for 56 different data structures, varying p, the number of underlying groups, and the variance structure within them. Estimation and interpretability of the principal components (PCs) were rigorously tested. Sparse PCA methods were also applied to a real gene expression dataset. Results: All Sparse PCA methods showed improvements upon classical PCA. Some methods were best at obtaining an accurate leading PC only, whereas others were better for subsequent PCs. The optimal choice of Sparse PCA method differed as within-group correlation and across-group variances were varied; thankfully, one method repeatedly worked well under the most difficult scenarios. When the methods were applied to real data, the sparsest methods detected concise groups of gene expressions. Conclusions: Sparse PCA methods provide a new, insightful way to detect important features amidst complex high-dimensional data. | en_US |
| dc.subject | Principal Component Analysis (PCA) | en_US |
| dc.subject | High Dimensional Data | en_US |
| dc.subject | Sparse Principal Component Analysis (Sparse PCA) | en_US |
| dc.subject | Simulations | en_US |
| dc.subject | Loading Vectors | en_US |
| dc.subject | Tuning Parameters | en_US |
| dc.subject | Applied Statistics | en_US |
| dc.subject | Biostatistics | en_US |
| dc.subject | Multivariate Analysis | en_US |
| dc.subject | Statistical Methodology | en_US |
| dc.title | Sparse Principal Component Analysis for High-Dimensional Data: A Comparative Study | en_US |
| dc.type | thesis | en_US |
| dc.contributor.department | Mathematics and Statistics | en_US |
| dc.description.degree | Master of Science (MSc) | en_US |
Appears in Collections: Open Access Dissertations and Theses

Files in This Item:
| File | Size | Format |
| --- | --- | --- |
| fulltext.pdf (Open Access) | 3.71 MB | Adobe PDF |


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
