Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/11352
Title: | STATISTICAL METHODS FOR VARIABLE SELECTION IN THE CONTEXT OF HIGH-DIMENSIONAL DATA: LASSO AND EXTENSIONS |
Authors: | Yang, Xiao Di |
Advisor: | Beyene, Joseph; Balakrishnan, Narayanaswamy; Childs, Aaron |
Department: | Mathematics and Statistics |
Keywords: | Lasso;High-Dimensional;Penalized Variable Selection Methods;Applied Statistics;Biostatistics;Statistical Models |
Publication Date: | Oct-2011 |
Abstract: | With the advance of technology, the collection and storage of data have become routine. Huge amounts of data are increasingly produced by biological experiments. The advent of DNA microarray technologies has enabled scientists to measure the expression of tens of thousands of genes simultaneously. Single nucleotide polymorphisms (SNPs) are being used in genetic association studies with a wide range of phenotypes, for example, complex diseases. These high-dimensional problems are becoming more and more common. The "large p, small n" problem, in which there are more variables than samples, is currently a challenge that many statisticians face. Penalized variable selection is an effective way to deal with the "large p, small n" problem. In particular, the Lasso (least absolute shrinkage and selection operator) proposed by Tibshirani has become an effective method for this type of problem. The Lasso works well for covariates that can be treated individually. When the covariates are grouped, it does not work well. Elastic net, group lasso, group MCP and group bridge are extensions of the Lasso. Group lasso enforces sparsity at the group level, rather than at the level of the individual covariates. Group bridge and group MCP produce sparse solutions both at the group level and at the level of the individual covariates within a group. Our simulation study shows that the group lasso forces complete grouping, group MCP encourages grouping to a rather slight extent, and group bridge is somewhere in between. If one expects the proportion of nonzero group members to be greater than one-half, group lasso may be a good choice; otherwise group MCP would be preferred. If one expects this proportion to be close to one-half, one may wish to use group bridge. A real-data analysis of genetic variation (SNP) data is also conducted to identify associations between SNPs and West Nile disease. |
URI: | http://hdl.handle.net/11375/11352 |
Identifier: | opendissertations/6325 7377 2262620 |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Size | Format |
---|---|---|
fulltext.pdf | 688.93 kB | Adobe PDF |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.