The Growth Curve Model for High Dimensional Data and its Application in Genomics

Jana, Sayantee

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/12780

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Hamid, Jemila S	en_US
dc.contributor.advisor	N. Balakrishnan, R. Viveros	en_US
dc.contributor.author	Jana, Sayantee	en_US
dc.date.accessioned	2014-06-18T17:00:43Z	-
dc.date.available	2014-06-18T17:00:43Z	-
dc.date.created	2012-12-20	en_US
dc.date.issued	2013-04	en_US
dc.identifier.other	opendissertations/7638	en_US
dc.identifier.other	8699	en_US
dc.identifier.other	3552261	en_US
dc.identifier.uri	http://hdl.handle.net/11375/12780	-
dc.description.abstract	<p>Recent advances in technology have allowed researchers to collect high-dimensional biological data simultaneously. In genomic studies, for instance, measurements from tens of thousands of genes are taken from individuals across several experimental groups. In time course microarray experiments, gene expression is measured at several time points for each individual across the whole genome, resulting in a massive amount of data. In such experiments, researchers are faced with two types of high-dimensionality. The first is global high-dimensionality, which is common to all genomic experiments. Global high-dimensionality arises because inference is carried out on tens of thousands of genes, resulting in multiplicity. This challenge is often dealt with using statistical methods for multiple comparisons, such as the Bonferroni correction or the false discovery rate (FDR). We refer to the second type of high-dimensionality as gene-specific high-dimensionality, which arises in time course microarray experiments because, in such experiments, the sample size is often smaller than the number of time points ($n</p> <p>In this thesis, we use the growth curve model (GCM), which is a generalized multivariate analysis of variance (GMANOVA) model, and propose a moderated test statistic for testing a special case of the general linear hypothesis, which is especially useful for identifying genes that are expressed. We use the trace test for the GCM and modify it so that it can be used in high-dimensional situations. We consider two types of moderation: the Moore-Penrose generalized inverse and Stein's shrinkage estimator of $ S $. We performed extensive simulations to assess the performance of the moderated test and compared the results with those of the original trace test. We calculated the empirical level and power of the test under many scenarios. 
Although the focus is on hypothesis testing, we also provide a moderated maximum likelihood estimator for the parameter matrix, assess its performance by investigating the bias and mean squared error of the estimator, and compare the results with those of the maximum likelihood estimators. Since the parameters are matrices, we consider distance measures in both power and level comparisons as well as when investigating bias and mean squared error. We also illustrate our approach using time course microarray data taken from a study on lung cancer. We identified 1053 genes as non-noise genes from a pool of 22,277 genes, which is approximately 5\% of the total number of genes. This is consistent with results from most biological experiments, where around 5\% of genes are found to be differentially expressed.</p>	en_US
dc.subject	Generalized multivariate analysis of variance	en_US
dc.subject	growth curve model	en_US
dc.subject	high-dimensional data	en_US
dc.subject	Euclidean distance	en_US
dc.subject	multivariate bias and mean square error	en_US
dc.subject	moderated trace test	en_US
dc.subject	Moore-Penrose generalized inverse	en_US
dc.subject	Biostatistics	en_US
dc.subject	Multivariate Analysis	en_US
dc.title	The Growth Curve Model for High Dimensional Data and its Application in Genomics	en_US
dc.type	thesis	en_US
dc.contributor.department	Mathematics and Statistics	en_US
dc.description.degree	Master of Science (MSc)	en_US
Appears in Collections:	Open Access Dissertations and Theses

Files in This Item:

File	Size	Format
fulltext.pdf Open Access	938.43 kB	Adobe PDF
