A QUASI-LIKELIHOOD METHOD TO DETECT DIFFERENTIALLY EXPRESSED GENES IN RNA-SEQUENCE DATA

Gu, Chu-Shu

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/20257

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Canty, Angelo	-
dc.contributor.author	Gu, Chu-Shu	-
dc.date.accessioned	2016-08-30T13:46:01Z	-
dc.date.available	2016-08-30T13:46:01Z	-
dc.date.issued	2016	-
dc.identifier.uri	http://hdl.handle.net/11375/20257	-
dc.description.abstract	In recent years, RNA sequencing (RNA-seq), which measures the transcriptome by counting short reads obtained from high-throughput sequencing, has been replacing microarray technology as the major platform for gene expression studies. The large amount of discrete data in RNA-seq experiments calls for effective analysis methods. In this dissertation, a new method based on quasi-likelihood theory is developed to detect differentially expressed genes in experiments with a completely randomized design and two experimental conditions. The proposed method estimates the variance function empirically and consequently has similar sensitivities and false discovery rates (FDRs) across distributions with different variance functions. In a simulation study comparing it with some other popular methods, the method is shown to have similar sensitivities and FDRs across data with three different types of variance functions. This method, along with some competing methods, is applied to a real dataset with two experimental conditions. The new method is then extended to more complex designs, such as experiments with multiple experimental conditions, with a block design, and with a factorial design. The same advantages of the new method are found in simulation studies. This method and some competing methods are applied to three real datasets with complex designs. The new method is also applied to analyze reads per kilobase per million mapped reads (RPKM) data. In the simulation, the method is compared with Linear Models for Microarray Data (LIMMA), originally developed for microarray analysis (Smyth, 2004), and the question of normalization is also examined. It is shown that the new method and the LIMMA method have similar performance. Further normalization is required for the proper analysis of the RPKM data, and the best such normalization is the scaling method.
Analyzing raw count data properly gives better performance than analyzing the RPKM data. Different normalization and statistical methods are applied to a real dataset with gene lengths that vary across samples.	en_US
dc.language.iso	en	en_US
dc.subject	RNA-seq	en_US
dc.subject	Quasi-likelihood	en_US
dc.subject	RPKM	en_US
dc.title	A QUASI-LIKELIHOOD METHOD TO DETECT DIFFERENTIALLY EXPRESSED GENES IN RNA-SEQUENCE DATA	en_US
dc.type	Thesis	en_US
dc.contributor.department	Clinical Epidemiology/Clinical Epidemiology & Biostatistics	en_US
dc.description.degreetype	Thesis	en_US
dc.description.degree	Doctor of Philosophy (PhD)	en_US
Appears in Collections:	Open Access Dissertations and Theses

Files in This Item:

File	Description	Size	Format
Gu_Chu-Shu_finalsubmission201608_PhD.pdf Open Access	Main article	913.15 kB	Adobe PDF
