
Using machine learning to predict long non-coding RNAs and exploring their evolutionary patterns and prevalence in plant transcriptomes

dc.contributor.advisor: Weretilnyk, Elizabeth
dc.contributor.advisor: Golding, Brian
dc.contributor.author: Simopoulos, Caitlin
dc.contributor.department: Biology [en_US]
dc.date.accessioned: 2019-05-01T19:47:24Z
dc.date.available: 2019-05-01T19:47:24Z
dc.date.issued: 2019
dc.description.abstract: Long non-protein-coding RNAs (lncRNAs) represent a diverse and enigmatic class of RNA. With roles in development and stress responses, these non-coding gene regulators are essential, yet they remain understudied in plants. Of just over 430 experimentally validated lncRNAs to date, only 13 derive from plant systems, and many of these do not meet the classic criteria of the RNA class. Without a solid definition of what makes a lncRNA, and with few empirically validated transcripts, currently available prediction methods fall short. To address this deficiency in lncRNA research, we constructed and applied a machine learning-based lncRNA prediction protocol that imposes no predefined rules and uses only experimentally confirmed lncRNAs in its training datasets. Through model evaluation, we found that our novel lncRNA prediction tool had an estimated accuracy of over 96%. In a study that predicted lncRNAs from the transcriptomes of evolutionarily diverse plant species, we determined that molecular features of lncRNAs display different phylogenetic signal patterns than those of protein-coding genes. Additionally, our analyses suggested that stress-resistant species express fewer lncRNAs than more stress-sensitive species. To expand on these results, we used the prediction tool in concert with a transcriptomic study of two natural accessions of the drought-tolerant species Eutrema salsugineum. These accessions were previously reported to show little physiological difference during a first drought but to differ significantly during a second; we instead demonstrated that the two ecotypes displayed vastly different transcriptomic responses, including the expression of lncRNAs, to both a first and a second drought treatment. In conclusion, the prediction tool can be applied to further our knowledge of lncRNA evolution and can serve as an additional tool in classic transcriptomic studies.
The suggested importance of lncRNAs in drought resistance, together with evidence of their expression in two natural E. salsugineum accessions, merits further study of the molecular and evolutionary mechanisms of these putatively regulatory transcripts. [en_US]
dc.description.degree: Doctor of Philosophy (PhD) [en_US]
dc.description.degreetype: Thesis [en_US]
dc.identifier.uri: http://hdl.handle.net/11375/24319
dc.language.iso: en [en_US]
dc.subject: lncRNA [en_US]
dc.subject: machine learning [en_US]
dc.subject: phylogenetic signal [en_US]
dc.subject: transcriptomes [en_US]
dc.subject: extremophile [en_US]
dc.subject: Eutrema salsugineum [en_US]
dc.subject: plants [en_US]
dc.title: Using machine learning to predict long non-coding RNAs and exploring their evolutionary patterns and prevalence in plant transcriptomes [en_US]
dc.type: Thesis [en_US]
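The abstract does not state which molecular features the prediction protocol learns from. Purely as a hedged illustration, the sketch below computes three sequence features that coding-potential and lncRNA classifiers in general often use: transcript length, GC content, and the length of the longest open reading frame (ORF). The feature set and the function names `gc_content`, `longest_orf_length`, and `transcript_features` are assumptions for illustration only, not the thesis's actual features or code.

```python
# Hypothetical feature extraction for a lncRNA classifier (illustrative only;
# not the feature set used in the thesis).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG-initiated ORF
    on the forward strand. ORFs lacking an in-frame stop are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ATG in this frame
    return best


def transcript_features(seq: str) -> dict:
    """Return a small numeric feature dictionary for one transcript."""
    length = len(seq)
    orf = longest_orf_length(seq)
    return {
        "length": length,
        "gc_content": gc_content(seq),
        "longest_orf_nt": orf,
        # share of the transcript covered by its longest ORF
        "orf_coverage": orf / length if length else 0.0,
    }


if __name__ == "__main__":
    print(transcript_features("ATGAAATAGCC"))
```

In this sketch, each transcript's feature dictionary would be turned into a numeric vector and passed to whatever classifier is trained on the validated lncRNA and protein-coding sets; no particular model or threshold is implied, consistent with a protocol that avoids predefined rules such as fixed ORF-length cutoffs.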

Files

Original bundle

Name: Simopoulos_Caitlin_MA_2019April_PhD.pdf
Size: 1.94 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed to upon submission