Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/16736
Title: | A phylogenetic model to predict the patterns of presence and absence of genes in bacterial genomes and estimate the frequency of horizontal gene transfer |
Authors: | Zamani Dahaj, Seyed Alireza |
Advisor: | Higgs, Paul |
Department: | Physics and Astronomy |
Publication Date: | Jun-2015 |
Abstract: | For a group of bacterial genomes, the core genome is the set of genes present in all the individual genomes and the pangenome is the set of genes present in at least one of the genomes. Typically, a relatively small fraction of genes is in the core, and many other genes are found in only one or a small number of genomes. This indicates that there is a wide range of time scales of genome evolution, with rapid insertion and deletion of some genes and long-term retention of others. Here, we study the full set of genes in a group of 40 complete genomes of Cyanobacteria. Genes are clustered using sequence similarity measures, and for each cluster we obtain the pattern of presence and absence of the genes across the 40 species. We use evolutionary models of gene insertion and deletion to calculate the likelihood of each of the observed patterns. One important case we consider is the infinitely many genes model (IMG), in which each gene can originate only once but can be deleted multiple times. In contrast, the finitely many genes model (FMG) allows more than one insertion of the same type of gene in different genomes, which would be the case if there were horizontal gene transfer (HGT). The maximum likelihood model allows us to predict which genes have a presence-absence pattern that is best explained by horizontal transfer. We find that about 15% of the genes experienced HGT in their evolutionary history. We also find a broad range of rates of insertion and deletion of genes, which explains why a large number of genes follow a typical treelike pattern of vertical inheritance, despite the presence of a significant minority of genes that undergo HGT. In addition, we estimate the ancestral genome size of Cyanobacteria. Both the inferred frequency of HGT and the size of the ancestral genome depend on the ratio of insertion to deletion rates of genes. 
However, the variation in the estimated ancestral genome size is much smaller than in previous treatments that used parsimony. As the phylogenetic tree of Cyanobacteria is not completely resolved, we test our models on ten different trial species trees that differ by small rearrangements of species. The estimated frequency of HGT and the maximum likelihood values of the insertion and deletion rate parameters are not very sensitive to small changes in the tree. However, the likelihood of the gene presence/absence patterns differs significantly among the trees. Therefore, these patterns can be used for phylogenetic inference. This kind of phylogenetic inference makes use of all the genes present in the genomes. In contrast, phylogenetic methods based on protein sequence evolution use only the relatively small number of genes that are present in all of the genomes in the set. We compare the likelihood ranking of the trial trees using the presence/absence data with the ranking of the same trees using protein sequence evolution with conserved common genes (present in all cyanobacteria and a large proportion of other genomes) and signature genes (present in all cyanobacteria and no other species). |
URI: | http://hdl.handle.net/11375/16736 |
Appears in Collections: | Open Access Dissertations and Theses |
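The likelihood calculation described in the abstract can be illustrated with a minimal sketch. It assumes a simple two-state finitely-many-genes setting in which a gene family is gained at rate `u` and lost at rate `v` on every branch, with the root state drawn from the equilibrium frequencies, and it scores one presence/absence pattern on a fixed tree with Felsenstein's pruning algorithm. The tree encoding, the function names, the toy three-species tree, and the correction for unobservable all-absent patterns are all hypothetical illustrations, not the thesis's implementation; the IMG model, which allows only one insertion per gene, would need a different calculation.

```python
import math

# Hypothetical tree encoding (illustration only): a leaf is a species name (str);
# an internal node is a list of (child, branch_length) pairs.
EXAMPLE_TREE = [([("A", 0.1), ("B", 0.2)], 0.05), ("C", 0.3)]


def transition_probs(u, v, t):
    """2x2 matrix P[i][j] = Pr(state j after time t | state i); 0 = absent, 1 = present.

    u is the assumed insertion (gain) rate 0 -> 1, v the deletion (loss) rate 1 -> 0.
    """
    s = u + v
    decay = math.exp(-s * t)
    p01 = (u / s) * (1.0 - decay)  # absent at start, present after time t
    p10 = (v / s) * (1.0 - decay)  # present at start, absent after time t
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def partial_likelihood(node, pattern, u, v):
    """Felsenstein pruning: likelihood of the leaves below `node`, for each state at `node`."""
    if isinstance(node, str):
        observed = pattern[node]
        return [1.0 if state == observed else 0.0 for state in (0, 1)]
    result = [1.0, 1.0]
    for child, t in node:
        child_lik = partial_likelihood(child, pattern, u, v)
        p = transition_probs(u, v, t)
        for i in (0, 1):
            result[i] *= sum(p[i][j] * child_lik[j] for j in (0, 1))
    return result


def pattern_likelihood(tree, pattern, u, v):
    """Probability of one presence/absence pattern, root drawn from equilibrium."""
    root_freq = [v / (u + v), u / (u + v)]  # equilibrium Pr(absent), Pr(present)
    lik = partial_likelihood(tree, pattern, u, v)
    return root_freq[0] * lik[0] + root_freq[1] * lik[1]


def observed_pattern_likelihood(tree, pattern, u, v):
    """Condition on presence in at least one genome (assumption: a gene cluster
    that is absent from every genome can never appear in the data)."""
    all_absent = {leaf: 0 for leaf in pattern}
    p_unobserved = pattern_likelihood(tree, all_absent, u, v)
    return pattern_likelihood(tree, pattern, u, v) / (1.0 - p_unobserved)


if __name__ == "__main__":
    # Toy gene present in A and B but absent in C, with loss twice as fast as gain.
    pattern = {"A": 1, "B": 1, "C": 0}
    print(observed_pattern_likelihood(EXAMPLE_TREE, pattern, u=0.5, v=1.0))
```

Summing the log of `observed_pattern_likelihood` over all gene clusters and maximizing it with respect to `u` and `v` would give maximum-likelihood rate estimates in this simplified setting. Because the abstract reports a broad range of insertion and deletion rates across genes, a single `(u, v)` pair as used here is only a starting point.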
Files in This Item:
File | Description | Size | Format
---|---|---|---
revised-thesis-zamani.pdf | Main thesis | 3.31 MB | Adobe PDF
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.