J Syst Evol ›› 2015, Vol. 53 ›› Issue (5): 380-390. DOI: 10.1111/jse.12160

• Reviews •

Coalescent methods for estimating species trees from phylogenomic data

Liang Liu1,2*, Shaoyuan Wu3, and Lili Yu4   

    1Department of Statistics, University of Georgia, Athens, USA
    2Institute of Bioinformatics, University of Georgia, Athens, USA
    3Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
    4Department of Biostatistics, Georgia Southern University, Statesboro, USA
  • Received: 2015-02-20 Published: 2015-09-22

Abstract: Genome-scale sequence data are increasingly available in phylogenetic studies for understanding the evolutionary histories of species. However, developing probabilistic models that account for the heterogeneity of phylogenomic data remains challenging. The multispecies coalescent model describes gene trees as independent random variables generated by a coalescent process occurring along the lineages of the species tree. Because the multispecies coalescent model allows gene trees to vary across genes, coalescent-based methods have been widely used to account for heterogeneous gene trees in phylogenomic data analysis. In this paper, we summarize and evaluate the performance of coalescent-based methods for estimating species trees from genome-scale sequence data. We investigate the effects of deep coalescence and mutation on the performance of species tree estimation methods. We find that coalescent-based methods perform well in estimating species trees when a large number of genes is available, regardless of the degree of deep coalescence and mutation. The performance of the coalescent methods is negatively correlated with the lengths of internal branches of the species tree.

Key words: coalescent methods, incomplete lineage sorting, phylogenomic data, species tree
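
As an illustration of how the multispecies coalescent lets gene trees vary across genes, the following minimal sketch considers the simplest case: a three-taxon species tree ((A,B),C) with one sampled lineage per species and an internal branch of length t in coalescent units. It relies on the standard textbook result that a gene tree matches the species tree topology with probability 1 - (2/3)exp(-t); this formula and the function names `concordance_probability` and `simulate_concordance` are assumptions introduced here for illustration and are not taken from the paper.

```python
import math
import random


def concordance_probability(t):
    """Probability that a gene tree topology matches a three-taxon
    species tree ((A,B),C) whose internal branch has length t
    (coalescent units), one sampled lineage per species."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def simulate_concordance(t, n_genes, seed=0):
    """Estimate the same probability by simulating independent genes."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_genes):
        # Lineages A and B coalesce at rate 1 along the internal branch,
        # so the waiting time is exponential with mean 1.
        if rng.expovariate(1.0) < t:
            matches += 1  # coalesced before the root: concordant gene tree
        elif rng.random() < 1.0 / 3.0:
            # Deep coalescence: three lineages enter the root population
            # and each pair is equally likely to coalesce first.
            matches += 1
    return matches / n_genes


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        exact = concordance_probability(t)
        approx = simulate_concordance(t, 100_000)
        print(f"t={t:.1f}  exact={exact:.4f}  simulated={approx:.4f}")
```

In this sketch, shorter internal branches give more deep coalescence and therefore more gene trees that disagree with the species tree, which is the heterogeneity that coalescent-based methods are designed to accommodate.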