J Syst Evol 2017, Vol. 55, Issue 2: 85-109. DOI: 10.1111/jse.12233

• Review •

Relative benefits of amino-acid, codon, degeneracy, DNA, and purine-pyrimidine character coding for phylogenetic analyses of exons

Mark P. Simmons*   

  1. Department of Biology, Colorado State University, Fort Collins, CO 80523-1878, USA
  • Received: 2016-09-30 Published: 2017-03-08

Abstract: Traditional methods and 10 more recent methods of coding characters from exons of protein-coding genes are reviewed. The more recent methods collectively blur the distinction between nucleotide and amino-acid coding and enable investigators to carefully quantify the effects of different sources of phylogenetic signal as well as their potential biases. Codon models, which explicitly model silent and replacement substitutions, are a major advance and are expected to be broadly useful for simultaneously inferring recent and ancient divergences, unlike amino-acid coding. Degeneracy coding, wherein ambiguity codes are used to eliminate silent substitutions at the individual-nucleotide level, has clear advantages over scoring amino-acid characters. Nucleotide, codon, and amino-acid models are now directly comparable with easy-to-use programs, and widely used phylogenetics programs can analyze partitioned supermatrices that incorporate all three types of model. Therefore, it should become standard practice to test among these alternative model types before conducting parametric phylogenetic analyses. An earlier study of 78 protein-coding genes from 360 green-plant plastid genomes is used as an empirical example with which to quantify the relative performance of alternative character-coding methods using five quantification measures. Codon models were selected as having the best fit to the data, yet were outperformed by nucleotide models for all five quantification measures. Third-codon positions were found to be an important source of phylogenetic signal and even outperformed analyses of first and second positions for some measures. Degeneracy coding generally performed at least as well as amino-acid coding and is arguably a more effective alternative.

Key words: character-state space, codon models, composite characters, phylogenetic signal, phylogenomics, plastomics, saturation, transcriptomics
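
To make the coding schemes named in the title concrete, the following is a minimal Python sketch of three of them applied to a single in-frame exon sequence: amino-acid coding, purine-pyrimidine (RY) coding, and a simplified form of degeneracy coding. It assumes the standard genetic code. The function names are hypothetical, and the per-position degeneracy rule used here (replace each nucleotide with the IUPAC ambiguity code for all bases that are silent at that position alone) is an illustrative assumption, not necessarily the exact scheme evaluated in the review.

```python
from itertools import product

# Standard genetic code, with codon positions enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, keyed by the set of nucleotides each one represents.
IUPAC = {frozenset(k): v for k, v in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}.items()}


def codons(seq):
    """Split an in-frame, unambiguous DNA sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def amino_acid_coding(seq):
    """Translate each codon to its one-letter amino acid ('*' for stop)."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def ry_coding(seq):
    """Purine-pyrimidine coding: A and G become R; C and T become Y."""
    return seq.upper().translate(str.maketrans("AGCT", "RRYY"))


def degenerate_codon(codon):
    """Simplified degeneracy coding of one codon (illustrative assumption).

    At each position, collect every base that leaves the encoded amino acid
    unchanged when substituted at that position alone, then emit the IUPAC
    code for that set. Silent single-nucleotide differences thereby vanish.
    """
    aa = CODON_TABLE[codon]
    recoded = []
    for pos in range(3):
        silent = frozenset(
            b for b in BASES
            if CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == aa
        )
        recoded.append(IUPAC[silent])
    return "".join(recoded)


def degeneracy_coding(seq):
    """Apply the simplified degeneracy coding to every codon in a sequence."""
    return "".join(degenerate_codon(c) for c in codons(seq))
```

In this sketch, the synonymous leucine codons CTA and CTG both recode to YTN, so the silent third-position difference between them no longer contributes a character-state change, whereas the nucleotide sequence retains its length and position-level structure, unlike the amino-acid translation.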