J Syst Evol ›› 2017, Vol. 55 ›› Issue (4): 377-384. DOI: 10.1111/jse.12258

• Research Articles •

Machine learning algorithms improve the power of phytolith analysis: A case study of the tribe Oryzeae (Poaceae)

Zhe Cai1,2 and Song Ge1,2*   

  1. State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2017-03-23 Published: 2017-07-24

Abstract: Phytoliths, an important source of microfossils, have been widely used in paleobotany-related studies, especially in the grass family (Poaceae), where they are abundant. Despite great efforts, several challenges remain when phytoliths are used in such studies, including the accurate description of phytolith morphology and the effective use of phytolith traits in taxon identification or discrimination. In this study, we analyzed over 1000 phytolith samples from 18 taxa representing seven main genera in the tribe Oryzeae (subfamily Ehrhartoideae) and five taxa in the subfamilies Bambusoideae and Pooideae. By focusing on Oryzeae, which has been extensively investigated in terms of taxonomy and phylogeny, we were able to evaluate the discrimination power of phytoliths at lower taxonomic levels in grasses. Using morphometric analysis together with several machine learning algorithms, we found that 87.7% of the phytolith samples could be classified correctly at the genus level. Although their performance differed slightly, all four machine learning algorithms significantly increased the resolving power of phytolith evidence in taxon identification and discrimination compared with traditional phytolith analysis. We therefore propose a pipeline for phytolith analysis based on machine learning algorithms, comprising data collection, morphometric analysis, model building, and taxon discrimination. The methodology and pipeline presented here should be applicable to studies across different groups of plants. This study provides new insights into the use of phytoliths in evolutionary and ecological studies involving grasses and plants in general.

Key words: machine learning, morphological character, phytolith, Poaceae, taxon discrimination
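
The four-step pipeline named in the abstract (data collection, morphometric analysis, model building, taxon discrimination) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the four algorithms or the morphometric variables, so a k-nearest-neighbour classifier stands in for the model, length, width, and aspect ratio stand in for the shape descriptors, and the genus labels and measurements are synthetic rather than data from the study.

```python
import random
from collections import Counter
from math import dist

# Hypothetical genus-level shape profiles: (mean length, mean width).
# These values are invented for illustration and are NOT measurements from the study.
SYNTHETIC_PROFILES = {
    "genus_A": (20.0, 10.0),
    "genus_B": (40.0, 15.0),
    "genus_C": (60.0, 30.0),
}


def make_synthetic_samples(n_per_genus=30, spread=1.0, seed=0):
    """Step 1 (data collection), simulated: draw noisy length/width pairs."""
    rng = random.Random(seed)
    samples = []
    for genus, (length, width) in SYNTHETIC_PROFILES.items():
        for _ in range(n_per_genus):
            samples.append((genus, rng.gauss(length, spread), rng.gauss(width, spread)))
    return samples


def morphometric_features(length, width):
    """Step 2 (morphometric analysis): derive simple shape descriptors."""
    return (length, width, length / width)


def knn_predict(train, query, k=3):
    """Step 3 (model building): k-nearest-neighbour majority vote (a stand-in model)."""
    nearest = sorted(train, key=lambda item: dist(item[1], query))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(dataset, k=3):
    """Step 4 (taxon discrimination): fraction of held-out samples classified correctly."""
    correct = 0
    for i, (label, features) in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]  # hold out sample i
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(dataset)


dataset = [(g, morphometric_features(l, w)) for g, l, w in make_synthetic_samples()]
accuracy = leave_one_out_accuracy(dataset)
print(f"Leave-one-out genus-level accuracy on synthetic data: {accuracy:.3f}")
```

Because the synthetic genera are deliberately well separated, the accuracy printed here says nothing about real phytoliths; the point is only the shape of the workflow, in which held-out samples are classified from morphometric features and the proportion assigned to the correct genus is reported.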