J Syst Evol

Research Article

Phylogenomic data exploration with increased sampling provides new insights into the higher-level relationships of butterflies and moths (Lepidoptera)

Qi Chen1,2, Min Deng1,3, Xuan Dai1, Wei Wang4, Xing Wang1,2*, Liu-Sheng Chen5*, Guo-Hua Huang1*   

    1Yuelushan Laboratory, College of Plant Protection, Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
    2Tropical Biodiversity and Bioresource Utilization Laboratory, College of Science, Qiongtai Normal University, Haikou 571127, China
    3Qiannan Polytechnic for Nationality, Duyun 558022, China
    4Research Center for Wild Animal and Plant Resource Protection and Utilization, Qiongtai Normal University, Haikou 571127, China
    5Guangdong Provincial Key Laboratory of Silviculture, Protection and Utilization, Guangdong Academy of Forestry, Guangzhou 510520, China

    *Authors for correspondence. Xing Wang, E-mail: xingwanghjt@163.com; Liu-Sheng Chen, E-mail: lshchen2008@163.com; Guo-Hua Huang, E-mail: ghhuang@hunau.edu.cn
    Received: 2025-01-08; Accepted: 2025-03-16
    Supported by: the National Natural Science Foundation of China (32360134, 32111540167, 41661011) and the China Agriculture Research System (CARS-23-C08).

Abstract: A robust and stable phylogenetic framework is a fundamental goal of evolutionary biology. Lepidoptera (butterflies and moths), the third largest insect order, are central to terrestrial ecosystems and serve as important models in ecology and evolutionary biology. Yet, despite the group's importance, the higher-level phylogenetic relationships among its superfamilies remain poorly resolved. Here, we increased taxon sampling across Lepidoptera to 263 taxa representing 37 superfamilies and 68 families, and assembled a series of amino-acid datasets ranging from 69,680 to 400,330 aa in length for phylogenomic reconstruction. Using these datasets, we explored how different taxon-sampling schemes, combined with substantial increases in the number of gene loci, affect tree topology under maximum-likelihood (ML) and Bayesian inference (BI) methods. We also compared the robustness of the topologies recovered under three ML-based models. The results demonstrate that taxon sampling is an important determinant of tree robustness and accurate phylogenetic estimation in species-rich groups. Site-wise heterogeneity was identified as a significant source of bias, producing inconsistent phylogenetic placements of ditrysian lineages. The posterior mean site frequency (PMSF) model provided reliable estimates of the higher-level relationships of Lepidoptera. Our analyses yielded a comprehensive framework for the relationships among lepidopteran superfamilies and newly recovered several strongly supported sister-group relationships: Papilionoidea + Gelechioidea, Immoidea + Galacticoidea, and Pyraloidea + Hyblaeoidea. This study provides essential insights for future phylogenomic investigations of species-rich lepidopteran lineages and enhances our understanding of the phylogenomics of highly diversified groups.
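The abstract credits the PMSF model with reliable estimates in the presence of site-wise heterogeneity. As PMSF is generally described, a profile mixture model (for example, one of the C10 to C60 mixtures) is evaluated on a fixed guide tree, and each site then receives its own amino-acid frequency vector: the average of the class frequency vectors, weighted by the posterior probability of each class at that site. The minimal sketch below illustrates only that averaging step, under stated assumptions. The function name `pmsf_site_frequencies` is hypothetical, the per-site, per-class log-likelihoods are taken as given (in practice they come from the mixture model on the guide tree, as computed by programs such as IQ-TREE), and the toy numbers use three states instead of 20 purely for readability. It is not the authors' pipeline.

```python
import math
from typing import List, Sequence


def pmsf_site_frequencies(
    class_weights: Sequence[float],
    class_freqs: Sequence[Sequence[float]],
    site_class_loglik: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Hypothetical helper: posterior mean frequency profile for each site.

    class_weights:     prior weight w_k of each mixture class (positive, sums to 1)
    class_freqs:       state frequency vector pi_k of each class (each sums to 1)
    site_class_loglik: log L(site i | class k, guide tree), one row per site
    """
    n_states = len(class_freqs[0])
    profiles: List[List[float]] = []
    for logliks in site_class_loglik:
        # Unnormalised log posterior of each class: log w_k + log L(site | k)
        log_joint = [math.log(w) + ll for w, ll in zip(class_weights, logliks)]
        # Log-sum-exp normalisation avoids underflow of tiny site likelihoods
        m = max(log_joint)
        log_norm = m + math.log(sum(math.exp(x - m) for x in log_joint))
        post = [math.exp(x - log_norm) for x in log_joint]  # P(class k | site)
        # Posterior mean: class frequency vectors weighted by P(class k | site)
        profiles.append(
            [sum(p * f[a] for p, f in zip(post, class_freqs)) for a in range(n_states)]
        )
    return profiles


if __name__ == "__main__":
    # Invented toy example: two equally weighted classes over three states
    weights = [0.5, 0.5]
    freqs = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    # Site 0 fits class 0 much better; site 1 fits both classes equally well
    logliks = [[math.log(0.9), math.log(0.1)], [math.log(0.5), math.log(0.5)]]
    for profile in pmsf_site_frequencies(weights, freqs, logliks):
        print([round(x, 3) for x in profile])
```

In this toy case the first site's profile is pulled towards class 0 ([0.64, 0.2, 0.16]), while the second site receives the plain average of both classes ([0.4, 0.2, 0.4]). Working in log space matters because per-site likelihoods on trees with hundreds of taxa are far too small to represent directly as floating-point numbers.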

Key words: benchmarking universal single-copy orthologs, butterflies and moths, Lepidoptera, phylogenomics, next-generation sequencing