J Syst Evol

• Research Article •    

Deep learning for Fabaceae identification by integrating molecular and morphological data and a solution for barcode selection

Kawtar Lhayani¹*, Karim Rabeh², and Leila Medraoui

    ¹ Microbiology & Molecular Biology Team, Research Center on Plant and Microbial Biotechnologies, Biodiversity and Environment, Faculty of Sciences, Mohammed V University in Rabat, Morocco.
    ² Oasis Systems Research Unit, Regional Center of Agricultural Research of Errachidia, National Institute of Agricultural Research, Rabat, PO. Box 415, 10090, Morocco.


    *Author for correspondence. E-mail: kawtar_lhayani@um5.ac.ma

  • Received: 2025-03-17 Accepted: 2025-08-19 Online: 2025-10-29 Published: 2025-08-19

Abstract: Identification of plants in the Fabaceae family traditionally relies on either morphological traits or DNA barcoding, each with limitations in accuracy and efficiency. Deep learning has emerged as a promising tool for integrating multiple data sources, but its full potential remains underexplored. This study aimed to use a deep learning model that integrates morphological and molecular data for species identification within Fabaceae, bridging the gap between the two identification methods. The research involved four main phases: (i) data collection; (ii) data preprocessing; (iii) model training and testing; and (iv) analysis of results. The data comprised DNA barcode sequences retrieved from the BOLD database and images collected from different websites. The model was trained for identification at the genus and species levels with two different barcodes: ITS2 and matK+rbcL. Only species with four available copies of ITS2, matK, and rbcL sequences were selected to ensure consistent input across samples. The final data set included seven genera and 21 species. While the model achieved high accuracy during training, test accuracy remained low (14%–19%), indicating overfitting, likely due to the limited data set size. However, the model demonstrated the ability to evaluate barcode discrimination across genera. Specifically, it showed that the effectiveness of ITS2 and matK+rbcL varied depending on the genus. These findings introduce a new application for deep learning in plant systematics: not only species identification but also barcode evaluation. This approach could help reduce the reliance on trial and error in barcode selection and enhance the efficiency of molecular taxonomy.

Key words: deep learning, DNA barcode, Fabaceae, morphological data, species identification
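
The species selection rule stated in the abstract (keeping only species with four available ITS2, matK, and rbcL sequences) can be illustrated with a minimal sketch. This is not the authors' code. The function name `select_species`, the record layout of `(species, marker)` pairs, the placeholder species names, and the reading of "four available copies" as "at least four per marker" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Markers named in the abstract.
REQUIRED_MARKERS = ("ITS2", "matK", "rbcL")
# Assumption: "four available copies" is read as "at least four per marker".
MIN_COPIES = 4


def select_species(records, markers=REQUIRED_MARKERS, min_copies=MIN_COPIES):
    """Return the species that have enough sequences for every marker.

    records: iterable of (species, marker) pairs, one pair per sequence.
    This layout is hypothetical; it is not the BOLD export format.
    """
    counts = defaultdict(Counter)
    for species, marker in records:
        counts[species][marker] += 1
    # A Counter returns 0 for a marker that was never seen.
    return sorted(
        species
        for species, per_marker in counts.items()
        if all(per_marker[m] >= min_copies for m in markers)
    )


# Toy records with placeholder names (not data from the study).
records = (
    [("Species A", m) for m in REQUIRED_MARKERS for _ in range(4)]
    + [("Species B", "ITS2")] * 5
    + [("Species B", "matK")] * 4
    + [("Species B", "rbcL")] * 2  # too few rbcL sequences
)

selected = select_species(records)
print(selected)
```

In this toy input, only "Species A" passes the filter because "Species B" has two rbcL sequences. Keeping the threshold as a parameter makes it easy to see how the number of retained species changes when the requirement is relaxed.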