J Syst Evol

• Research Article •

Deep learning for Fabaceae identification by integrating molecular and morphological data and a solution for barcode selection

Kawtar LHAYANI¹*, Karim RABEH², Leila MEDRAOUI¹   

  ¹ Microbiology & Molecular Biology Team, Research Center on Plant and Microbial Biotechnologies, Biodiversity and Environment, Faculty of Sciences, Mohammed V University in Rabat, Morocco.
  ² Oasis Systems Research Unit, Regional Center of Agricultural Research of Errachidia, National Institute of Agricultural Research, Rabat, P.O. Box 415, 10090, Morocco.


    *Author for correspondence. E-mail: kawtar_lhayani@um5.ac.ma

  • Received: 2025-03-17 Accepted: 2025-08-19

Abstract: Identification of plants in the family Fabaceae traditionally relies on either morphological traits or DNA barcoding, each of which has limitations in accuracy and efficiency. Deep learning has emerged as a promising tool for integrating multiple data sources, but its full potential remains underexplored. This study aimed to use a deep learning model that integrates morphological and molecular data for species identification within the Fabaceae family, bridging the gap between the two identification methods. The research involved four main phases: (i) data collection, (ii) data preprocessing, (iii) model training and testing, and (iv) analysis of results. The data comprised DNA barcode sequences retrieved from the BOLD database and images collected from different websites. The model was trained for identification at the genus and species levels, with two different barcodes, ITS2 and matK+rbcL. Only species with four available copies of ITS2, matK, and rbcL sequences were selected to ensure consistent input across samples. The final dataset included 7 genera and 21 species. While the model achieved high accuracy during training, test accuracy remained low (14–19%), indicating overfitting, likely due to the limited dataset size. However, the model demonstrated the ability to evaluate barcode discrimination across genera. Specifically, it showed that the effectiveness of ITS2 and matK+rbcL varied by genus. These findings introduce a new application for deep learning in plant systematics: not only identifying species but also evaluating barcodes. This approach could help reduce the reliance on trial and error in barcode selection and enhance the efficiency of molecular taxonomy.

Key words: Deep Learning, DNA barcode, Fabaceae, Morphological data, Species identification
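
The selection rule stated in the abstract (only species with four available copies of ITS2, matK, and rbcL were retained) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `select_species`, the (species, marker) record layout, and the reading of "four available copies" as "at least four" are assumptions made for illustration, and the records are toy placeholders rather than data from the study.

```python
from collections import Counter

# Markers named in the abstract; the threshold of four copies is also from the abstract.
REQUIRED_MARKERS = ("ITS2", "matK", "rbcL")
MIN_COPIES = 4  # assumption: interpreted here as "at least four"


def select_species(records, required=REQUIRED_MARKERS, min_copies=MIN_COPIES):
    """Return the species that have enough sequences for every required marker.

    `records` is an iterable of (species, marker) pairs, one pair per
    available sequence (a hypothetical layout, not the BOLD export format).
    """
    counts = Counter(records)            # sequences per (species, marker)
    species = {sp for sp, _ in counts}   # every species seen at least once
    return sorted(
        sp for sp in species
        if all(counts[(sp, marker)] >= min_copies for marker in required)
    )


# Toy records with placeholder names.
records = (
    [("Species A", marker) for marker in REQUIRED_MARKERS for _ in range(4)]
    + [("Species B", "ITS2")] * 4
    + [("Species B", "matK")] * 2   # too few matK copies, so Species B is dropped
    + [("Species B", "rbcL")] * 4
)

print(select_species(records))
```

Filtering on all three markers at once keeps the ITS2 and matK+rbcL inputs comparable for the same set of species, which is consistent with the stated goal of consistent input across samples.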