Deep learning for Fabaceae identification by integrating molecular and morphological data and a solution for barcode selection

Abstract

Identification of plants in the family Fabaceae traditionally relies on either morphological traits or DNA barcoding, each of which has limitations in accuracy and efficiency. Deep learning has emerged as a promising tool for integrating multiple data sources, but its full potential remains underexplored. This study aimed to use a deep learning model that integrates morphological and molecular data for species identification within the Fabaceae family, bridging the gap between the two identification methods. The research involved four main phases: (i) data collection, (ii) data preprocessing, (iii) model training and testing, and (iv) analysis of results. The data comprised DNA barcode sequences retrieved from the BOLD database and images collected from various websites. The model was trained for identification at the genus and species levels using two barcode configurations, ITS2 and matK+rbcL. Only species with four available copies of ITS2, matK, and rbcL sequences were selected to ensure consistent input across samples. The final dataset included 7 genera and 21 species. Although the model achieved high accuracy during training, test accuracy remained low (14–19%), indicating overfitting, likely due to the limited dataset size. Nevertheless, the model was able to evaluate how well each barcode discriminates among genera: the effectiveness of ITS2 and matK+rbcL varied depending on the genus. These findings introduce a new application for deep learning in plant systematics, not only for species identification but also for barcode evaluation. This approach could help reduce the reliance on trial and error in barcode selection and enhance the efficiency of molecular taxonomy.
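The species-selection criterion described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `select_species`, the `(species, marker)` record layout, and the example records are hypothetical, and reading "four available copies" as "at least four" is an assumption.

```python
from collections import Counter

# The three barcode regions named in the abstract.
REQUIRED_MARKERS = ("ITS2", "matK", "rbcL")
# Assumption: "four available copies" is read as "at least four".
MIN_COPIES = 4


def select_species(records, min_copies=MIN_COPIES):
    """Return species that have enough sequences for every required marker.

    records: iterable of (species, marker) pairs, one pair per sequence.
    """
    counts = Counter(records)  # missing (species, marker) keys count as 0
    species = {sp for sp, _ in counts}
    return sorted(
        sp
        for sp in species
        if all(counts[(sp, m)] >= min_copies for m in REQUIRED_MARKERS)
    )


# Made-up records for illustration only (not data from the study).
records = (
    [("Vicia faba", m) for m in REQUIRED_MARKERS for _ in range(4)]
    + [("Lupinus albus", "ITS2")] * 4
    + [("Lupinus albus", "matK")] * 2
)

print(select_species(records))
```

In this toy input, only the species with four sequences for each of the three markers is retained; the other is dropped because it lacks rbcL sequences and has too few matK sequences.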

Key words: Deep Learning, DNA barcode, Fabaceae, Morphological data, Species identification

Kawtar LHAYANI, Karim RABEH, Leila MEDRAOUI. Deep learning for Fabaceae identification by integrating molecular and morphological data and a solution for barcode selection[J]. J Syst Evol.

URL: https://www.jse.ac.cn/EN/
