Genetic admixture of Chinese Tajik people inferred from genome‐wide array genotyping and mitochondrial genome sequencing

Jing Zhao1,2†, Qiao Wu1†, Xin-Hong Bai3, Edward Allen4, Meng-Ge Wang2, Guang-Lin He2, Jian-Xin Guo2, Xiao-Min Yang2, Jian-Xue Xiong4,5, Zi-Xi Jiang4,5, Xiao-Yan Ji4,5, Hui Wang4,5, Jing-Ze Tan1*, Shao-Qing Wen1,4,5*, and Chuan‐Chao Wang2,6,7,8*   

    1 Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai 200433, China;
    2 Department of Anthropology and Ethnology, Institute of Anthropology, School of Sociology and Anthropology, Xiamen University, Xiamen 361005, Fujian, China;
    3 The Shanghai Anthropological Association, Shanghai 200433, China;
    4 Institute of Archaeological Science, Fudan University, Shanghai 200433, China;
    5 Center for the Belt and Road Archaeology and Ancient Civilizations (BRAAC), Fudan University, Shanghai 200433, China;
    6 State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen 361102, Fujian, China;
    7 State Key Laboratory of Marine Environmental Science, Xiamen University, Xiamen 361102, Fujian, China;
    8 Institute of Artificial Intelligence, Xiamen University, Xiamen 361102, Fujian, China
    †These authors contributed equally to this work.
    *Authors for correspondence. Jing‐Ze Tan. E‐mail: jztan@fudan.edu.cn; Shao‐Qing Wen. E‐mail: wenshaoqing@fudan.edu.cn; Chuan‐Chao Wang. E‐mail: wang@xmu.edu.cn
    Received: 2022-08-26; Accepted: 2023-04-21; Online: 2023-05-30

Abstract: Chinese Tajiks are an Indo-Iranian-speaking population in Xinjiang, northwest China. Although the complex demographic history of this region has been characterized, the ancestral sources and genetic admixture of its Indo-Iranian-speaking groups remain poorly understood. Here we provide genome-wide genotyping data for over 700 000 single-nucleotide polymorphisms (SNPs) and mtDNA multiplex sequencing data for 64 male Chinese Tajik individuals from two dialect groups, Wakhi and Selekur. We applied principal component analysis (PCA), ADMIXTURE, f-statistics, TreeMix, qpWave/qpAdm, Admixture-induced Linkage Disequilibrium for Evolutionary Relationships (ALDER), and Fst analyses to infer fine-scale population genetic structure and admixture history. Our results reveal that Chinese Tajiks show the closest affinity to, and a similar admixture pattern to, ancient Xinjiang populations, especially Xinjiang samples from the historical era. Chinese Tajiks also received gene flow from populations related to Europeans and Neolithic Iranian farmers. We observed genetic substructure between the two Tajik dialect groups: the Selekur-speaking group, who lived in the county, received more gene flow from East Asians than the Wakhi-speaking people, who inhabited the village. These results document the population movements that contributed to the influx of diverse ancestries into the Xinjiang region.

Key words: Chinese Tajiks, East Asia, genetic structure, population admixture, population history