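A new study from Jian Yang's team at Westlake University shows that the 1000 Chinese Pangenome empowers medical and population genetics. The paper was published in Nature on April 1, 2026.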
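In this work, the researchers generated 1,116 diploid genome assemblies (55 de novo and 1,061 pangenome-informed), with an average size of 2.98 Gb and a mean quality value of 46. From these assemblies they constructed a pangenome containing 405.3 million base pairs of sequence absent from the current references GRCh38 and CHM13, including 26.2 million base pairs of functional genic and predicted regulatory elements. They catalogued a full spectrum of genetic variation: 35.4 million small variants, 110,530 structural variants (SVs), 485,575 tandem repeats (TRs) and 0.86 million nested variants embedded in non-reference sequences.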
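This extensive dataset enabled detailed characterization of multiscale genic variation relevant to medical genetics, including gene-altering SVs, TR expansions, gene cluster variations and HLA gene haplotypes. Combining it with gene expression data from the 1000 Chinese Pangenome (1KCP) project, the team performed pan-variant expression quantitative trait locus (eQTL) mapping across the different variant types. They identified 3,256 eQTLs involving complex variants (SVs, TRs and nested variants) and elucidated their regulatory complexity. Finally, the team developed a 1KCP pan-variant imputation reference panel, which provides multitype genetic markers to improve the resolution of future association studies. This resource advances the understanding of complex variants and their functional implications, and offers new insights into human health.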
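By way of background, pangenomes are revolutionizing the ability to resolve genomic regions with complex variation. However, existing human pangenomes are constrained by small sample sizes and so offer limited utility for medical and population genetic applications.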
Appendix: original English abstract
Title: The 1000 Chinese Pangenome empowers medical and population genetics
Author: Yifei Wang, Zhongqu Duan, Dan Chen, Dandan Shi, Yi Ding, Zhibin Wang, Baoqing Li, Zhiyi Wang, Minmin Guo, Wen Yang, Junren Hou, Wenhao Chen, Yazhou Guo, Wenjie Wei, Yujie Cao, Xiwei Sun, Weiyang Bai, Mingdong Lu, Ting Qi, Xian Shen, Jian Yang
Issue&Volume: 2026-04-01
Abstract: Pangenomes are revolutionizing our ability to resolve genomic regions with complex variations [1]. However, existing human pangenomes [2,3], constrained by small sample sizes, provide limited utility for medical and population genetic applications. Here we generated 1,116 diploid genome assemblies (55 de novo and 1,061 pangenome-informed) with an average size of 2.98 Gb and a mean quality value of 46 as part of the 1000 Chinese Pangenome (1KCP) project. On the basis of these assemblies, we constructed a pangenome comprising 405.3 million base pairs of sequences absent from the current references GRCh38 and CHM13, including 26.2 million base pairs of functional genic and predicted regulatory elements. We catalogued a full spectrum of genetic variation, including 35.4 million small variants, 110,530 structural variants (SVs), 485,575 tandem repeats (TRs) and 0.86 million nested variants embedded in non-reference sequences. This extensive dataset enabled detailed characterization of multiscale genic variations relevant to medical genetics, including gene-altering SVs, TR expansions, gene cluster variations and HLA gene haplotypes. Coupled with the 1KCP gene expression data, we conducted pan-variant expression quantitative trait locus (eQTL) mapping to analyse diverse variant types. We identified 3,256 eQTLs involving complex variants (SVs, TRs and nested variants) and elucidated their regulatory complexity. Finally, we developed a 1KCP pan-variant imputation reference panel, which provides multitype genetic markers to enhance the resolution of future association studies. This resource advances our understanding of complex variants and their functional implications to provide new insights into human health.
DOI: 10.1038/s41586-026-10315-y
Source: https://www.nature.com/articles/s41586-026-10315-y
Nature: founded in 1869 and published by Springer Nature. Latest impact factor: 69.504
Official website: http://www.nature.com/
Submission link: http://www.nature.com/authors/submit_manuscript.html
