Scientists achieve semi-automated assembly of high-quality diploid human reference genomes
Author: Xiaoke Robot  Published: 2022/10/23 14:51:11

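Erich D. Jarvis of The Rockefeller University (USA) and collaborators have achieved semi-automated assembly of high-quality diploid human reference genomes. The study was published online in the international journal Nature on October 19, 2022.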

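The researchers determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. By combining the top-performing methods, the researchers generated their first high-quality diploid reference assembly, which contains only about four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. These findings can serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference that captures global genetic variation from single nucleotides to structural rearrangements.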

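By way of background, the current human reference genome, GRCh38, represents more than 20 years of effort to generate a high-quality assembly and has benefited society. However, it still contains many gaps and errors, and it does not represent a single biological genome because it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome. To address these limitations, the Human Pangenome Reference Consortium was formed with the goal of creating high-quality, cost-effective diploid genome assemblies for a pangenome reference that represents human genetic diversity.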

Appendix: original English text

Title: Semi-automated assembly of high-quality diploid human reference genomes

Author: Jarvis, Erich D., Formenti, Giulio, Rhie, Arang, Guarracino, Andrea, Yang, Chentao, Wood, Jonathan, Tracey, Alan, Thibaud-Nissen, Francoise, Vollger, Mitchell R., Porubsky, David, Cheng, Haoyu, Asri, Mobin, Logsdon, Glennis A., Carnevali, Paolo, Chaisson, Mark J. P., Chin, Chen-Shan, Cody, Sarah, Collins, Joanna, Ebert, Peter, Escalona, Merly, Fedrigo, Olivier, Fulton, Robert S., Fulton, Lucinda L., Garg, Shilpa, Gerton, Jennifer L., Ghurye, Jay, Granat, Anastasiya, Green, Richard E., Harvey, William, Hasenfeld, Patrick, Hastie, Alex, Haukness, Marina, Jaeger, Erich B., Jain, Miten, Kirsche, Melanie, Kolmogorov, Mikhail, Korbel, Jan O., Koren, Sergey, Korlach, Jonas, Lee, Joyce, Li, Daofeng, Lindsay, Tina, Lucas, Julian, Luo, Feng, Marschall, Tobias, Mitchell, Matthew W., McDaniel, Jennifer, Nie, Fan, Olsen, Hugh E., Olson, Nathan D., Pesout, Trevor, Potapova, Tamara, Puiu, Daniela, Regier, Allison, Ruan, Jue, Salzberg, Steven L., Sanders, Ashley D., Schatz, Michael C., Schmitt, Anthony, Schneider, Valerie A., Selvaraj, Siddarth, Shafin, Kishwar, Shumate, Alaina, Stitziel, Nathan O., Stober, Catherine, Torrance, James, Wagner, Justin, Wang, Jianxin, Wenger, Aaron, Xiao, Chuanle, Zimin, Aleksey V., Zhang, Guojie, Wang, Ting

Issue&Volume: 2022-10-19

Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

DOI: 10.1038/s41586-022-05325-5

Source: https://www.nature.com/articles/s41586-022-05325-5

Journal information

Nature: founded in 1869. Part of the Springer Nature publishing group. Latest IF: 43.07
Official website: http://www.nature.com/
Submission link: http://www.nature.com/authors/submit_manuscript.html