DeepConsensus improves sequencing accuracy with a gap-aware sequence transformer
By: 小柯机器人 (Xiaoke Robot) Published: 2022/9/3 22:03:58

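Andrew Carroll's research group at Google recently reported a new result: DeepConsensus improves sequencing accuracy by using a gap-aware sequence transformer. The work was published in the international journal Nature Biotechnology on September 1, 2022.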

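The researchers developed DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer-encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%.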
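The Q20, Q30 and Q40 thresholds are Phred-scaled quality values, where Q = -10 · log10(p) and p is the per-base error probability. The following is a minimal sketch of that conversion, included only to make the thresholds concrete; the function names are illustrative and are not part of DeepConsensus or pbccs.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Per-base error probability implied by a Phred quality score Q."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Phred quality score implied by a per-base error probability p."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Quality levels mentioned in the article.
    for q in (20, 30, 40, 43, 45):
        p = phred_to_error_rate(q)
        print(f"Q{q}: error rate {p:.2e}, about 1 error per {round(1 / p):,} bases")
```

Under this scale, each step of 10 in Q means a tenfold lower error rate: Q20 is 1 error in 100 bases, Q30 is 1 in 1,000, and Q40 is 1 in 10,000.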

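With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could also be trained for the general problem of analyzing alignments of other types of sequences, such as unique molecular identifiers or genome assemblies.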
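NG50, the contiguity metric quoted above, is the length of the contig at which the cumulative assembly length, counted from the longest contig downward, first reaches half of the estimated genome size. The sketch below shows the usual definition; the contig lengths and genome size are made-up toy values, not data from the study.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are visited from longest to shortest. The NG50 is the length of
    the contig at which the running total first reaches half of genome_size.
    Returns None if the assembly covers less than half of the genome.
    """
    half = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return None


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units) and genome size.
    toy_contigs = [50, 40, 30, 20, 10]
    print(ng50(toy_contigs, genome_size=200))
    # N50 is the same calculation with the assembly's own total length.
    print(ng50(toy_contigs, genome_size=sum(toy_contigs)))
```

Because NG50 is measured against the genome size rather than the assembly's own length, it allows assemblies of the same genome to be compared directly.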

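According to the researchers, circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10–25 kilobases), accurate "HiFi" reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model.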
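To illustrate what combining repeated observations into a consensus means, here is a deliberately simplified sketch: a column-wise majority vote over subreads that are assumed to be already aligned to the same width, with "-" marking gaps. This is a toy stand-in only. It is neither the hidden Markov model used by pbccs nor the transformer used by DeepConsensus, and the input strings are invented.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Column-wise majority vote over pre-aligned reads of equal width.

    '-' marks an alignment gap. Columns whose most common symbol is a gap
    are dropped from the consensus. Ties go to the symbol seen first.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical aligned subreads of one molecule.
    subreads = ["ACGT-A", "ACGTTA", "AC-T-A", "ACGT-A"]
    print(majority_consensus(subreads))
```

A plain vote ignores base qualities and alignment uncertainty, which is one reason model-based approaches are used in practice.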

Appendix: original English text

Title: DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer

Author: Baid, Gunjan, Cook, Daniel E., Shafin, Kishwar, Yun, Taedong, Llinares-López, Felipe, Berthet, Quentin, Belyaeva, Anastasiya, Töpfer, Armin, Wenger, Aaron M., Rowell, William J., Yang, Howard, Kolesnikov, Alexey, Ammar, Waleed, Vert, Jean-Philippe, Vaswani, Ashish, McLean, Cory Y., Nattestad, Maria, Chang, Pi-Chuan, Carroll, Andrew

Issue&Volume: 2022-09-01

Abstract: Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10–25 kilobases), accurate ‘HiFi’ reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer–encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could be trained to the general problem of analyzing the alignment of other types of sequences, such as unique molecular identifiers or genome assemblies.

DOI: 10.1038/s41587-022-01435-7

Source: https://www.nature.com/articles/s41587-022-01435-7

Journal information

Nature Biotechnology: founded in 1996. Part of the Springer Nature publishing group. Latest IF: 31.864
Official website: https://www.nature.com/nbt/
Submission link: https://mts-nbt.nature.com/cgi-bin/main.plex