Document Type

Article

Publication Date

4-8-2026

Identifier

DOI: 10.1016/j.xgen.2025.101104; PMCID: PMC13069850

Abstract

We developed a benchmark set of subclonal variants in the Genome in a Bottle (GIAB) Consortium HG002 reference material (RM) DNA for evaluating lower-frequency variant callsets. We used a somatic variant caller with high-coverage (300×) whole-genome sequencing data from the GIAB Ashkenazi Jewish trio to identify potential subclonal variants in the HG002 RM DNA. Using orthogonal sequencing data and manual curation, we defined a benchmark set with 85 high-confidence subclonal single-nucleotide variants (SNVs) (allele frequency [AF] > 5%) and a benchmark region covering 2.45 Gbp of the autosomes. External validation supported that it can be used to reliably identify both false negatives and false positives for a variety of sequencing technologies and variant callers. By adding our characterization of mosaic SNVs in this widely used cell line, we have expanded the scope of bioinformatic and sequencing applications for which the HG002 GIAB RM can be used to include benchmarking subclonal SNVs.

Journal Title

Cell Genom

Volume

6

Issue

4

First Page

101104

Last Page

101104

MeSH Keywords

Humans; Polymorphism, Single Nucleotide; Genome, Human; Benchmarking; Whole Genome Sequencing; Gene Frequency; High-Throughput Nucleotide Sequencing; Reference Standards; Computational Biology; Sequence Analysis, DNA

PubMed ID

41421359

Keywords

Genome In A Bottle; SNV; genome sequencing; mosaic variant; reference material; somatic mosaicism; somatic variant; variant benchmarking; variant calling

Comments

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Publisher's Link: https://doi.org/10.1016/j.xgen.2025.101104
