





Introduction: Genomic technologies continue to advance rapidly, driving a steady stream of novel gene-disease discoveries. Despite this exponential increase in gene discoveries, diagnostic rates in rare disease still range from 30-50%. To evaluate the impact of long read genome sequencing (lrGS) in a rare disease cohort, lrGS was implemented systematically in an institution-wide research program, Genomic Answers for Kids (GA4K).

Methods: Individuals enrolled in GA4K with a suspected genetic disorder who remained undiagnosed after exome or genome sequencing were submitted for HiFi sequencing. Probands were sequenced to a target depth of 30X coverage. Analyses included copy number, structural variation, single nucleotide variation, repeat expansion, and, for a subset of genomes, 5-methyl-C detection. Previously reported clinical variants were used to assess lrGS variant detection algorithms. Additionally, sensitivity and specificity for lrGS were calculated by comparison to an Infinium Global Screening microarray.

Results: As we have previously demonstrated, lrGS sensitivity and specificity for single nucleotide variants (SNVs) were slightly higher than those of short read genome sequencing (srGS), at 99%. Additionally, compared to srGS, lrGS continued to identify ~150 novel rare variants (MAF <0.01%) impacting a coding gene. Increased coverage and phasing enabled the detection of variants previously uncalled by short read sequencing, as well as phasing of variants in singletons, confirming molecular diagnoses. Given the previously demonstrated accuracy of SNV detection, we next focused on more complex variation not readily detectable by srGS. Approximately 39% of samples initially screened positive for a potential pathologic repeat expansion (n=59 genes), with filtering criteria maximized for sensitivity.
After interpretation, which includes examination of the repeat motif and structure, ~1.6% were considered to be pathogenic alleles, highlighting the importance of sequencing suspected expansions in large cohorts in addition to sizing. When SV/CNV are limited to variants at less than 5% frequency that impact a coding region (CCDS), there are 17.8 variants per genome; of these, on average 4 overlap OMIM CCDS. Beyond characterization of coding impact, the nature of SV/CNV allows determination of the orientation of duplications (on average 3 rare CCDS duplications per genome) as well as superior detection of infrequent inversions (one in six genomes has a CCDS-impacting inversion) compared to other sequencing approaches. Direct 5-methyl-C detection (5mC-HiFi-GS) has been completed in 380 genomes. Focusing on rare (<0.5% population frequency), gene-proximal (5’) hypermethylation suggestive of “promoter silencing”, we observed on average 51 such alleles per patient (13 in OMIM genes). To date, two of the OMIM promoter hypermethylation events from 5mC-HiFi-GS are linked to previously undetected pathogenic repeat expansions, but many others are proximal to novel unstable repeats and other non-coding rare variants with potential function. In parallel, the rare methylation signatures faithfully recapitulate previously known disease-variant-linked epigenetic perturbations (e.g. DM1).

Conclusions: The implementation of lrGS in an ES/GS-negative cohort resulted in an approximate 10% increase in diagnostic yield. Importantly, previously reported variants were recapitulated, indicating that lrGS could be utilized as a first-tier genome test, simplifying genetic testing algorithms and increasing efficiency. Our developing catalog of rare SVs and methylation variants is now providing new leads for unsolved disease in known and novel disease genes. Anticipated improvements in throughput and cost will enable the widespread integration of long read sequencing into clinical care.


Medical Genetics


Presented at the Annual Clinical Genetics Meeting; Salt Lake City, UT; March 14-18, 2023.

Evaluating the Impact of Long Read Genomes in Rare Disease: A systematic analysis of 1000 HiFi Genomes