

With the development of next-generation sequencing technologies, short-read sequencing became the standard method for rapidly sequencing whole genomes, annotating transcriptomes, and quantifying gene expression. However, short-read RNA sequencing can be bioinformatically challenging because the full transcript must be inferred from 150-base-pair fragments, either by overlapping the sequenced fragments (de novo assembly) or by aligning them to a reference genome or transcriptome. This reliance on inference, whether de novo or through alignment to a reference, makes short-read RNA sequencing less than ideal for identifying and characterizing the biological diversity of transcripts (isoforms). Recently, full-length (FL) RNA sequencing has been developed, in which a single molecule of up to 10 kilobases can be sequenced with high confidence, removing the need to infer the transcript from short fragments. This approach makes it possible to discover isoforms resulting from alternative splicing or gene fusion events, as well as to detect allele-specific expression and single nucleotide variants. Here, we leverage Iso-seq, the FL RNA sequencing method developed by Pacific Biosciences, to build an atlas of isoforms found in a population (N=81) of pediatric patients with rare diseases across 13 different tissue types. We identified ~2.7 million non-redundant transcripts across ~113 thousand genes. Only 31% of the transcripts were classified as originating from known genes, which means that 69% of the transcripts identified are not in the reference annotation of the human genome. In our preliminary investigation, approximately 47% of transcripts were identified as anti-sense, meaning that the sequenced transcript originated from the strand opposite the annotated gene (the non-template strand of the genic region). These observations suggest that there is more biological diversity than short-read sequencing is able to detect.
Although a variety of human atlases are publicly available, our study is the first to catalog the full complement of isoforms in pediatric patients. Furthermore, we aim to identify tissue-specific isoforms and to highlight the biological diversity of isoforms across tissues.


Presented on behalf of the Grundberg Lab and Pastinen Lab


Development of an Isoform Atlas in Pediatric Patients with Rare Diseases using Iso-seq