Genomic technologies to improve variation identification in undiagnosed diseases

  • Joseph T.C. Shieh
    Corresponding author.
    University of California San Francisco, Division of Medical Genetics, Department of Pediatrics, Institute for Human Genetics, Benioff Children's Hospital, University of California San Francisco, California, USA
Open Access. Published: November 09, 2022
      Human genome variation has increasingly posed challenges and opportunities for patients, medical providers, and a growing group of stakeholders including advocacy groups, disadvantaged communities, public health experts, and scientists. Here, advances in genomic sequencing and mapping technologies are discussed with particular attention to the increasing ability to detect personal and population genome variation and the potential for accurate integration of variation into health and disease-related care. Genome mapping, one technique used to create genome map scaffolds, has now been combined with long read sequencing. New technologies have led to improved variation detection, including cryptic structural variation and diverse variants with different degrees of disease association. Combined with advances in automated and medical interpretations, variation detection is increasingly being applied in healthcare. These advances promise to make disease diagnostics more rapid, and potentially more accessible, to those with medical needs. Consequently, the need for medical genetics and genomics experts is increasing. Here, the opportunities and potential challenges for application of genome-scale variation detection in disease are examined.

      1. Genomic technologies with increased sensitivity to detect variation

      While classical mapping and sequencing technologies have existed for some time, technologies that interrogate the genome are advancing rapidly. Current genomic technologies are high-throughput data generators that offer greater variant detection, long-range chromosomal phasing, and increasing molecular resolution. Although traditional genomic sequencing pipelines excel at identifying small nucleotide variants, their ability to comprehensively detect copy number variants and other structural variation has been more limited. Genome analysis pipelines can have limitations in resolution, leaving gaps in genome variation detection: variants can be too large to detect using short read sequencing and yet too small to be seen by classic array technology and cytogenetic visualization. Advances in both sequencing platforms and software pipelines have emerged to address these limitations. First, long read genome sequencing provides an improved opportunity for copy number variant detection, and recent studies have demonstrated its utility in both small and large variant detection. In addition, the combination of long read sequencing with genome mapping technologies, such as optical mapping, has led to both efficient chromosome-level assembly and sequence-level resolution (Levy-Sakin, Mostovoy, Shieh). Optical mapping relies on labeling of genomic markers, allowing for single molecule assembly or alignment, which can be used for scaffolding and for more sensitive detection of inversions and mid-sized insertions (Xiao, Wong). With longer reads and new mapping techniques, phasing, or the ability to assign variants to one chromosome copy or the other, is now an important tool (Uppuluri). The cis or trans nature of variants is of particular importance in medical genetics and in applications for other diploid systems. Phasing can even substitute for parental samples in some cases of recessive disorders when parental testing is not possible.
Polyploid genomes and repetitive regions of the genome are also increasingly being resolved by these technologies (Wong, Young, Gamba). Given the increase in options for generating more complete genomic information, it is important to consider the technologies and their potential application.
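
To make the cis/trans distinction concrete, the following minimal Python sketch classifies two heterozygous variants from phased genotype strings. It is an illustration under stated assumptions rather than a method from the works cited: it assumes VCF-style genotypes in which `0|1` denotes a phased call and `0/1` an unphased one, and it assumes both variants lie in the same phase block. The function name `phase_relationship` and the example genotypes are hypothetical.

```python
def phase_relationship(gt_a: str, gt_b: str) -> str:
    """Classify two heterozygous variants as 'cis', 'trans', or 'unknown'.

    Assumes VCF-style genotypes: '0|1' is phased, '0/1' is unphased.
    Assumes both variants lie in the same phase block (not checked here).
    """
    # Unphased calls carry no haplotype information.
    if "|" not in gt_a or "|" not in gt_b:
        return "unknown"
    hap_a = gt_a.split("|")
    hap_b = gt_b.split("|")
    # Only handle simple diploid heterozygous calls with one alternate allele.
    if sorted(hap_a) != ["0", "1"] or sorted(hap_b) != ["0", "1"]:
        return "unknown"
    # Both alternate alleles on the same haplotype -> cis; otherwise trans.
    return "cis" if hap_a.index("1") == hap_b.index("1") else "trans"


# Hypothetical calls for two variants in one recessive disease gene.
print(phase_relationship("0|1", "1|0"))  # trans: consistent with compound heterozygosity
print(phase_relationship("0|1", "0|1"))  # cis: one gene copy may remain unaffected
print(phase_relationship("0/1", "0|1"))  # unknown: first call is unphased
```

A trans result is the configuration that parental testing would otherwise be used to establish in a suspected recessive disorder, which is why read-based phasing can sometimes stand in for parental samples.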

      2. Detecting different types of variants simultaneously

      Disease annotation databases and genomic disease nomenclature continue to evolve as the number of diseases and genes associated with diseases increases (Biesecker). Yet, there are still patients with undiagnosed diseases. Both increased awareness of genetic disorders among clinicians and better tools for annotating and narrowing differential diagnoses are needed. In some cases, particularly when the differential diagnosis based on the clinical presentation suggests copy number variation or repeat variants, new technologies that detect multiple types of variation are helpful. For example, genome sequencing, with the appropriate pipeline, can resolve areas of high sequence homology or sequence repeats such as in fragile X, specific ataxias, or spinal muscular atrophy. Furthermore, techniques such as optical mapping can be used to detect chromosome-scale translocations, inversions, and copy number variants (Hanlon). Uniparental disomy, known to occur at some frequency in the unselected population, can also be detected and is important to identify in imprinting-region conditions including Prader-Willi syndrome. Methylation, also important in imprinting, can be assessed by some long read technologies. Optical mapping is useful in prenatal or pre-implantation contexts as it can be used to detect aneuploidy, translocation, and large copy number variation simultaneously. Cancer diagnostic labs commonly assess breakpoints and fusions, which may also be amenable to the new technologies. In cancer and cancer-predisposition conditions, tissue diagnostics as well as germline diagnostics have been important in patient stratification, surveillance, and treatment, as well as in understanding mechanisms of disease. For example, somatic homozygosity is increasingly being recognized in tumor suppressors and in cancer-like states such as neurofibromatosis and vascular overgrowth (Tong).
Higher-depth genomic sequencing technologies have also been applied to detect variants with smaller allele fractions. In cancer diagnostics, this may be particularly important to detect tumor evolution and potential therapeutic resistance. Even in normal phenotypic states, the identification of constitutional mosaicism is increasing with these new technologies. Thus, the application of these new technologies in both cancer and constitutional conditions is revealing heterogeneity in genomic composition and permitting characterization of the underlying biology of variation (Kumar, Snellings). This is an important first step towards understanding the implications of germline and somatic mosaicism in normal and in disease states. The volumes of sequence data now available, however, have led to more unsolved questions in biology. For example, when recurrent de novo variants are observed in disease, why are specific hotspots observed (e.g. in Myhre syndrome or in ferritinopathy-associated neurologic disease)? Is recurrent variation due to intrinsic sequence or structure? Or is it related to positive selection? Similarly, for small variants the mechanisms remain relatively unexplored despite their increasingly frequent description. The integration of haplotype information from sequence data may lead to further answers.
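
The link between sequencing depth and sensitivity for low allele fractions can be illustrated with a simple calculation. The sketch below is a back-of-the-envelope model, not a description of any specific pipeline: it assumes each read independently carries the variant with probability equal to the allele fraction (a binomial model that ignores sequencing error and mapping bias), and the function name `detection_probability`, the 5% allele fraction, and the three-read threshold are all hypothetical choices.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """Probability of observing at least `min_alt_reads` variant-supporting reads.

    Assumes a binomial model: each of `depth` reads independently carries the
    variant with probability `allele_fraction`. Sequencing error and mapping
    bias are ignored.
    """
    # Probability of observing fewer than `min_alt_reads` variant reads.
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Hypothetical settings: 5% allele fraction, at least 3 supporting reads required.
for depth in (30, 100, 500):
    print(depth, round(detection_probability(depth, 0.05, 3), 3))
```

Under this model the probability of detection rises with depth for a fixed allele fraction, which is the intuition behind using higher-depth sequencing for mosaic and subclonal variants.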

      3. Genome interpretation and importance of variant databases

      An individual's genomic information is most useful when coupled to gene- and variant-level annotation. Population-based genome databases and laboratory-submitted variant annotations are two sources that have contributed substantially to the understanding of variants and their potential effects. As aggregate data on genome variation coupled with phenotypic information increase, both for small variants and copy number variants, they should lead to better medical interpretation and wider applicability (Hsueh). For example, population variation data indicate the observed frequency of missense variants for a particular gene, while laboratory-reported variants indicate the purported significance of specific variants. These genomic resources have facilitated the development of automated or semi-automated genome variation annotation pipelines, which have improved speed in clinical applications. One example is hospital-based rapid genome sequencing, where diagnostic variants are being identified in hours or days (e.g. intensive care unit sequencing or rapid sequencing for undiagnosed diseases), whereas more traditional pipelines with manual annotation can take significantly longer.
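
As a rough illustration of how these two sources might be combined in a semi-automated annotation step, the Python sketch below keeps variants that are rare in population data or that carry a laboratory-submitted pathogenic assertion. Everything in it is an assumption for illustration: the `Variant` record, the `prioritize` function, the 0.001 allele frequency cutoff, and the example values do not come from any particular database or clinical pipeline, and real interpretation involves far more evidence than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    name: str
    population_af: Optional[float]      # population allele frequency; None if unobserved
    lab_classification: Optional[str]   # laboratory-submitted assertion; None if unreported


def prioritize(variants, max_af=0.001):
    """Return names of variants that are rare or reported as (likely) pathogenic.

    `max_af` is a hypothetical cutoff chosen for illustration only.
    """
    kept = []
    for v in variants:
        rare = v.population_af is None or v.population_af <= max_af
        reported = v.lab_classification in {"pathogenic", "likely pathogenic"}
        if rare or reported:
            kept.append(v.name)
    return kept


# Hypothetical records; names and values are illustrative only.
candidates = [
    Variant("variant_A", 0.15, None),           # common and unreported: filtered out
    Variant("variant_B", None, None),           # absent from population data: kept
    Variant("variant_C", 0.002, "pathogenic"),  # above cutoff but lab-reported: kept
]
print(prioritize(candidates))  # ['variant_B', 'variant_C']
```

A filter of this kind only narrows the candidate list; the prioritized variants still require expert review alongside the clinical presentation.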
      Generation and annotation of data is not, by itself, sufficient for medical application. Its diagnostic use requires molecular and medical expertise, integration of clinical information, and often iterative phenotyping. Although universal access to technology and complete genomic databases are clear challenges, solving these will provide only one part of a genomic-based health care system. Equally important are clinical sequence experts and front-line clinicians who are versed in genomic medicine. Indeed, there is an apparent shortage of such experts. Although health care entities and governmental institutions are attempting to increase the workforce, they are challenged by an evolving field that touches on both the preventative and treatment-based aspects of medical care. Those working in genomic medicine and with the technologies promise to address both aspects, although the realization of that promise is in its early stages.

      4. Opportunities to integrate genomic with additional technologies

      How genomic alterations lead to diverse transcriptome effects is an important outstanding question that may require further -omics level characterization. Single cell RNA sequencing (scRNA) technologies are a potentially powerful tool to help address this question. Initial scRNA technologies relied on 5′ or 3′ sequence, which could be used for rapid, inexpensive survey of short transcript sequences, ideal for highly expressed genes and cell type description (Wang). Longer RNA sequencing technologies are increasingly being used, providing full-length transcriptomes and detection of rare transcripts that were previously missed by other technologies. More complete transcriptomics at single cell resolution may be crucial to understanding genomic variation and cellular diversity. To realize this goal, further work is needed to integrate DNA and RNA technologies at single cell resolution. While these new technologies provide many novel opportunities, several challenges remain. Increasing investment in genomics, data science, and relevant expertise will move these technologies toward routine practical diagnostics.

      5. Genomics and the future

      Genome-based medical applications are often divided into two areas: rare diseases and common diseases. The division between these areas may be largely historical, reflecting technological limitations at the time. Feasible disease mapping led to Mendelian disease studies, and SNP-association led to common disease variation studies. Increasingly, these areas of study are merging. Mendelian disease is clearly more complex than the assessment of binary traits, and genes involved in common diseases are also found in Mendelian forms. Therefore, biobanking efforts that include both rare and common genomic variants, as well as detailed trait information, have had increasing utility (Wei). Indeed, knowledge of modifiers in Mendelian traits and polygenic risk scores have been active areas of research, and these may lead to further statistical methods for association or for quantifying risk (Blair, Wei). Given the increasing relevance of genomics, educational institutions need to teach and train individuals in precision medicine opportunities and complexities. Practitioners in all medical disciplines need exposure to the complexities and evolving nature of genomics. Certain specialties may become increasingly dependent on molecular testing-based or informatic-based diagnostics (Chunduru, Innes). For this transition to occur, there needs to be an integration of genomics and data science with medical expertise. Similar to the integration of radiology and imaging technology, this integration will transform medicine. With thousands of genes associated with diseases, rare diseases have become collectively common, and common disease genetics are continuing to evolve. Increasingly, the challenges in the field are also relevant for front-line practitioners, as they may implement technologies or work with referral care centers.
The equitable availability and implementation of genomics and genomics-based care is a challenge for healthcare systems, which must integrate primary and tertiary care and contend with the uneven geographic distribution of health resources (Penon-Portmann).
      Beyond diagnosis and disease stratification, the question remains how quickly these advances will lead to new therapeutics for a growing list of disorders. Although the diagnostic and treatment technologies have great potential, their possibilities are still constrained by our knowledge of the effects of variants on cells and organ systems. While genomic editing technologies such as CRISPR have been proven in a cellular context, their application for diverse disease states requires careful disease selection and safe and effective in vivo delivery systems. Furthermore, genomic editing technologies require a solid understanding of whether normal or variant alleles require up- or downregulation, or correction, and how modulation and dosage affect the cell and organ systems. Whether these challenges can be overcome while still ensuring equity in healthcare delivery remains a concern. Indeed, since the initial completion of the first genome maps, the field has realized that the diversity of genomes from across populations is important in understanding health and disease. Whether this recognition can also promote equity remains an open question. Although genomics has grown in select healthcare systems and in individual regions and countries, many populations still have limited access to basic genetic diagnostics, risk prediction, and preventative risk-based care. These limitations, and the resulting disparities, have been observed in countries of both high and low socioeconomic status. Therefore, one goal of precision medicine should be to improve equitable access to genomics and genomics-based care. These challenges are of increasing relevance to health care systems and economies.

      Declaration of competing interest

      The author has no conflicts of interest to declare.


      We thank NORD (National Organization for Rare Disorders) and the NEHI Research Foundation.