SV Annotation Table Columns

This document describes each column in the SV Annotation Table.

SV Name: name of the SV.

Chromosome: chromosome (chr1 to chr22, plus chrX, chrY, and chrM).

Start: SV start.

End: SV end.

Type: SV type. The pipeline recognizes duplications, deletions, inversions, and insertions. Insertions of size 1 are padded by 50 bp on each side before frequency calculations. Breakend (BND) records are currently ignored.
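The padding rule for insertions can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline's code: the function name `pad_if_point_insertion`, the lowercase type label, the 1-based coordinates, and the clamping at position 1 are all hypothetical, and how "size" is measured is left to the caller.

```python
def pad_if_point_insertion(sv_type, start, end, size, pad=50):
    """Return (start, end), widened by `pad` bp on each side for size-1 insertions.

    Hypothetical helper: the type label and the clamping at position 1
    are assumptions made for this sketch.
    """
    if sv_type == "insertion" and size == 1:
        # Do not let the padded start fall before the first base of the chromosome.
        return max(1, start - pad), end + pad
    # All other SVs keep their original coordinates.
    return start, end


print(pad_if_point_insertion("insertion", 1000, 1000, 1))
print(pad_if_point_insertion("deletion", 1000, 2000, 1001))
```

Padding gives a point insertion a non-zero footprint, so that overlap-based frequency lookups have an interval to compare against.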

Length: length of the SV.

GC (%): GC content based on the UCSC genome reference GRCh38.
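As a rough illustration, GC content for the reference sequence under an SV could be computed as below. The function name `gc_percent` is hypothetical, and excluding ambiguous bases (such as N) from the denominator is an assumption; the pipeline may treat them differently.

```python
def gc_percent(seq):
    """Percent of G and C bases among the called (A, C, G, T) bases of `seq`.

    Returns None when the sequence has no called bases (for example, all N).
    Ignoring N in the denominator is an assumption of this sketch.
    """
    seq = seq.upper()  # soft-masked (lowercase) bases count like any other
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / called


print(gc_percent("GGCCAATT"))
```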

Cytoband: cytoband.

Gene Count: number of genes spanned by the SV, based on the UCSC RefSeq gene definitions.

Gene Name(s): official gene symbol(s) for genes spanned by the SV based on the UCSC RefSeq gene definitions.

Gene at Start: official gene symbol(s) of genes overlapping the SV start coordinate.

Gene at End: official gene symbol(s) of genes overlapping the SV end coordinate.
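The Gene at Start and Gene at End columns amount to a point-in-interval lookup. A minimal sketch, assuming genes are given as hypothetical `(symbol, start, end)` tuples with 1-based inclusive coordinates on the same chromosome as the SV:

```python
def genes_at_position(pos, genes):
    """Return the sorted, de-duplicated symbols of genes whose span contains `pos`.

    `genes` is an iterable of (symbol, start, end) tuples; this layout is
    an assumption made for illustration.
    """
    return sorted({symbol for symbol, start, end in genes if start <= pos <= end})


genes = [("GENE_A", 100, 200), ("GENE_B", 150, 300), ("GENE_C", 400, 500)]
print(genes_at_position(150, genes))  # lookup at an SV start coordinate
print(genes_at_position(350, genes))  # lookup at an SV end coordinate
```

A breakpoint that falls inside a gene may disrupt it even when the SV does not span the whole gene, which is why the two coordinates are reported separately from Gene Name(s).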

Exon Name: official gene symbol(s) of genes with at least one exon overlapped by the SV.

CDS Name: official gene symbol(s) of genes with at least one coding exon overlapped by the SV.

Dark Genes % Overlap: % overlap with the dark regions from Twist Alliance.

ClinGen Haploinsufficient: (array of) Entrez Gene ID, gene symbol, and score from the ClinGen dosage sensitivity map for the haploinsufficiency phenotype.

ClinGen Triplosensitive: (array of) Entrez Gene ID, gene symbol, and score from the ClinGen dosage sensitivity map for the triplosensitivity phenotype.

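gnomAD O/E LoF Upper: upper bound of the confidence interval of the observed/expected (O/E) ratio for loss-of-function (LoF) variants, from gnomAD.

gnomAD O/E Mis Upper: upper bound of the confidence interval of the observed/expected (O/E) ratio for missense variants, from gnomAD.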

gnomAD pLI: probability that a gene is intolerant of a single loss-of-function (LoF) variant (haploinsufficient, LoF-intolerant genes), from gnomAD.

gnomAD pRec: probability that a gene is intolerant of two LoF variants (recessive genes), from gnomAD.

Repeat % Overlap: percent overlap with repeat regions (RepeatMasker annotation from UCSC).

Dirty Region % Overlap: percent overlap with gaps (including centromeres and telomeres) and segmental duplications.
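Percent-overlap columns such as Repeat % Overlap and Dirty Region % Overlap can be pictured as the fraction of the SV covered by the union of the annotation intervals. The sketch below is one plausible way to compute that, not necessarily what the pipeline does: the function name `percent_overlap`, the 1-based inclusive coordinates, and the decision to count overlapping annotations only once are assumptions.

```python
def percent_overlap(sv_start, sv_end, regions):
    """Percent of the SV [sv_start, sv_end] covered by the union of `regions`.

    `regions` is an iterable of (start, end) pairs on the same chromosome,
    1-based and inclusive (an assumption of this sketch).
    """
    sv_len = sv_end - sv_start + 1
    # Clip each region to the SV and discard regions that miss it entirely.
    clipped = sorted(
        (max(start, sv_start), min(end, sv_end))
        for start, end in regions
        if start <= sv_end and end >= sv_start
    )
    covered = 0
    last_end = sv_start - 1  # last SV base already counted
    for start, end in clipped:
        if end > last_end:
            # Count only the bases not already covered by an earlier region.
            covered += end - max(start, last_end + 1) + 1
            last_end = end
    return 100.0 * covered / sv_len


# Two overlapping annotations plus one that extends past the SV end.
print(percent_overlap(101, 200, [(91, 120), (111, 150), (191, 300)]))
```

Taking the union first keeps the value between 0 and 100 even when the annotation track contains overlapping entries.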

Chromosome Region: telomere/centromere tag.

CGD: (array of) Entrez Gene ID, gene symbol, disease name(s), and inheritance found in the Clinical Genomic Database (CGD). The CGD is compiled by curators and maintained by the NHGRI (National Human Genome Research Institute). For every gene in the database, the CGD provides a list of one or more genetic disorders and a mode of inheritance (AD, AR, AD/AR, XL, or more complex modes). Because the CGD mode of inheritance is assigned directly by a curator and is tied to specific genetic disorder(s), it could be considered more accurate than the mode of inheritance for top-level HPO phenotypes.

OMIM Pheno: (array of) Entrez Gene ID, gene symbol, and disorder/disease name(s), with their respective inheritance modes coded as “i:”, found in OMIM.

OMIM Inh: list of all the inheritance modes for OMIM phenotypes. The codes are: autosomal dominant (AD), autosomal recessive (AR), multifactorial (MF), Y-linked (YL), X-linked dominant (XD), X-linked recessive (XR), X-linked (XL), digenic recessive (DR), mitochondrial (MT), somatic mutation (SMu), somatic mosaicism (SMo), inherited chromosomal imbalance (ICI).

ClinGen Region: genomic disease region from ClinGen.

Decipher Region: genomic disease region from DECIPHER. DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database that incorporates a suite of tools designed to aid the interpretation of genomic variants. For more information, see: https://decipher.sanger.ac.uk/.

ClinVar VarID: ClinVar identifier.

gnomAD AF Max 90% RO: highest allele frequency among the subpopulations reported by gnomAD, with at least 90% reciprocal overlap.

gnomAD Population AF Max 90% RO: subpopulation with the highest allele frequency reported by gnomAD, with at least 90% reciprocal overlap.

gnomAD Hom/Ref Frequency 90% RO: homozygous reference genotype frequency reported by gnomAD, with at least 90% reciprocal overlap.

gnomAD Het Frequency 90% RO: heterozygous genotype frequency reported by gnomAD, with at least 90% reciprocal overlap.

gnomAD Hom/Alt Frequency 90% RO: homozygous alternate genotype frequency reported by gnomAD, with at least 90% reciprocal overlap.
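The "90% RO" in the gnomAD column names refers to reciprocal overlap: the shared span must cover at least 90% of the query SV and at least 90% of the gnomAD SV. A minimal sketch of that test follows; the function names and the 1-based inclusive coordinates are assumptions, and the pipeline may apply further criteria that this document does not describe.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the reciprocal overlap (0.0 to 1.0) of two intervals.

    This is the smaller of the two fractions: shared bases over the length
    of A, and shared bases over the length of B. Coordinates are assumed
    to be 1-based and inclusive.
    """
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    if shared <= 0:
        return 0.0
    len_a = a_end - a_start + 1
    len_b = b_end - b_start + 1
    return min(shared / len_a, shared / len_b)


def passes_ro(a, b, threshold=0.9):
    """True when intervals a and b, given as (start, end), reach the RO threshold."""
    return reciprocal_overlap(a[0], a[1], b[0], b[1]) >= threshold


print(passes_ro((1, 100), (6, 100)))    # similar size and position
print(passes_ro((1, 100), (1, 1000)))   # query fully contained in a much larger SV
```

Requiring the overlap in both directions prevents a small SV from inheriting the frequency of a much larger variant that merely contains it.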

DGV % Overlap: percent of the SV length overlapped by DGV region(s) (no cutoff used). The Database of Genomic Variants (DGV) provides a comprehensive summary of structural variation in the human genome. For more information, see: http://dgv.tcag.ca/dgv. The DGV coordinates were lifted over to obtain the corresponding intervals in the GRCh38 reference genome.

DGV 50% RO: percent of the SV length covered by merged DGV variants, restricted to those with at least 50% reciprocal overlap.
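One reading of the DGV 50% RO column is "filter first, then merge": keep only the DGV variants that reach 50% reciprocal overlap with the SV, then report how much of the SV their union covers. The sketch below follows that reading. The order of operations, the function name `dgv_ro_coverage`, and the 1-based inclusive coordinates are assumptions rather than a description of the pipeline's code.

```python
def dgv_ro_coverage(sv, dgv_variants, min_ro=0.5):
    """Percent of `sv` covered by the union of DGV variants with RO >= min_ro.

    `sv` is (start, end); `dgv_variants` is an iterable of (start, end) pairs
    on the same chromosome. Both layouts are assumptions of this sketch.
    """
    sv_start, sv_end = sv
    sv_len = sv_end - sv_start + 1
    kept = []
    for start, end in dgv_variants:
        shared = min(sv_end, end) - max(sv_start, start) + 1
        if shared <= 0:
            continue
        # Reciprocal overlap: the shared span relative to each interval's length.
        if min(shared / sv_len, shared / (end - start + 1)) >= min_ro:
            kept.append((max(start, sv_start), min(end, sv_end)))
    covered = 0
    last_end = sv_start - 1  # last SV base already counted
    for start, end in sorted(kept):
        if end > last_end:
            covered += end - max(start, last_end + 1) + 1
            last_end = end
    return 100.0 * covered / sv_len


# The last two variants fail the 50% RO filter and are ignored.
print(dgv_ro_coverage((1, 100), [(1, 60), (41, 90), (95, 100), (1, 10000)]))
```

Compared with DGV % Overlap, this value discounts DGV entries that are much larger or much smaller than the SV.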