Variant Annotation Integrator (2024)

Introduction

The Variant Annotation Integrator (VAI) is a research tool for associating annotations from the UCSC database with your uploaded set of variant calls. It uses gene annotations to predict functional effects of variants on transcripts. For example, a variant might be located in the coding sequence of one transcript, but in the intron of an alternatively spliced transcript of the same gene; the VAI will return the predicted functional effect for each transcript. The VAI can optionally add several other types of relevant information: the dbSNP identifier if the variant is found in dbSNP, protein damage scores for missense variants from the Database of Non-synonymous Functional Predictions (dbNSFP), and conservation scores computed from multi-species alignments. The VAI can optionally filter results to retain only specific functional effect categories, variant properties and multi-species conservation status.

NOTE:
The VAI is only a research tool, meant to be used by those who have been properly trained in the interpretation of genetic data, and should never be used to make any kind of medical decision. We urge users seeking information about a personal medical or genetic condition to consult with a qualified physician for diagnosis and for answers to personal questions.


Submitting your variant calls

In order to use the VAI, you must provide variant calls in either the Personal Genome SNP (pgSnp) or VCF format. pgSnp-formatted variants may be uploaded as a Custom Track. Compressed and indexed VCF files must be on a web server (HTTP, HTTPS or FTP) and configured as Custom Tracks, or if you happen to have a Track Hub, as hub tracks.
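
As a rough illustration of the Custom Track route for VCF, the sketch below builds the one-line track declaration that points at a compressed, indexed VCF file on a web server. The function name, the track label and the example URL are hypothetical; the `type=vcfTabix` and `bigDataUrl` settings follow the usual UCSC custom track syntax, and the tabix index is assumed to sit next to the VCF file on the same server.

```python
from urllib.parse import urlparse

# Protocols named in the text above for remotely hosted VCF files.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def vcf_track_line(url: str, name: str = "My variants") -> str:
    """Build a custom track declaration for a bgzip-compressed, tabix-indexed VCF.

    Only the URL scheme is checked here; whether the file and its index
    actually exist on the server is not verified.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"VCF must be served over HTTP, HTTPS or FTP, got {scheme!r}")
    return f'track type=vcfTabix name="{name}" bigDataUrl={url}'


# Hypothetical URL, for illustration only.
print(vcf_track_line("https://example.org/calls.vcf.gz", name="calls"))
```

The resulting line is what would be pasted into the custom track form in place of the data itself.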

Protein-coding gene transcript effect predictions

Any gene prediction track in the UCSC Genome Browser database or in a track hub can be selected as the VAI's source of transcript annotations for prediction of functional effects. Sequence Ontology (SO) terms are used to describe the effect of each variant on genes in terms of transcript structure as follows:

SO term: description

  • intergenic_variant: A sequence variant located in the intergenic region, between genes.
  • upstream_gene_variant: A sequence variant located 5' of a gene. (VAI searches within 5,000 bases.)
  • downstream_gene_variant: A sequence variant located 3' of a gene. (VAI searches within 5,000 bases.)
  • 5_prime_UTR_variant: A variant located in the 5' untranslated region (UTR) of a gene.
  • 3_prime_UTR_variant: A variant located in the 3' untranslated region (UTR) of a gene.
  • synonymous_variant: A sequence variant where there is no resulting change to the encoded amino acid.
  • missense_variant: A sequence variant that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved.
  • inframe_insertion: An inframe non-synonymous variant that inserts bases into the coding sequence.
  • inframe_deletion: An inframe non-synonymous variant that deletes bases from the coding sequence.
  • frameshift_variant: A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three.
  • initiator_codon_variant: A codon variant that changes at least one base of the first codon of a transcript.
  • incomplete_terminal_codon_variant: A sequence variant where at least one base of the final codon of an incompletely annotated transcript is changed.
  • stop_lost: A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript.
  • stop_retained_variant: A sequence variant where at least one base in the terminator codon is changed, but the terminator remains.
  • exon_loss: A sequence variant whereby an exon is lost from the transcript. (VAI assigns this term when an entire exon is deleted.)
  • stop_gained: A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript.
  • NMD_transcript_variant: A variant in a transcript that is already the target of nonsense-mediated decay (NMD), i.e. the stop codon is not in the last exon nor within 50 bases of the end of the second-to-last exon.
  • intron_variant: A transcript variant occurring within an intron.
  • splice_donor_variant: A splice variant that changes the 2-base region at the 5' end of an intron.
  • splice_acceptor_variant: A splice variant that changes the 2-base region at the 3' end of an intron.
  • splice_region_variant: A sequence variant in which a change has occurred within the region of the splice site, either within 1-3 bases of the exon or 3-8 bases of the intron.
  • complex_transcript_variant: A transcript variant with a complex insertion or deletion (indel) that spans an exon/intron border or a coding sequence/UTR border.
  • non_coding_exon_variant: A sequence variant that changes exon sequence of a non-coding gene.
  • no_sequence_alteration: A variant that causes no change to the transcript sequence and/or specifies only the reference allele, no alternate allele. In rare cases when the transcript sequence (e.g. from RefSeq) differs from the reference genome assembly, a difference from the reference genome may restore the transcript sequence instead of altering it.
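
Several of the definitions above reduce to simple arithmetic on positions and lengths. The sketch below illustrates three of them: the multiple-of-three rule that separates inframe indels from frameshifts, the distance rules for intronic splice terms, and the 5,000-base search window around a gene. It is not the VAI's implementation; all function names are hypothetical, coordinates are assumed to be 1-based and inclusive, intron positions are assumed to be counted on the transcript strand, and the 5,000-base window is assumed to be inclusive.

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Classify a coding-sequence indel by its effect on the reading frame."""
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("not an indel: alleles have equal length")
    if delta % 3 != 0:
        # Net inserted/deleted length is not a multiple of three.
        return "frameshift_variant"
    return "inframe_insertion" if delta > 0 else "inframe_deletion"


def classify_intron_position(pos: int, intron_length: int) -> str:
    """Classify a base inside an intron; pos counts from 1 at the intron's 5' end."""
    if not 1 <= pos <= intron_length:
        raise ValueError("position is outside the intron")
    from_donor = pos                          # distance from the 5' end
    from_acceptor = intron_length - pos + 1   # distance from the 3' end
    if from_donor <= 2:
        return "splice_donor_variant"
    if from_acceptor <= 2:
        return "splice_acceptor_variant"
    if min(from_donor, from_acceptor) <= 8:   # bases 3-8 of the intron
        return "splice_region_variant"
    return "intron_variant"


def classify_flank(pos: int, gene_start: int, gene_end: int,
                   strand: str, window: int = 5000) -> str:
    """Classify a position outside a gene as upstream, downstream or intergenic."""
    if gene_start <= pos <= gene_end:
        raise ValueError("position is inside the gene")
    before = pos < gene_start
    distance = gene_start - pos if before else pos - gene_end
    if distance > window:
        return "intergenic_variant"
    # On the minus strand the 5' side is at higher genomic coordinates.
    five_prime_side = before if strand == "+" else not before
    return "upstream_gene_variant" if five_prime_side else "downstream_gene_variant"
```

For example, `classify_coding_indel("A", "AC")` yields `frameshift_variant`, while a position 100 bases before the start of a minus-strand gene is classified as `downstream_gene_variant`.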

Optional annotations

In addition to protein-coding genes, some genome assemblies offer other sources of annotations that can be included in the output for each variant.

Database of Non-synonymous Functional Predictions (dbNSFP)

dbNSFP annotations are available only for hg19/GRCh37 (dbNSFP release 2.0) and hg38/GRCh38 (release 3.1a). dbNSFP (Liu et al. 2011) provides pre-computed scores and predictions of functional significance from a variety of tools. Every possible coding change to transcripts in GENCODE (for hg19: release 9, Ensembl 64, Dec. 2011; for hg38: release 22, Ensembl 79, Mar. 2015) gene predictions has been evaluated. dbNSFP includes only single-nucleotide missense changes; its data do not apply to indels, multi-nucleotide variants, non-coding or synonymous changes.

dbNSFP provides scores and predictions from several tools that use various machine learning techniques to estimate the likelihood that a single-nucleotide missense variant would damage a protein's structure and function:

  • SIFT (Sorting Intolerant From Tolerant) uses sequence homology and the physical properties of amino acids to predict whether an amino acid substitution affects protein function. Scores less than 0.05 are classified as Damaging ("D" in output); higher scores are classified as Tolerated ("T"). (Ng and Henikoff, 2003)
  • PolyPhen-2 (Polymorphism Phenotyping v2) applies a naive Bayes classifier using several sequence-based and structure-based predictive features including refined multi-species alignments. PolyPhen-2 was trained on two datasets, and dbNSFP provides scores for both. The HumDiv training set is intended for evaluating rare alleles potentially involved in complex phenotypes, for example in genome-wide association studies (GWAS). Predictions are derived from scores, with these ranges for HumDiv: "probably damaging" ("D") for scores in [0.957, 1]; "possibly damaging" ("P") for scores in [0.453, 0.956]; "benign" ("B") for scores in [0, 0.452]. HumVar is intended for studies of Mendelian diseases, for which mutations with drastic effects must be sorted out from abundant mildly deleterious variants. Predictions are derived from scores, with these ranges for HumVar: "probably damaging" ("D") for scores in [0.909, 1]; "possibly damaging" ("P") for scores in [0.447, 0.908]; "benign" ("B") for scores in [0, 0.446]. (Adzhubei et al., 2010)
  • MutationTaster applies a naive Bayes classifier trained on a large dataset (>390,000 known disease mutations from HGMD Professional and >6,800,000 presumably harmless SNP and Indel polymorphisms from the 1000 Genomes Project). Variants that cause a premature stop codon resulting in nonsense-mediated decay (NMD), as well as variants marked as probable-pathogenic or pathogenic in ClinVar, are automatically presumed to be disease-causing ("A"). Variants with all three genotypes present in HapMap or with at least 4 heterozygous genotypes in 1000 Genomes are automatically presumed to be harmless polymorphisms ("P"). Variants not automatically determined to be disease-causing or polymorphic are predicted to be "disease-causing" ("D") or polymorphisms ("N") by the classifier. Probability scores close to 1 indicate high "security" of the prediction; probabilities close to 0 for an automatic prediction ("A" or "P") can indicate that the classifier predicted a different outcome. (Schwarz et al., 2010)
  • MutationAssessor uses sequence homologs grouped into families and sub-families by combinatorial entropy formalism to compute a Functional Impact Score (FIS). It is intended for use in cancer studies, in which both gain of function and loss of function are important; the authors also identify a third category, "switch of function." A prediction of "high" or "medium" indicates that the variant probably has some functional impact, while "low" or "neutral" indicate that the variant is probably function-neutral. (Reva et al., 2011)
  • LRT (Likelihood Ratio Test) uses comparative genomics to identify variants that disrupt highly conserved amino acids. Variants are predicted to be deleterious ("D"), neutral ("N") or unknown ("U"). (Chun and Fay, 2009)
  • VEST (Variant Effect Scoring Tool) (available only for hg38/GRCh38) uses a classifier that was trained with ~45,000 disease mutations from HGMD and ~45,000 high frequency missense variants (putatively neutral) from the Exome Sequencing Project. (Carter et al., 2013)

In addition, dbNSFP provides InterPro protein domains where available (Hunter et al., 2012) and two measures of conservation computed by GERP++ (Davydov et al., 2010).
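
The SIFT and PolyPhen-2 cutoffs quoted in the tool list above can be expressed as a small lookup, shown here as a minimal sketch. The function and constant names are hypothetical, and the code assumes that scores are reported to three decimal places, so the gaps between the quoted ranges (for example between 0.452 and 0.453) never occur in practice. dbNSFP already supplies the prediction letters; this only shows how they relate to the scores.

```python
def sift_prediction(score: float) -> str:
    """Return "D" (Damaging) for scores below 0.05, otherwise "T" (Tolerated)."""
    return "D" if score < 0.05 else "T"


# Lower bounds of the "probably damaging" and "possibly damaging" ranges
# for each PolyPhen-2 training set, as quoted in the text.
POLYPHEN2_CUTOFFS = {
    "HumDiv": (0.957, 0.453),
    "HumVar": (0.909, 0.447),
}


def polyphen2_prediction(score: float, training_set: str = "HumDiv") -> str:
    """Return "D", "P" or "B" for a PolyPhen-2 score from the given training set."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen-2 scores lie in [0, 1]")
    probably, possibly = POLYPHEN2_CUTOFFS[training_set]
    if score >= probably:
        return "D"  # probably damaging
    if score >= possibly:
        return "P"  # possibly damaging
    return "B"      # benign
```

Note that the same score can receive different letters from the two training sets: 0.92 is "possibly damaging" under HumDiv but "probably damaging" under HumVar.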

Transcript status

Some of the gene prediction tracks have additional annotations to indicate the amount or quality of supporting evidence for each transcript. When the track selected in the "Select Genes" section has such annotations, these can be enabled under "Transcript Status". The options depend on which gene prediction track is selected.

  • GENCODE tags: when GENCODE Genes are selected in the "Select Genes" section, any GENCODE tags associated with a transcript can be added to output.
  • RefSeq status: when RefSeq Genes are selected in the "Select Genes" section, the transcript's status can be included in output.
  • Canonical UCSC transcripts: when UCSC Genes (labeled GENCODE V22 in hg38/GRCh38) are selected in the "Select Genes" section, the flag "CANONICAL=YES" is added when the transcript has been chosen as "canonical" (see the "Related Data" section of the UCSC Genes track description).
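
A downstream script might pick out canonical transcripts by looking for the "CANONICAL=YES" flag. The sketch below assumes the flag appears in a semicolon-separated list of key=value pairs, as in the Extra column of VEP-style output, with "-" standing for an empty field; that layout and both function names are assumptions, not something this page specifies.

```python
def parse_extra(extra: str) -> dict[str, str]:
    """Split a semicolon-separated key=value string into a dictionary."""
    pairs = {}
    for item in extra.split(";"):
        if not item or item == "-":
            continue  # empty field or placeholder
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def is_canonical(extra: str) -> bool:
    """True when the extras carry the CANONICAL=YES flag."""
    return parse_extra(extra).get("CANONICAL") == "YES"
```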

Known variation

If the selected genome assembly has a SNPs track (derived from dbSNP), when a variant has the same start and end coordinates as a variant in dbSNP, the VAI includes the reference SNP (rs#) identifier in the output. Currently, the VAI does not compare alleles due to the frequency of strand anomalies in dbSNP.
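
The coordinate-only matching described above can be pictured as an exact lookup keyed on chromosome, start and end, with alleles deliberately left out of the key. This is a minimal sketch of that idea rather than the VAI's code; the function names and the rs identifiers in the example are made up.

```python
from collections import defaultdict


def index_snps(snps):
    """Index (chrom, start, end, rs_id) records by their coordinates."""
    index = defaultdict(list)
    for chrom, start, end, rs_id in snps:
        index[(chrom, start, end)].append(rs_id)
    return index


def known_ids(index, chrom, start, end):
    """Return rs identifiers whose start and end match exactly; alleles are not compared."""
    return index.get((chrom, start, end), [])


# Hypothetical records, for illustration only.
snp_index = index_snps([
    ("chr1", 1000, 1001, "rs0000001"),
    ("chr1", 1000, 1001, "rs0000002"),
    ("chr2", 500, 503, "rs0000003"),
])
print(known_ids(snp_index, "chr1", 1000, 1001))
```

Because alleles are ignored, a variant can pick up an identifier for a known SNP at the same position even when the alternate alleles differ.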

Conservation

If the selected genome assembly has a Conservation track with phyloP scores and/or phastCons scores and conserved elements, those can be included in the output. Both phastCons and phyloP are part of the PHAST package; see the Conservation track description in the Genome Browser for more details.

Filters

The volume of unrestricted output can be quite large, making it difficult to identify variants of particular interest. Several filters can be applied to keep only those variants that have specific properties.

Functional role

By default, all variants are included in the output regardless of predicted functional effect. If you would like to keep only variants that have a particular type of effect, you can uncheck the checkboxes of other effect types. The detailed functional effect predictions are categorized as follows:

Known variation

(applicable only to assemblies that have "Common SNPs" and "Mult SNPs" tracks) By default, all variants appear in output regardless of overlap with known dbSNP variants that map to multiple locations (a possible red flag), or that have a global minor allele frequency (MAF) of 1% or higher. Those categories of known variants can be used to exclude overlapping variants from output by unchecking the corresponding checkbox.
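
The effect of these two checkboxes can be sketched as a predicate over the known SNPs that overlap a variant. The `KnownSnp` record and the `keep_variant` function are hypothetical names, and both exclusions default to off to mirror the default of keeping everything; the 1% threshold is the one stated above.

```python
from dataclasses import dataclass


@dataclass
class KnownSnp:
    rs_id: str
    maf: float          # global minor allele frequency, as a fraction (0.01 = 1%)
    multi_mapped: bool  # True if the SNP maps to multiple genomic locations


def keep_variant(overlapping, exclude_common=False, exclude_multi=False,
                 common_maf=0.01):
    """Decide whether a variant stays in the output given its overlapping known SNPs."""
    for snp in overlapping:
        if exclude_common and snp.maf >= common_maf:
            return False
        if exclude_multi and snp.multi_mapped:
            return False
    return True
```

With both options left off, every variant is kept, which matches the default behavior described above.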

Conservation

(applicable only to assemblies that have "Conservation" tracks) If desired, output can be restricted to only those variants that overlap conserved elements computed by phastCons.
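
Restricting output to variants that overlap conserved elements is, at its core, an interval overlap test. The sketch below assumes 0-based, half-open coordinates (the convention of BED-style tracks) and uses a simple linear scan; the function names are hypothetical, and a real implementation over a whole genome would more likely use an indexed structure.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True when half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def in_conserved_element(chrom, start, end, elements) -> bool:
    """True when the variant overlaps any (chrom, start, end) conserved element."""
    return any(
        c == chrom and overlaps(start, end, s, e)
        for c, s, e in elements
    )
```

Under the half-open convention, a variant that ends exactly where an element starts does not count as overlapping.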

Output format

Currently, the VAI produces output comparable to Ensembl's Variant Effect Predictor (VEP), in either tab-separated text format or HTML. Columns are described here. When text output is selected, entering an output file name causes output to be saved in a local file instead of appearing in the browser, optionally compressed by gzip (compression reduces file size and network traffic, which results in faster downloads). When HTML is selected, output always appears in the browser window and the output file name is ignored.
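
A saved text file, gzip-compressed or not, can be read back with a few lines of standard-library code. The sketch below assumes the VEP-style layout in which lines starting with "##" are comments and a single line starting with "#" carries the tab-separated column names; that layout and the function name are assumptions, and no particular column names are relied on.

```python
import csv
import gzip


def read_vai_text(path: str) -> list[dict[str, str]]:
    """Read tab-separated output into a list of dicts keyed by column name."""
    opener = gzip.open if path.endswith(".gz") else open
    header = None
    rows = []
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for fields in reader:
            if not fields or fields[0].startswith("##"):
                continue  # blank line or comment
            if fields[0].startswith("#"):
                # Column-name line: strip the leading '#'.
                header = [fields[0].lstrip("#")] + fields[1:]
                continue
            if header is None:
                raise ValueError("data line found before the column-name line")
            rows.append(dict(zip(header, fields)))
    return rows
```

Each returned row can then be filtered or tallied by whichever column holds the predicted effect.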

Acknowledgments

Anyone familiar with Ensembl's Variant Effect Predictor (VEP) will doubtless notice similarities in options and interface. In collaboration with our colleagues at Ensembl, we have made an effort to limit the differences between the tools by using Sequence Ontology terms to describe variants' functional effects and by creating a "VEP" output format. Any bugs in the VAI, however, are in the VAI only.
