- known mRNA - 100% identical to a RefSeq NP or UniProtKB entry.
- miRBASE miRNA - Sequence assigned to a mature microRNA through data exchange with miRBase.
- published - Sequence assigned to the transcript from a published reference indicated by the attribution.
- Withdrawn by Sanger - Transcript has been withdrawn by Vega annotators after identification of assembly or annotation errors.
- retained intron - Alternatively spliced transcript believed to contain intronic sequence relative to other, coding, variants.
- variant - Refers to transcripts identified as allelic or sequence variants.
- nonsense mediated decay - If the coding sequence (following the appropriate reference) of a transcript finishes >50bp upstream of a downstream splice site, then it is tagged as NMD. If the variant does not cover the full reference coding sequence, it is annotated as NMD only if NMD is unavoidable, that is, if the transcript would be subject to NMD no matter what the exon structure of the missing portion is.
- unknown - Used to tag transcripts that do not have an annotation status available.
- novel mRNA - Shares >60% of its length with a known coding sequence from RefSeq or UniProtKB, or has cross-species/family support or domain evidence.
- putative mRNA - Shares <60% of its length with a known coding sequence from RefSeq or UniProtKB, or has an alternative first or last coding exon.
- protein coding in progress mRNA - This transcript class, restricted to zebrafish, represents transcripts that were originally classed as 'coding' but whose ORF has been lost during the generation of the current tiling path. They are currently being re-annotated.
- ambiguous ORF ncRNA - Transcript believed to be protein coding, but with more than one possible open reading frame.
- putative ncRNA - Transcript supported by limited spliced EST evidence.
- predicted ncRNA - Transcript partly based on ab initio predictions.
- haplotypic - Refers to transcripts derived from a gene represented by different haplotypes.
- fragmented - Refers to transcripts annotated by Sanger that span one or more clones but belong to the same gene.
- to be experimentally confirmed - Used for non-spliced EST clusters that have polyA features. This category was created specifically for the ENCODE project to highlight regions that could indicate the presence of novel protein coding genes that require experimental validation, either by 5' RACE or RT-PCR to extend the transcripts, or by confirming expression of the putatively encoded peptide with specific antibodies.
- artifact - Used to tag mistakes in the public databases (Ensembl/UniProtKB/TrEMBL). These usually arise from high-throughput cDNA sequencing projects that submit automatic annotation, which sometimes results in erroneous coding sequences that are in fact, for example, 3' UTRs.
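
The 50bp rule in the nonsense mediated decay entry can be illustrated with a minimal sketch. This is not the annotators' actual procedure, which involves manual judgement. The function name `is_nmd_candidate`, its parameters, and the use of transcript-relative coordinates are assumptions made for the example. It treats each exon-exon junction as a splice site and measures the distance from the last base of the coding sequence to that junction.

```python
def is_nmd_candidate(exon_lengths, cds_end, threshold=50):
    """Return True if the CDS ends more than `threshold` bp upstream
    of a downstream splice junction.

    exon_lengths: exon lengths in 5'->3' transcript order (hypothetical input).
    cds_end: 1-based transcript position of the last coding base,
             including the stop codon (assumed convention).
    """
    junctions = []
    position = 0
    # Every exon except the last one ends at an exon-exon junction.
    for length in exon_lengths[:-1]:
        position += length
        junctions.append(position)
    # A junction is "downstream" when it lies after the CDS end;
    # the rule applies only when the gap is strictly greater than the threshold.
    return any(junction - cds_end > threshold for junction in junctions)


# Three exons of 200, 150 and 300 bp give junctions at positions 200 and 350.
print(is_nmd_candidate([200, 150, 300], cds_end=250))  # CDS ends 100 bp before the last junction
print(is_nmd_candidate([200, 150, 300], cds_end=320))  # CDS ends 30 bp before the last junction
```

A transcript whose coding sequence ends in the final exon, or a single-exon transcript, has no downstream junction and is never flagged by this sketch.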