FIGURE SUMMARY
Title

Gene Characterization of Nocturnin Paralogues in Goldfish: Full Coding Sequences, Structure, Phylogeny and Tissue Expression

Authors
Madera, D., Alonso-Gómez, A., Delgado, M.J., Valenciano, A.I., Alonso-Gómez, Á.L.
Source
Full text @ Int. J. Mol. Sci.

Nocturnin gene structure in Osteichthyes. Full-genome duplications are indicated as 3R (Teleostei specific) and 4Rc (Cyprininae specific). Proposed model for the exon–intron structure of the gene encoding nocturnin in different Osteichthyes clades, and its transcription into mRNA (dashed lines). I, II and III indicate the alternative splicing variants. # indicates the alternative transcription start sites. Exons are represented by boxes and introns by lines. Angled blue lines indicate alternative splicing of the first exon (exon 1a, exon 1b or exon 1c) in mature mRNA. The length (base pairs, bp) of exons (inside the boxes) and introns (above the lines) is indicated for the Lepisosteus oculatus gene. Exons lost during the evolution of Ostariophysi and Cyprininae are represented by red crosses. Question marks indicate unconfirmed loss of variant III in the noc-a paralogues of cyprinines. Light pink and cyan backgrounds on Cyprininae nocturnin names indicate matrilineal and patrilineal origin, respectively.

Exon–intron structure and transcription pattern of nocturnin gene paralogues in goldfish obtained by mRNA sequencing. I, II, and III indicate the alternative splicing variants. Exon 1a, exon 1b, and exon 1c are the alternatives of the first exon in mature mRNA. Angled blue lines indicate an alternative splicing mechanism. Coding exons are indicated by colored boxes and introns by lines. Grey boxes indicate untranslated regions (UTR and exon 1b). The length (base pairs, bp) of exons is indicated inside the boxes. The length of coding sequences (cds) is also indicated.

Mutations accumulated in the noc-ba paralogue of Carassius gibelio (Gene ID: 127988217) and Carassius auratus (Gene ID: 113105808) compared to Cyprinus carpio (accession no. XM_019094354.2). The illustration represents the exon–intron structure of splicing variant II of noc-ba. Exons are indicated by boxes and introns by lines. The symbols above (C. gibelio) and below (C. auratus) indicate the mutations compared to the C. carpio sequence. The blue and purple rays represent in-frame and frameshift deletions, respectively, with the number of deleted base pairs given as Δ(bp). The black arrow I(1) indicates an insertion of 1 bp. The red stars represent premature stop codons in the transcribed sequence.

Nucleotide and deduced amino acid sequences of goldfish noc-aa. Sequences are accessible through GenBank. Splice Variant I is composed of exon 1a in blue (coding sequence with cyan background), exon 2 (green background), and exon 3 (yellow background) (allele 1 accession no. OR651354). Splice Variant II is composed of non-coding exon 1b, exon 2, and exon 3 (accession no. OR651356). Red arrow indicates the insertion point of exon 1a or exon 1b on exon 2. Bold font indicates stop codons upstream of the initial methionine codon. Coding region extends from the first methionine residue (blue box for variant I, green box for variant II) to the stop codon (* and red box). Backgrounds colored cyan, green, and yellow indicate the coding part of exons 1a, 2, and 3, respectively. Red letters indicate differences in nucleotides and amino acids between the two allelic sequences (allele 2, accession nos. OR651355 and OR651357).

Nucleotide and deduced amino acid sequences of goldfish noc-ab. Both sequences are accessible through GenBank. Splice Variant I is composed of exon 1a in blue (coding sequence with cyan background), exon 2 (green background), and exon 3 (yellow background) (accession no. OR651297). Splice Variant II is composed of non-coding exon 1b, exon 2, and exon 3 (accession no. OR651298). Red arrow indicates the insertion point of exon 1a or exon 1b on exon 2. Bold font indicates stop codons upstream of initial methionine codon. Coding region extends from the first methionine residue (blue box for variant I, green box for variant II) to the stop codon (* and red box). Backgrounds cyan, green and yellow indicate the coding part of exons 1a, 2, and 3, respectively.

Nucleotide and deduced amino acid sequences of goldfish noc-bb. Both sequences are accessible through GenBank. Splice Variant III is composed of exon 1c in purple (coding sequence with purple background), exon 2 (green background), and exon 3 (yellow background) (accession no. OR651299). Splice Variant II is composed of non-coding exon 1b, exon 2, and exon 3 (accession no. OR651300). Red arrow indicates the insertion point of exon 1b or exon 1c on exon 2. Bold font indicates stop codons upstream of initial methionine codon. Coding region extends from the first methionine residue (purple box for variant III, green box for variant II) to the stop codon (* and red box). Backgrounds colored purple, green and yellow indicate exons 1c, 2, and 3, respectively.

Alignment of the deduced amino acid sequences of splicing variant II of goldfish NOC-AA (Cau NOC-AA-II: WNX29031), NOC-AB (Cau NOC-AB-II: WNX29026), and NOC-BB (Cau NOC-BB-II: WNX29028) with NOC sequences from Danio rerio (Dre NOC-A-II: XP_005169676.1 and Dre NOC-B-II: XP_021331689.1), Xenopus tropicalis (Xtr: KAE8630336.1), and Homo sapiens (Hsa: NP_036250.2). Multiple sequence alignment was conducted using Clustal v.X2. The symbols at the bottom (. : *) indicate increasing degrees of conservation, from minor to major. Horizontal hyphens indicate gaps introduced to optimize the alignment. The vertical hyphen line (black arrow) marks the transition between exons 2 and 3. The secondary structure was predicted using JPred4. Structural elements are indicated as follows: yellow boxes (α-helix) and blue boxes (β-sheet). The names of α and β elements follow the nomenclature of Abshire and coworkers [27]. Green boxes pointed to by a green arrow indicate the amino acids conserved in the active site. Pink boxes mark the position of the heptad leucine repeat. Red rectangles indicate putative myristoylation sites, and the orange rectangle indicates a Ser/Thr-rich region. Black rectangles indicate the non-conserved domain of goldfish NOC paralogues compared to human and frog NOC.

Predicted tertiary (3D) structures of splicing variant II of goldfish nocturnin: NOC-AA (A), NOC-AB (B), and NOC-BB (C). The conserved segments are shown in blue. The domains not conserved between human and goldfish are shown in orange (the arrows indicate the amino acids’ position in Figure 7).

Synteny analysis of noc-a paralogues in Danio rerio (Dre), Sinocyclocheilus grahami (Sgr), Cyprinus carpio (Cca), Carassius gibelio (Cgi), and Carassius auratus (Cau). Black bolded names indicate nocturnin paralogues in the center of synteny. Chromosomes are represented with a white background in Danio rerio (as the reference species) and a cyan or pink background in the other species, depending on the patrilineal or matrilineal subgenome, respectively, of the ancestral duplication of Cyprininae (4Rc). Chromosome coding corresponds to that of the genome project (Supplementary Table S2), and the (+) or (−) symbols represent the sense or antisense orientation of the chromosome, respectively, in the projects analyzed. Genes are represented with pentagons whose sharp vertex indicates the transcription sense. Same colors indicate orthologous genes. Black pentagons represent unknown genes or genes without orthologues in any chromosome analyzed. Mutations are inferred by comparing the orthologous chromosomes: pseudogene (Ψ), deletion (#), tandem duplication (green dashed boxes), and inversion (red dashed boxes). Gene abbreviations are indicated in Supplementary Table S3. Gene names in bold blue indicate diagnostic markers of patrilineal chromosomes.

Synteny analysis of noc-b paralogues in Danio rerio (Dre), Sinocyclocheilus grahami (Sgr), Cyprinus carpio (Cca), Carassius gibelio (Cgi), and Carassius auratus (Cau). Black bolded names indicate nocturnin paralogues in the center of synteny. Chromosomes are represented with a white background in Danio rerio (as the reference species) and with a cyan or pink background in the other species, depending on the patrilineal or matrilineal subgenome, respectively, of the ancestral duplication of Cyprininae (4Rc). Chromosome coding corresponds to that of the genome project (Supplementary Table S2), and the (+) or (−) symbols represent the sense or antisense orientation of the chromosome, respectively, in the projects analyzed. Genes are represented with pentagons whose sharp vertex indicates the transcription sense. Same colors indicate orthologous genes. Black pentagons represent unknown genes. Pink pentagons indicate zinc finger genes. Mutations are inferred by comparing the orthologous chromosomes: pseudogene (Ψ), deletion (#), tandem duplication (green boxes), inversion (red boxes), insertion (black circle), and provisional indel (*). Gene abbreviations are indicated in Supplementary Table S4. Gene names in bold blue (below) or bold brown (above) indicate diagnostic markers of patrilineal or matrilineal chromosomes, respectively.

(A,B) Synteny index and synteny conservation rate in Carassius auratus compared to Danio rerio (white), Cyprinus carpio (light gray), and Carassius gibelio (dark gray). Numbers in parentheses indicate genes upstream and downstream of the noc paralogue analyzed. Blue and pink backgrounds indicate patrilineal or matrilineal origin of chromosomes, respectively. The lower horizontal line indicates the threshold (1 for synteny index, 50% for synteny conservation rate) above which conserved loci positions are more frequent than non-conserved ones. The upper line in the graphs in (B) indicates the maximum synteny conservation rate (%) for each paralogue. (C) Assumed phylogeny of the species analyzed in the synteny study.
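
The caption gives the two thresholds but not the formulas behind them. As a minimal sketch, assume the synteny index is the ratio of conserved to non-conserved loci and the conservation rate is the percentage of conserved loci among all loci compared. These definitions are an assumption, chosen only because both thresholds (1 and 50%) then fall exactly where conserved and non-conserved loci are equally frequent. The function name and the counts used are hypothetical.

```python
def synteny_metrics(conserved: int, non_conserved: int) -> tuple[float, float]:
    """Return (synteny index, synteny conservation rate in %).

    Assumed definitions (not taken from the source):
      index = conserved / non_conserved
      rate  = 100 * conserved / (conserved + non_conserved)
    """
    total = conserved + non_conserved
    if total == 0:
        raise ValueError("no loci were compared")
    # With no non-conserved loci the ratio is unbounded.
    index = float("inf") if non_conserved == 0 else conserved / non_conserved
    rate = 100.0 * conserved / total
    return index, rate


# Hypothetical counts: equal numbers sit exactly on both thresholds.
print(synteny_metrics(5, 5))
# More conserved than non-conserved loci lies above both thresholds.
print(synteny_metrics(6, 4))
```

Under these assumed definitions, an index above 1 and a rate above 50% are equivalent statements.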

Phylogenetic tree showing the 4Rc relationships among NOC sequences. The evolutionary model used was the Jones–Taylor–Thornton, Gamma-distributed (JTT + G). The tree was inferred by the maximum likelihood method (ML). The numbers in the nodes refer to bootstrap values of a total of 1000 replicates. The scale bar indicates the average number of substitutions per position. The binomial name of the species is given on the right side of the tree. Letters A and B indicate NOC isoforms in teleosts. 3R and 4Rc indicate the proposed whole-genome duplication events in Teleostei and Cyprininae, respectively. Blue and pink boxes indicate the patrilineal or matrilineal clades of Cyprininae species, respectively. Ψ indicates a pseudogene. Species names and GenBank accession numbers of the sequences are indicated in Supplementary Figure S3.

Expression of noc-aa, noc-ab, and noc-bb in peripheral (A,C,E) and neural (B,D,F) tissues of goldfish. Data are expressed as mean + S.E.M. (n = 6) relative to the hepatopancreas (A,C,E) and to the diencephalon (B,D,F). Different letters indicate statistical differences among tissues (ANOVA-SNK; p < 0.05). Head kidney (HK), gill (GILL), heart (HEA), esophagus (ESO), intestinal bulb (IBU), anterior intestine (AI), middle intestine (MI), posterior intestine (PI), spleen (SPL), hepatopancreas (HEP), adipose tissue (ADT), caudal kidney (KID), gonad (GON), skin (SKIN), muscle (MUS), telencephalon (TEL), diencephalon (DI), optic tectum (OT), hypothalamus (HT), pituitary (PIT), cerebellum (CER), vagal lobe (VL), brainstem (BRS), retina (RET).
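
The "mean + S.E.M. relative to a reference tissue" summary can be illustrated with a short sketch. It assumes that each measurement is divided by the mean of the reference tissue and that S.E.M. is the sample standard deviation divided by the square root of n. The caption does not spell out the normalisation procedure, so this is an assumption, and the function name and input values are hypothetical placeholders, not data from the study.

```python
import math
import statistics


def relative_expression(
    values_by_tissue: dict[str, list[float]], reference: str
) -> dict[str, tuple[float, float]]:
    """Return {tissue: (mean, SEM)} after scaling by the reference-tissue mean."""
    ref_mean = statistics.fmean(values_by_tissue[reference])
    summary = {}
    for tissue, values in values_by_tissue.items():
        scaled = [v / ref_mean for v in values]
        # SEM = sample standard deviation / sqrt(n); needs at least 2 values.
        sem = statistics.stdev(scaled) / math.sqrt(len(scaled))
        summary[tissue] = (statistics.fmean(scaled), sem)
    return summary


# Hypothetical raw expression values (arbitrary units), three per tissue.
raw = {"HEP": [1.0, 2.0, 3.0], "GILL": [4.0, 4.0, 4.0]}
for tissue, (mean, sem) in relative_expression(raw, "HEP").items():
    print(f"{tissue}: mean = {mean:.2f}, SEM = {sem:.2f}")
```

By construction, the reference tissue always has a mean of 1, so every other bar reads directly as a fold difference from it.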

Expression of noc-a (A) and noc-b (B) in peripheral tissues and brain in zebrafish. Data are expressed as mean + S.E.M. (n = 5) relative to the brain. Different letters indicate statistical differences among tissues (ANOVA-SNK; p < 0.05). Head kidney (HK), gill (GILL), esophagus (ESO), intestinal bulb (IBU), anterior intestine (AI), posterior intestine (PI), spleen (SPL), liver (LIV), pancreas (PAN), adipose tissue (ADT), caudal kidney (KID), gonad (GON), skin (SKIN), muscle (MUS), brain (BRAIN), retina (RET).

Acknowledgments
This image is the copyrighted work of the attributed author or publisher, and ZFIN has permission only to display this image to its users. Additional permissions should be obtained from the applicable author or publisher of the image.