- Title: High resolution of full-length RNA sequencing deciphers massive transcriptome complexity during zebrafish embryogenesis
- Authors: Bo, J., Fang, W., Wang, J., He, S., Yang, L.
- Source: BMC Biol.
Identification of novel transcripts using PacBio long-read sequencing in 21 stages of zebrafish embryos. A Schematic of the transcriptome reconstruction pipeline based on long-read sequencing. Stage-specific transcriptomes were reconstructed from 21 embryonic stages: unfertilized, 1-cell, 2-cell, 128-cell, 1k-cell, oblong, dome, 50% epiboly, shield, 75% epiboly, 1–4 somites, 14–19 somites, 20–25 somites, prim-5, prim-15, prim-25, long pec, protruding mouth, day 4, day 5, and day 6. Long reads were mapped to the GRCz11 reference genome with GMAP and annotated against the reference transcriptome with Cuffcompare. The long-read transcripts were also compared with, and validated against, short-read RNA-seq data. B In every stage, more than 99% of high-quality transcripts mapped to the GRCz11 reference. C Annotation of the long-read transcripts identified in each of the 21 stages. Transcripts from each stage were assigned to five annotation categories, whose percentages are shown; red lines show the total number of transcripts per stage. D Transcripts from all 21 stages were merged with Cuffmerge and the merged set was annotated; the numbers and percentages give the merged transcripts in each of the five categories. E Based on the GRCz11 reference annotation and coding potential, both annotated and novel transcripts were classified as protein-coding or lncRNA. F High-quality transcripts from the 21 stages were detected in short-read RNA-seq data, and their expression levels were quantified.
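Panel F states that expression levels were calculated but does not name the unit or the tool; the heatmap in the clustering figure uses FPKM. A minimal sketch of the standard FPKM formula follows, assuming single-end reads are each counted as one fragment. The `fraction_detected` helper and its `min_fpkm=1.0` cutoff are hypothetical, shown only as one way to summarize how many long-read transcripts are detected in short-read data.

```python
from typing import Iterable


def fpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Fragments per kilobase of transcript per million mapped reads.

    Assumption: each mapped read is treated as one fragment (single-end data).
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    # reads / (length in kb) / (library size in millions) == reads * 1e9 / (length * library size)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def fraction_detected(fpkm_values: Iterable[float], min_fpkm: float = 1.0) -> float:
    """Hypothetical summary: share of transcripts at or above an illustrative FPKM cutoff."""
    values = list(fpkm_values)
    if not values:
        return 0.0
    return sum(v >= min_fpkm for v in values) / len(values)


# Illustrative numbers only: 500 reads on a 2,000-bp transcript in a 10-million-read library.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
print(fraction_detected([25.0, 0.2, 3.1, 0.0]))  # 0.5
```

Length normalization lets transcripts of different sizes be compared within a stage, and library-size normalization lets the same transcript be compared across stages.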
Characterization of novel transcripts from long-read data in zebrafish embryos. A Heatmap of H3K4me3 signal in the promoter regions (±3 kb) around the transcription start sites (TSSs) of annotated transcripts and the classical TSSs of novel isoform transcripts. Columns correspond to the unfertilized, 2-cell, 128-cell, and 1k-cell stages. B Heatmap of H3K4me3 signal in the promoter regions (±3 kb) around the TSSs of novel isoform transcripts other than their classical TSSs, and around the TSSs of novel gene transcripts. C CAGE data from zebrafish at the 1-cell, oblong, shield, 14–19 somites, prim-5, and prim-25 stages were compared with annotated TSSs, classical TSSs, novel TSSs of novel isoforms, and novel gene TSSs. Blue bars represent TSSs supported by CAGE data; red bars represent TSSs not supported by CAGE data. D Example of a novel isoform located at positions 2,248,323–2,311,225 on chromosome 13. IGV view of next-generation RNA-seq alignment density, the H3K4me3 peak, the CAGE-seq capping site, and the reference genome annotation in the novel isoform region; the novel isoform is supported by RNA-seq density, the H3K4me3 peak, and the CAGE-seq capping site. E Example of a novel gene located at positions 989,552–1,015,458 on chromosome 1. IGV view of RNA-seq alignment density, the H3K4me3 peak, the CAGE-seq capping site, and the reference genome annotation in the novel gene region; the novel gene is likewise supported by RNA-seq density, the H3K4me3 peak, and the CAGE-seq capping site. F Homology and domain annotation of novel coding transcripts. The e-value significance threshold was set to 1e−5. Blue points indicate that both BlastP and Pfam were significant, purple points that only Pfam was significant, green points that only BlastP was significant, and gray points that neither was significant.
G Scatter plot of the conservation analysis of novel lncRNA transcripts. The x-axis shows the maximum phastCons score over 200-bp sliding windows, and the y-axis shows the fragment phyloP score of conserved bases. The significance threshold was set at the 95th percentile of scores from randomized control regions. Blue points indicate that both scores were significant, purple points that only phastCons was significant, green points that only phyloP was significant, and gray points that neither was significant.
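Panels A–C relate each TSS to nearby chromatin and CAGE evidence. The sketch below shows the interval logic under stated assumptions: coordinates are 1-based and on a single chromosome, strand is ignored, and a TSS counts as CAGE-supported when a capping site lies within a hypothetical `tolerance` of 100 bp (the caption does not give the matching rule). Only the ±3 kb promoter flank comes from the caption.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based, inclusive; single chromosome assumed


def promoter_window(tss: int, flank: int = 3_000) -> Interval:
    """Promoter region spanning +/- flank bp around a TSS, clipped at position 1."""
    return (max(1, tss - flank), tss + flank)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def has_h3k4me3(tss: int, peaks: List[Interval], flank: int = 3_000) -> bool:
    """Does any H3K4me3 peak overlap the +/- flank promoter window of this TSS?"""
    window = promoter_window(tss, flank)
    return any(overlaps(window, peak) for peak in peaks)


def cage_supported(tss: int, cage_sites: List[int], tolerance: int = 100) -> bool:
    """Hypothetical rule: supported if any CAGE capping site lies within `tolerance` bp."""
    return any(abs(site - tss) <= tolerance for site in cage_sites)


# Illustrative coordinates, not taken from the figure.
print(has_h3k4me3(10_000, [(12_500, 13_500)]))  # True
print(cage_supported(10_000, [10_500]))  # False
```

Tallying `cage_supported` over each TSS class (annotated, classical, novel-isoform, novel-gene) would give counts of the kind shown as blue and red bars in Panel C.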
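Panels F and G use the same four-color scheme for two independent significance tests. A minimal sketch of that classification follows. The 1e−5 e-value cutoff and the 95th-percentile rule come from the captions; the rest are assumptions: `<=` is used for the e-value test and `>` for the percentile test, percentiles use linear interpolation between ranks (NumPy's default definition), and the 200-bp window score is taken as the mean per-base phastCons score, which the caption does not specify.

```python
from typing import Sequence

E_VALUE_CUTOFF = 1e-5  # from the Panel F caption


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between closest ranks."""
    s = sorted(values)
    if not s:
        raise ValueError("need at least one value")
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def max_window_mean(per_base: Sequence[float], window: int = 200) -> float:
    """Highest mean score over any run of `window` consecutive bases (assumed window score)."""
    if not per_base:
        raise ValueError("empty score track")
    w = min(window, len(per_base))
    current = sum(per_base[:w])
    best = current
    for i in range(w, len(per_base)):
        # Slide the window one base: add the new base, drop the oldest one.
        current += per_base[i] - per_base[i - w]
        best = max(best, current)
    return best / w


def color_class(first_sig: bool, second_sig: bool) -> str:
    """Blue = both significant, purple = first only, green = second only, gray = neither."""
    if first_sig and second_sig:
        return "blue"
    if first_sig:
        return "purple"
    if second_sig:
        return "green"
    return "gray"


def coding_class(pfam_evalue: float, blastp_evalue: float) -> str:
    # Panel F: purple means only Pfam was significant, green only BlastP.
    return color_class(pfam_evalue <= E_VALUE_CUTOFF, blastp_evalue <= E_VALUE_CUTOFF)


def conservation_class(phastcons: float, phylop: float,
                       phastcons_controls: Sequence[float],
                       phylop_controls: Sequence[float]) -> str:
    # Panel G: thresholds are the 95th percentiles of randomized control-region scores;
    # purple means only phastCons was significant, green only phyloP.
    return color_class(phastcons > percentile(phastcons_controls, 95),
                       phylop > percentile(phylop_controls, 95))
```

For example, with control scores `[i / 100 for i in range(101)]` the 95th percentile is 0.95, so a transcript scoring 0.99 for phastCons and 0.5 for phyloP falls in the purple class.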
Clustering of transcripts differentially expressed between adjacent stages. A Transcripts significantly differentially expressed between two adjacent stages fell into 15 clusters with clear stage-specific effects. The heatmap shows normalized FPKM values for the 15 clusters: clusters 25 and 7 contain transcripts highly expressed at the oblong stage; clusters 28, 10, and 16 contain transcripts highly expressed during the maternal period; clusters 8, 11, and 24 mark the onset of high-level transcription upon zygotic activation; clusters 1, 13, and 26 show high transcription levels from the beginning of the pharyngula stage; and clusters 9, 6, 14, and 27 contain transcripts highly expressed at the beginning of segmentation. The line plot shows the trend of the mean FPKM value; the oblong stage is the critical period of the transition from maternal to zygotic expression. B GO level 2 enrichment for the 15 clusters, with a significance threshold of P < 0.05; cluster 26 had no level 2 terms with P < 0.05. C Enriched GO level 4 categories in cluster 26 indicate that this period centers on signal transmission and material transport between organs.
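The heatmap in Panel A plots normalized FPKM values, and the line plot tracks the mean FPKM of each cluster. The caption does not state the normalization; the sketch below assumes a per-transcript z-score across stages, a common choice for such heatmaps, and shows how a cluster's mean trend and a transcript's peak stage could be read off. The clustering algorithm itself is not named in the caption and is not reproduced here.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_row(fpkm: Sequence[float]) -> List[float]:
    """Scale one transcript's FPKM profile across stages to mean 0 and SD 1 (assumed normalization)."""
    mu = mean(fpkm)
    sd = pstdev(fpkm)
    if sd == 0:
        return [0.0] * len(fpkm)  # flat profile: no relative change to show
    return [(x - mu) / sd for x in fpkm]


def cluster_mean_trend(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Mean FPKM at each stage across all transcripts in one cluster."""
    return [mean(stage_values) for stage_values in zip(*profiles)]


def peak_stage(fpkm: Sequence[float], stages: Sequence[str]) -> str:
    """Stage at which a profile reaches its maximum."""
    return stages[max(range(len(fpkm)), key=lambda i: fpkm[i])]


stages = ["dome", "oblong", "shield"]  # illustrative subset of the 21 stages
profile = [0.5, 9.0, 2.0]  # made-up FPKM values
print(peak_stage(profile, stages))  # oblong
print(cluster_mean_trend([[1.0, 3.0, 5.0], [3.0, 5.0, 7.0]]))  # [2.0, 4.0, 6.0]
```

Z-scoring emphasizes when a transcript is high or low relative to its own baseline, which is what distinguishes maternal, oblong, and zygotic clusters, rather than its absolute expression level.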
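Panel B applies a P < 0.05 threshold to GO term enrichment but does not name the test. A one-sided hypergeometric test is a common choice for this kind of over-representation analysis; the sketch below assumes it, and applies no multiple-testing correction because the caption does not mention one. The GO hierarchy levels are not modeled.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: genes in the cluster annotated to the GO term
    n: genes in the cluster
    K: background genes annotated to the term
    N: background genes
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Exact integer arithmetic; math.comb returns 0 when more items are chosen than exist.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def is_enriched(p_value: float, alpha: float = 0.05) -> bool:
    # Caption threshold: P < 0.05.
    return p_value < alpha


# Illustrative: all 5 cluster genes carry a term held by 5 of 10 background genes.
p = hypergeom_enrichment_p(5, 5, 5, 10)
print(p, is_enriched(p))
```

Using integer binomial coefficients avoids the underflow that floating-point factorials would hit for genome-sized backgrounds.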
Classification and dynamic variation of alternative splicing (AS) events. A The seven AS event types recognized by SUPPA2. B Bar chart of the percentage of each of the seven AS event types per stage; the line chart shows the total number of AS events per stage. C Bar chart of the percentage of genes with each of the seven AS types per stage; the line chart shows the total number of AS genes per stage. D Change in AS transcripts between adjacent stages. Positive values on the vertical axis give the number of AS transcripts gained between adjacent developmental stages, and negative values the number lost. Labels on the horizontal axis indicate the pair of adjacent stages compared.
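Panels B–D summarize SUPPA2 events per stage. The sketch below assumes event types are already available as SUPPA2's seven event codes (SE, A5, A3, MX, RI, AF, AL) and that each stage's AS transcripts are given as a set of transcript IDs; it computes per-type percentages as in Panel B and, under one plausible reading of Panel D, gains and losses as set differences between adjacent stages. Parsing SUPPA2 output files is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# SUPPA2 event codes: skipped exon, alternative 5'/3' splice sites, mutually
# exclusive exons, retained intron, alternative first/last exons.
SUPPA2_TYPES = ("SE", "A5", "A3", "MX", "RI", "AF", "AL")


def type_percentages(event_types: Iterable[str]) -> Dict[str, float]:
    """Percentage of one stage's AS events falling in each SUPPA2 event type."""
    counts = Counter(event_types)
    total = sum(counts.values())
    return {t: (100 * counts[t] / total if total else 0.0) for t in SUPPA2_TYPES}


def gains_and_losses(stage_sets: List[Set[str]]) -> List[Tuple[int, int]]:
    """For each adjacent stage pair: (number gained, minus the number lost)."""
    return [
        (len(curr - prev), -len(prev - curr))
        for prev, curr in zip(stage_sets, stage_sets[1:])
    ]


print(type_percentages(["SE", "SE", "RI", "A3"])["SE"])  # 50.0
print(gains_and_losses([{"t1", "t2"}, {"t2", "t3", "t4"}]))  # [(2, -1)]
```

Returning losses as negative numbers matches the caption's convention of plotting lost transcripts below zero.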
Alternative splicing dynamics and transcript isoforms of enpep during zebrafish development. A River plot showing the dynamics of differentially alternatively spliced transcripts across developmental stages. B Transcript isoforms of the enpep gene expressed at different developmental stages. In early stages (unfertilized, 128-cell, 1k-cell, oblong, and dome), enpep expresses shorter transcripts with fewer exons, which are novel, unannotated isoforms. In later stages (14–19 somites, 20–25 somites, prim-5, prim-15, long pec, day 4, and day 6), enpep expresses longer transcripts with more exons.
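Panel B describes a switch in the dominant enpep isoform from short early transcripts to longer late ones. One way to quantify such a switch is relative isoform usage: each isoform's abundance divided by the gene's total abundance, similar in spirit to the per-isoform PSI that SUPPA2 can report. The sketch below uses hypothetical isoform names and illustrative abundances, not data from the figure.

```python
import math
from typing import Dict


def isoform_usage(abundance: Dict[str, float]) -> Dict[str, float]:
    """Relative usage of each isoform of one gene (isoform abundance / gene total)."""
    total = sum(abundance.values())
    if total == 0:
        # Usage is undefined when the gene is not expressed at this stage.
        return {iso: math.nan for iso in abundance}
    return {iso: a / total for iso, a in abundance.items()}


def delta_usage(stage_a: Dict[str, float], stage_b: Dict[str, float]) -> Dict[str, float]:
    """Change in usage (stage_b minus stage_a) for isoforms present in both stages."""
    ua, ub = isoform_usage(stage_a), isoform_usage(stage_b)
    return {iso: ub[iso] - ua[iso] for iso in ua.keys() & ub.keys()}


def dominant_isoform(abundance: Dict[str, float]) -> str:
    """Isoform with the highest abundance at one stage."""
    return max(abundance, key=abundance.get)


# Hypothetical abundances (e.g., TPM) for a two-isoform gene.
early = {"short_isoform": 8.0, "long_isoform": 2.0}
late = {"short_isoform": 1.0, "long_isoform": 9.0}
print(dominant_isoform(early), dominant_isoform(late))  # short_isoform long_isoform
print(round(delta_usage(early, late)["long_isoform"], 2))  # 0.7
```

Working with usage fractions rather than raw abundance separates a change in which isoform is expressed from a change in how much the gene is expressed overall.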