Figure Caption

Figure 1—figure supplement 2. Size distribution of three sequence data types present in the BigWenDB.

Boxplots show the size distributions of genomic ORFs (ORF), transcriptomic ORFs (TRS), and NCBI sequences (GI) in comparison to the average size of protein domains collected in the PFAM database V31.0 (March 2017; 16,712 entries). Data points outside 1.5 × the interquartile range are omitted for clarity. The y-axis is on a logarithmic scale. The number of sequences and the summary statistics for each data type are shown below the corresponding boxplot. The lower border (1st quartile) of the PFAM box is marked in red, together with the number of sequences per data type that surpass this size threshold.
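
The two quantities named in the caption, the 1.5 × interquartile-range cutoff and the count of sequences above the first quartile of the PFAM sizes, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the cutoff is assumed to follow the conventional boxplot rule (1.5 × IQR beyond the box edges), "surpass" is assumed to mean strictly greater than, and the quartile method (inclusive) is an arbitrary choice that may differ from the one used for the figure.

```python
from statistics import quantiles


def first_quartile(sizes):
    """Return the 1st quartile (lower box border) of a list of sizes."""
    # quantiles(..., n=4) returns the three cut points [Q1, median, Q3].
    return quantiles(sizes, n=4, method="inclusive")[0]


def whisker_bounds(sizes):
    """Return (low, high) limits; points outside them would be omitted.

    Assumes the conventional rule: 1.5 x IQR below Q1 and above Q3.
    """
    q1, _, q3 = quantiles(sizes, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def count_above(sizes, threshold):
    """Count sequences whose size is strictly greater than the threshold."""
    return sum(1 for size in sizes if size > threshold)


if __name__ == "__main__":
    # Made-up example sizes, for illustration only (not data from the figure).
    domain_sizes = [50, 100, 150, 200, 250]
    orf_sizes = [30, 120, 400, 100]

    threshold = first_quartile(domain_sizes)
    print("threshold:", threshold)
    print("whisker bounds:", whisker_bounds(domain_sizes))
    print("sequences above threshold:", count_above(orf_sizes, threshold))
```

Applied per data type (ORF, TRS, GI), `count_above` with the PFAM first quartile as the threshold would give the kind of number shown next to the red line.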

Acknowledgments
This image is the copyrighted work of the attributed author or publisher, and ZFIN has permission only to display this image to its users. Additional permissions should be obtained from the applicable author or publisher of the image. Full text @ Elife