Figure 5—figure supplement 4.

Expert similarity across all datasets: mean IoU.

The heatmaps show the mean of $\bar{M}_{\mathrm{IoU}}$ for the image feature annotations of the indicated experts. The estimated ground truth (est. GT) was always calculated on all available expert annotations. The expert number refers to a unique human annotator (e.g. expert 1 is the same person across the datasets in A–D). The similarity scores were calculated on n = 5 images for A, B, C, and E and on n = 9 images (test set) for D. The similarity between the same experts varies across the datasets (A–D), indicating that the heuristic bias of the annotators depends on the underlying data. However, the overall performance of the experts remains within a similar range.
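To make the pairwise comparison concrete, the following is a minimal sketch of how a symmetric matrix of mean IoU scores between annotators could be assembled. It rests on stated assumptions: annotations are treated as binary pixel masks and the IoU is computed pixel-wise, which may differ from the feature-level definition behind the scores in the figure. The function names `iou` and `mean_iou_matrix`, the choice to score two empty masks as 1.0, and the toy masks are all hypothetical, and the estimated ground truth is not modeled.

```python
from itertools import combinations
from statistics import mean


def iou(mask_a, mask_b):
    """Pixel-wise intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            intersection += bool(pixel_a) and bool(pixel_b)
            union += bool(pixel_a) or bool(pixel_b)
    # Assumption: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


def mean_iou_matrix(annotations):
    """Return a nested dict of mean IoU values for every pair of annotators.

    `annotations` maps an annotator name to a list of masks, one per image,
    with images in the same order for every annotator.
    """
    names = list(annotations)
    # The diagonal (an annotator compared with themselves) is 1.0.
    matrix = {a: {b: 1.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        score = mean(iou(x, y) for x, y in zip(annotations[a], annotations[b]))
        matrix[a][b] = matrix[b][a] = score
    return matrix


# Toy data: two annotators, two 2x2 images each (hypothetical values).
toy_annotations = {
    "expert 1": [[[1, 1], [0, 0]], [[1, 0], [0, 1]]],
    "expert 2": [[[1, 0], [0, 0]], [[1, 0], [0, 1]]],
}
toy_matrix = mean_iou_matrix(toy_annotations)
print(toy_matrix["expert 1"]["expert 2"])
```

Each off-diagonal cell of such a matrix corresponds to one cell of a heatmap; averaging over only a handful of images, as in the panels described above, means single images can shift a cell noticeably.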

Acknowledgments
This image is the copyrighted work of the attributed author or publisher, and ZFIN has permission only to display this image to its users. Additional permissions should be obtained from the applicable author or publisher of the image. Full text @ Elife