FIGURE SUMMARY
Title

Cross-species analysis of enhancer logic using deep learning

Authors
Minnoye, L., Taskiran, I.I., Mauduit, D., Fazio, M., Van Aerschot, L., Hulselmans, G., Christiaens, V., Makhzami, S., Seltenhammer, M., Karras, P., Primot, A., Cadieu, E., van Rooijen, E., Marine, J.C., Egidy, G., Ghanem, G.E., Zon, L., Wouters, J., Aerts, S.
Source
Full text @ Genome Res.

Comparative epigenomics reveals conservation of two main melanoma states. (A) Evolutionary relationship between the six studied species, represented by a phylogenetic tree (NCBI taxonomy tree). ATAC-seq profiles of the 26 melanoma cell lines are shown for three regulatory regions. (B) ATAC-seq profiles of the human melanoma lines at the SOX10 locus. Lines are colored by melanoma state: melanocytic (MEL, blue) or mesenchymal-like (MES, orange). (C) Total number of ATAC-seq regions observed across all samples of each species, colored by whether they are not alignable, alignable, or conserved accessible in human. (D) PCA clustering based on the accessibility of the 29,619 alignable regions across all six species. (E) ATAC-seq profiles of MEL and MES lines of different species for an intronic MLANA enhancer and the region upstream of MMP3.

Conservation of binding motifs of master regulators of MEL and MES melanoma states. (A,B) Heatmap of differential ATAC-seq regions when comparing human MEL versus human MES lines (A) and the MEL dog line “Dog-OralMel-18249” versus the MES dog line “Dog-IrisMel-14205” (two biological replicates each) (B), colored by normalized ATAC-seq signal. Enriched TF binding motifs in the differential peaks were identified with HOMER (Heinz et al. 2010), and the first logo of each enriched TF family is shown. The ratio of the percentage of target to background sequences containing the motif is indicated in brackets, together with the rank of the TF class in the HOMER output (#). (C) Schematic overview of the cross-species motif analysis, which uses the branch length score (BLS) as a measure of the evolutionary conservation of a motif hit across conserved accessible regions. The BLS was summed across a set of conserved accessible regions. (D,E) Histograms of the normalized summed BLS for 20,003 motifs on 9732 conserved accessible regions across the mammalian MEL lines (D) and on 113 conserved accessible regions across the MEL lines of all six species (E). The first hit of each top recurrent TF binding motif within the top 4% most conserved motifs is marked with a cross and accompanied by the motif logo.
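
The caption uses the BLS without defining it. A minimal sketch follows, assuming the common definition in which the BLS of a motif hit is the total branch length of the smallest subtree connecting the species that carry the hit, divided by the total length of the tree. The tree, its branch lengths, and the helper names are hypothetical, and the normalization and top-4% cutoff used for the histograms are not reproduced.

```python
"""Sketch of a branch length score (BLS) for a motif hit, summed over regions.

Assumptions (hypothetical, for illustration only): the tree, its branch lengths,
and the per-region motif presence are made up. BLS is taken as the branch length
of the minimal subtree connecting the species that carry the hit, divided by the
total branch length of the tree.
"""

# Hypothetical rooted tree: child -> (parent, branch length to parent)
TREE = {
    "A": ("root", 1.0),
    "dog": ("root", 3.0),
    "human": ("A", 1.0),
    "mouse": ("A", 2.0),
}
LEAVES = {"human", "mouse", "dog"}


def leaves_below(node: str) -> set[str]:
    """Return the leaf species in the subtree rooted at `node`."""
    if node in LEAVES:
        return {node}
    below: set[str] = set()
    for child, (parent, _) in TREE.items():
        if parent == node:
            below |= leaves_below(child)
    return below


def branch_length_score(species_with_hit: set[str]) -> float:
    """Fraction of the total tree length spanned by the species carrying the hit."""
    total = sum(length for _, length in TREE.values())
    spanned = 0.0
    for child, (_, length) in TREE.items():
        below = leaves_below(child) & species_with_hit
        # The edge above `child` is on the minimal spanning subtree iff it
        # separates at least one selected species from at least one other one.
        if below and below != species_with_hit:
            spanned += length
    return spanned / total


def summed_bls(regions: list[set[str]]) -> float:
    """Sum one motif's BLS over a set of conserved accessible regions."""
    return sum(branch_length_score(species) for species in regions)
```

With this toy tree, a hit shared by human and mouse spans 3 of 7 length units (BLS of about 0.43), whereas a hit found only in human scores 0, because a single species spans no branches.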

DeepMEL classifies melanoma enhancers and predicts important TF binding motifs. (A) Cell-topic heatmap of cisTopic applied to 339,099 ATAC-seq regions across the 16 human melanoma lines, colored by normalized topic scores. (029*) MM029_R2. (B) Example of a MEL-specific (topic 4) region near MIA and of MES-specific (topic 7) regions upstream of SERPINE1. (C) Schematic overview of DeepMEL. Twenty-four topics (sets of coaccessible regions) were used as input for training a multiclass, multilabel neural network. (D,E) Receiver operating characteristic curve (D) and precision-recall curve (E) for DeepMEL on training, test, and shuffled data of topic 4 and topic 7 regions. (F) Top enriched filters learned by DeepMEL to classify regions as MEL (topic 4) or MES (topic 7). Normalized filter importance is shown per filter. (G) Example of a MEL-predicted enhancer near IRF4. (First and second rows) DeepExplainer view of the forward and reverse strands, with the height of each nucleotide indicating its importance for prediction of the MEL enhancer. (Third row) In vitro effect of point mutations on enhancer activity as measured by MPRA (Kircher et al. 2019). Colors represent the nucleotide to which the wild-type nucleotide is mutated. (Fourth row) In silico effect of point mutations as predicted by DeepMEL. (H) Correlation between the in vitro mutational effects on the IRF4 enhancer and the in silico mutagenesis predictions. (I) Performance of variant effect prediction by DeepMEL trained on topics (black bar, model used in this paper) or on ATAC-seq signal (white bar), compared with several previously tested models, on the IRF4 enhancer case (Kircher et al. 2019).
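
The fourth row of panel G is a saturation mutagenesis: every base is replaced by each alternative nucleotide and the change in model score is recorded. The sketch below shows that loop for any sequence-scoring function. `toy_score` is a hypothetical stand-in that counts the E-box word CACGTG; it is not DeepMEL, and the function names are illustrative.

```python
"""Sketch of in silico saturation mutagenesis, assuming a generic scorer.

`toy_score` is a hypothetical stand-in for a trained model such as DeepMEL;
it only counts occurrences of the E-box word CACGTG and is NOT the paper's model.
"""
from typing import Callable

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical scorer: number of (possibly overlapping) CACGTG words."""
    return float(sum(seq[i:i + 6] == "CACGTG" for i in range(len(seq) - 5)))


def saturation_mutagenesis(
    seq: str, score: Callable[[str], float]
) -> list[dict[str, float]]:
    """For each position, return the score change of every single-base substitution.

    Entry [i][b] is score(mutant) - score(wild type), where the mutant carries
    base `b` at position i; the wild-type base itself maps to 0.0.
    """
    reference = score(seq)
    effects = []
    for i, wt in enumerate(seq):
        per_base = {}
        for b in BASES:
            if b == wt:
                per_base[b] = 0.0
                continue
            mutant = seq[:i] + b + seq[i + 1:]
            per_base[b] = score(mutant) - reference
        effects.append(per_base)
    return effects
```

For example, `saturation_mutagenesis("AACACGTGAA", toy_score)` reports -1.0 for any substitution inside the CACGTG word and 0.0 at the flanks. Comparing such in silico deltas with measured MPRA effects, as in panel H, amounts to a Pearson correlation between two equal-length lists, for which `statistics.correlation` (Python 3.10+) can be used.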

Human-trained deep learning model applied to cross-species ATAC-seq data. (A) Performance of DeepMEL and Cluster-Buster (cbust) in classifying MEL and MES differential peaks in human and dog. (B) Percentage of MEL- and MES-predicted ATAC-seq regions across all samples in our cohort and in human melanocytes. Samples are ordered by the ratio of the number of MES- to MEL-predicted regions. (C) Pearson's correlation of deep layer scores between MEL-predicted regions near orthologous MEL genes in human and another species (Human-Species) or between MEL-predicted regions near different MEL genes within one species (Species-Species). P-values of unpaired two-sample Wilcoxon tests are reported. (D) (I) Evolutionary distance between human and the other species in branch length units. (II) ATAC-seq profiles of the ERBB3 locus in the six species. MEL-specific enhancers predicted by DeepMEL are highlighted according to whether they were also found (gray) or not found (green) by liftOver of the human MEL enhancer. (III) DeepExplainer plots for the multiple-aligned MEL-predicted ERBB3 enhancers. Red and blue dots represent point and indel mutations, respectively.
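
The unpaired two-sample Wilcoxon test in panel C is the rank-sum test, also known as the Mann-Whitney U test. The sketch below compares two hypothetical lists of correlation coefficients using a normal approximation without tie or continuity correction; that simplification is an assumption made here for brevity, and `scipy.stats.mannwhitneyu` is the usual choice for real analyses.

```python
"""Sketch of an unpaired two-sample Wilcoxon (rank-sum / Mann-Whitney U) test.

Assumption: the p-value uses a normal approximation without tie or continuity
correction, which is only reasonable for moderately large samples.
"""
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U statistic for x, two-sided normal-approximation p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                 # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return u1, p
```

Here `x` could hold Human-Species correlations and `y` Species-Species correlations; both lists are placeholders, not values from the figure.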

Core regulatory complex of MEL melanoma enhancers. (A) Schematic overview of the motif scoring method, in which extended convolutional filter hits from DeepMEL are multiplied by DeepExplainer profiles to yield significant motif hits. (B,C) Heatmap (B) and binarized heatmap (C) of the number of significant SOX, TFAP2A, MITF, and RUNX-like motif hits on the 3885 MEL-predicted regions in the human cell line MM001. (D) Aggregation plot of normalized ChIP-seq signal of SOX10, MITF, and TFAP2A on the human enhancer clusters. (E,F) Venn diagrams of region clusters on the 3885 MEL-predicted regions in human (in MM001) (E) and the 4194 MEL-predicted regions in dog (in Dog-OralMel-18249) (F). Example MEL-predicted enhancers in human and dog are shown for two of the region clusters. The ATAC-seq signal of the regions is shown in gray.
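
Panel A combines filter hits with DeepExplainer profiles. One plausible reading, sketched below as an assumption rather than the authors' code, is to slide a filter-derived weight matrix along a per-base contribution matrix, sum the elementwise products in each window, and keep windows above a threshold. The matrices, threshold, and function names are hypothetical.

```python
"""Sketch of scoring motif hits by weighting a filter with contribution scores.

Assumptions (hypothetical): `contrib` is a per-position, per-base contribution
matrix (rows = positions, columns = A, C, G, T) that is nonzero only at the
observed base; `pwm` is a filter-derived weight matrix; the threshold is arbitrary.
"""

Matrix = list[list[float]]


def window_scores(contrib: Matrix, pwm: Matrix) -> list[float]:
    """For each start position, sum the elementwise product of PWM and window."""
    width = len(pwm)
    scores = []
    for start in range(len(contrib) - width + 1):
        total = 0.0
        for offset in range(width):
            for base in range(4):
                total += pwm[offset][base] * contrib[start + offset][base]
        scores.append(total)
    return scores


def significant_hits(contrib: Matrix, pwm: Matrix, threshold: float) -> list[int]:
    """Return start positions whose contribution-weighted score exceeds the threshold."""
    return [i for i, s in enumerate(window_scores(contrib, pwm)) if s > threshold]
```

For the sequence ACGT with unit contributions on C and G only, a two-position "CG" filter scores 2.0 at start 1 and 0.0 elsewhere, so only start 1 passes a threshold of 1.0. Because the contribution matrix gates the filter, a sequence match with no predicted importance does not count as a hit.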

Positional specificity of SOX10 and TFAP2A in MEL melanoma enhancers. (A,B, top) Example human (A) and dog (B) MEL-predicted enhancers containing significant SOX10 and TFAP2A motifs. The ATAC-seq signal is shown in gray. (A, middle; B, bottom) Imputed nucleosome start-point and midpoint profiles. (A, bottom) For the human example region, ATAC-seq profiles of MM001 in the control condition and after 72 h of SOX10 or TFAP2A knockdown are shown. (C) Schematic overview of the nucleosome structure explaining the colors used in D and E. (D,E) Nucleosome start-point (D) and midpoint (E) predictions on MEL-predicted regions containing one SOX10 (left) or one TFAP2A motif (right), possibly alongside other motifs, with the regions centered either on the ATAC-seq summit (gray) or on the SOX10 or TFAP2A motif (blue).
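
Panels D and E average per-base profiles after aligning regions on a chosen anchor, either the ATAC-seq summit or the motif. A minimal sketch of that aggregation follows, assuming each region has a per-base signal track and one anchor index; skipping windows that run past a region's edges is a simplification chosen here, and all names are hypothetical.

```python
"""Sketch of an aggregation (meta-)profile centered on anchor positions.

Assumptions (hypothetical): each region has a per-base signal track (for
example, imputed nucleosome start-point scores) and an anchor index, such as
the ATAC-seq summit or a SOX10 or TFAP2A motif position. Windows that run off a
region's edges are skipped.
"""


def aggregate_profile(
    tracks: list[list[float]], anchors: list[int], flank: int
) -> list[float]:
    """Average each track over a window of +/- `flank` bases around its anchor."""
    width = 2 * flank + 1
    sums = [0.0] * width
    used = 0
    for track, anchor in zip(tracks, anchors):
        lo, hi = anchor - flank, anchor + flank + 1
        if lo < 0 or hi > len(track):
            continue  # window falls outside this region
        for k, value in enumerate(track[lo:hi]):
            sums[k] += value
        used += 1
    if used == 0:
        return sums  # no usable region: all-zero profile
    return [s / used for s in sums]
```

Running the same tracks once with summit anchors and once with motif anchors gives the two centerings compared in the figure; a sharper peak under motif centering would indicate a fixed position of the signal relative to the motif.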

Predicting causal mutations of evolutionary changes in MEL enhancers. (A,B) Example region upstream of APPL2 that is accessible (A) and active (B) in the MEL dog line Dog-OralMel-18249 but not in human MEL lines. (C) DeepMEL prediction score of each of the 24 topics for the dog and human APPL2 enhancers. (D) Effect on the topic 4 DeepMEL score of the dog sequence when each single point mutation detected between the dog and human APPL2 enhancers is simulated in silico. (E) DeepExplainer plots of the middle 120 bp of the dog and human APPL2 enhancers. (Middle) The effect of each possible point mutation in the dog sequence on the MEL DeepMEL score was calculated in silico and is represented by dots colored by the nucleotide to which the original dog nucleotide was mutated. Actual point mutations between the dog and human sequences are highlighted by color-coded vertical dashed lines. Four mutations that decrease the motif scores of the SOX10, MITF, and TFAP2A motifs are highlighted by a gray box and encircled. (F) Bar plot showing the mean effect on the log2 delta ATAC-seq signal of a non-human region compared with its human homolog, depending on the number of SOX10 motif hits lost or gained. Only regions with no change in the number of significant TFAP2A, MITF, and RUNX motif hits were used. The y-axis is normalized to the category with no change in the number of significant SOX10 motif hits. The number of regions in each category is indicated (#). (G) Luciferase assay on six human or dog enhancers. Significant motif hits per enhancer are shown with colored crosses. Luciferase activity in MM001 is shown relative to Renilla signal and is log10 transformed. P-values were determined using Student's t-test, and the error bars represent the standard deviation over three biological replicates.
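
Panel G normalizes firefly luciferase to Renilla, log10 transforms the ratio, and compares constructs with Student's t-test. The sketch below shows those steps on hypothetical replicate readings; it computes only the equal-variance t statistic, and `scipy.stats.ttest_ind` returns the corresponding p-value. Whether the log transform was applied per replicate or to the mean is not stated in the caption; per replicate is assumed here.

```python
"""Sketch of luciferase normalization and a two-sample Student's t statistic.

Assumptions (hypothetical): each biological replicate has one firefly and one
Renilla reading; firefly is divided by Renilla and log10 transformed per
replicate; the t statistic uses pooled variance (equal-variance Student's t).
"""
import math
import statistics


def log10_relative_activity(firefly: list[float], renilla: list[float]) -> list[float]:
    """Per-replicate log10(firefly / Renilla)."""
    return [math.log10(f / r) for f, r in zip(firefly, renilla)]


def student_t(a: list[float], b: list[float]) -> float:
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb)
    )
```

The error bars in the panel correspond to `statistics.stdev` of the three per-replicate values of each construct.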

Acknowledgments
This image is the copyrighted work of the attributed author or publisher, and ZFIN has permission only to display this image to its users. Additional permissions should be obtained from the applicable author or publisher of the image.