|
BOM accurately classifies context-specific CREs in different datasets. a ROC curves (left) and precision-recall curves (right) illustrating the performance of binary BOM models in predicting cell line-specific CREs across six human cell lines (Gm12878, H1-hESC, HeLa-S3, HepG2, Huvec, K562) (n = 66863 CREs). The cell line-specific CREs were defined via a 25-state ChromHMM model38. b Performance of binary BOM models trained to distinguish cell-type-specific CREs for 22 human blood and bone marrow cell types91. Models were trained to distinguish cell-type-specific CREs from a background of CREs specific to other cell types. F1 scores were computed for each binary model (rows) and dataset (columns) (n = 5124 CREs). c ROC curves (left) and precision-recall curves (right) showing the prediction of tissue-specific CREs defined for 11 adult zebrafish tissues via bulk ATAC-seq data99. A binary BOM model was trained for every tissue (n = 59553). The mean AUC is shown in each panel. d Correlation between mean developmental (Dev) and housekeeping (Hk) enhancer activity, as measured by MPRA in fruit fly S2 cells, and the predicted activity (left and middle panels) (n = 1258, 1258; Dev and Hk enhancers)30,39. The right panel shows the log2 fold change in Dev versus Hk enhancer activity for the measured MPRA values and the predicted values. Enhancers are colored based on the observed class. e ROC curves (left) and precision-recall curves (right) for the classification of cell-type-specific CREs of four A. thaliana root cell types from ref. 40 in a multiclass BOM model. Mean area under the curve (AUC) values are shown in each panel. f The 20 most predictive motifs in a binary model classifying peaks more accessible in pre-leukemic or blast cells are shown. Each dot is a single CRE. Each y-axis label is a TF motif from GimmeMotifs. Color represents the normalized motif count: (counts − min(counts, na.rm = TRUE))/(max(counts, na.rm = TRUE) − min(counts, na.rm = TRUE)). A positive SHAP score indicates importance for AML, while a negative score indicates importance for pre-leukemic cells. Source Data are provided as a Source Data file.
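
The normalized motif count behind the color scale in panel f is a min-max rescaling to the range 0 to 1. The following is a minimal Python sketch of that formula, not the authors' code: the function name `min_max_normalize` and the example values are hypothetical, NaN stands in for R's `NA`, and returning 0.0 when all counts are equal is an assumption made here to avoid dividing by zero.

```python
import math

def min_max_normalize(counts):
    """Rescale counts to [0, 1], skipping NaN values (analogous to na.rm = TRUE)."""
    observed = [c for c in counts if not math.isnan(c)]
    if not observed:
        # Nothing to scale: every entry is missing.
        return list(counts)
    lo, hi = min(observed), max(observed)
    if hi == lo:
        # Assumption: the formula is undefined here (zero denominator),
        # so map every observed value to 0.0.
        return [c if math.isnan(c) else 0.0 for c in counts]
    # Missing values stay missing; all others are rescaled.
    return [c if math.isnan(c) else (c - lo) / (hi - lo) for c in counts]

# Hypothetical motif counts for four CREs, one of them missing.
example = [0, 2, float("nan"), 4]
normalized = min_max_normalize(example)
```

Because the minimum and maximum are taken over the observed counts only, the smallest count maps to 0 and the largest to 1, while missing entries are left missing.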
|