PUBLICATION

Motif-based models accurately predict cell type-specific distal regulatory elements

Authors
Cornejo-Páramo, P., Zhang, X., Louis, L., Li, Z., Yang, Y., Wong, E.S.
ID
ZDB-PUB-251125-8
Date
2025
Source
Nature communications 16: 10370 (Journal)
Registered Authors
Wong, Emily
Keywords
none
MeSH Terms
  • Animals
  • Arabidopsis/genetics
  • Computational Biology*/methods
  • Enhancer Elements, Genetic*/genetics
  • Gene Expression Regulation
  • Humans
  • Mice
  • Nucleotide Motifs*/genetics
  • Regulatory Sequences, Nucleic Acid*
  • Transcription Factors/genetics
  • Transcription Factors/metabolism
  • Zebrafish/genetics
PubMed
41285795 Full text @ Nat. Commun.
Abstract
Deciphering how DNA sequence specifies cell-type-specific regulatory activity is a central challenge in gene regulation. We present Bag-of-Motifs (BOM), a computational framework that represents distal cis-regulatory elements as unordered counts of transcription factor (TF) motifs. This minimalist representation, combined with gradient-boosted trees, enables the accurate prediction of cell-type-specific enhancers across mouse, human, zebrafish, and Arabidopsis datasets. Despite its simplicity, BOM outperforms more complex deep-learning models while using fewer parameters. We validate BOM's predictions experimentally by constructing synthetic enhancers from the most predictive motifs, demonstrating that these motif sets drive cell-type-specific expression. By providing direct interpretability and broad applicability, BOM reveals a highly predictive sequence code at distal regulatory regions and offers a scalable framework for dissecting cis-regulatory grammar across diverse species and conditions.
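The bag-of-motifs representation described in the abstract (each regulatory element reduced to unordered motif counts) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses exact matching of hypothetical IUPAC consensus strings (GATA_like, Ebox_like, SOX_like) as a stand-in for position weight matrix scanning, and it counts each site once whichever strand it lies on. The names MOTIFS, bag_of_motifs and featurize are illustrative only.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical motif set for illustration only (not BOM's motif database):
# IUPAC consensus strings standing in for full position weight matrices.
MOTIFS: Dict[str, str] = {
    "GATA_like": "WGATAR",
    "Ebox_like": "CANNTG",
    "SOX_like": "ACAAWG",
}

# Regex character class for each IUPAC nucleotide code.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Complement of each IUPAC code (S, W and N are self-complementary).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def reverse_complement(consensus: str) -> str:
    """Reverse complement of an IUPAC consensus string."""
    return consensus.translate(IUPAC_COMPLEMENT)[::-1]


def site_starts(seq: str, consensus: str) -> Set[int]:
    """Start positions of (possibly overlapping) matches to a consensus."""
    body = "".join(IUPAC_CLASS[c] for c in consensus)
    # A zero-width lookahead lets re.finditer report overlapping matches.
    pattern = re.compile("(?=" + body + ")")
    return {m.start() for m in pattern.finditer(seq)}


def bag_of_motifs(seq: str, motifs: Dict[str, str]) -> Dict[str, int]:
    """Unordered motif counts for one sequence, scanning both strands."""
    seq = seq.upper()  # treat soft-masked (lowercase) bases like the rest
    counts = {}
    for name, consensus in motifs.items():
        # Matching the motif's reverse complement on the forward strand finds
        # minus-strand sites; the set union counts palindromic sites once.
        starts = site_starts(seq, consensus) | site_starts(
            seq, reverse_complement(consensus)
        )
        counts[name] = len(starts)
    return counts


def featurize(
    seqs: List[str], motifs: Dict[str, str]
) -> Tuple[List[str], List[List[int]]]:
    """Feature matrix: one row per sequence, one column per motif."""
    names = list(motifs)
    rows = []
    for seq in seqs:
        counts = bag_of_motifs(seq, motifs)
        rows.append([counts[name] for name in names])
    return names, rows


names, X = featurize(["TTATCTCAGCTGAAAAGATAAG", "CAGCTG"], MOTIFS)
print(names)  # ['GATA_like', 'Ebox_like', 'SOX_like']
print(X)      # [[2, 1, 0], [0, 1, 0]]
```

Each row keeps only how often each motif occurs and discards motif order, spacing and orientation, which is the simplification the abstract describes. In a setup like the one the abstract outlines, such rows would be paired with cell-type labels and passed to a gradient-boosted tree classifier (for example scikit-learn's GradientBoostingClassifier or XGBoost's XGBClassifier); that training step, and the real motif collection and scanning thresholds, are not shown here.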