PUBLICATION
Motif-based models accurately predict cell type-specific distal regulatory elements
- Authors
- Cornejo-Páramo, P., Zhang, X., Louis, L., Li, Z., Yang, Y., Wong, E.S.
- ID
- ZDB-PUB-251125-8
- Date
- 2025
- Source
- Nature communications 16: 10370 (Journal)
- Registered Authors
- Wong, Emily
- Keywords
- none
- MeSH Terms
- Animals
- Arabidopsis/genetics
- Computational Biology*/methods
- Enhancer Elements, Genetic*/genetics
- Gene Expression Regulation
- Humans
- Mice
- Nucleotide Motifs*/genetics
- Regulatory Sequences, Nucleic Acid*
- Transcription Factors/genetics
- Transcription Factors/metabolism
- Zebrafish/genetics
- PubMed
- 41285795 Full text @ Nat. Commun.
Citation
Cornejo-Páramo, P., Zhang, X., Louis, L., Li, Z., Yang, Y., Wong, E.S. (2025) Motif-based models accurately predict cell type-specific distal regulatory elements. Nature communications. 16:10370.
Abstract
Deciphering how DNA sequence specifies cell-type-specific regulatory activity is a central challenge in gene regulation. We present Bag-of-Motifs (BOM), a computational framework that represents distal cis-regulatory elements as unordered counts of transcription factor (TF) motifs. This minimalist representation, combined with gradient-boosted trees, enables the accurate prediction of cell-type-specific enhancers across mouse, human, zebrafish, and Arabidopsis datasets. Despite its simplicity, BOM outperforms more complex deep-learning models while using fewer parameters. We validate BOM's predictions experimentally by constructing synthetic enhancers from the most predictive motifs, demonstrating that these motif sets drive cell-type-specific expression. By providing direct interpretability and broad applicability, BOM reveals a highly predictive sequence code at distal regulatory regions and offers a scalable framework for dissecting cis-regulatory grammar across diverse species and conditions.
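The two ideas in the abstract, an unordered motif-count representation and a gradient-boosted tree classifier on top of it, can be illustrated with a minimal pure-Python sketch. This is not the authors' BOM implementation. It assumes several simplifications: the motif set (`MOTIFS`) is hypothetical and uses exact consensus strings instead of scanning position weight matrices, the booster uses depth-1 trees (stumps) rather than a full gradient-boosted tree library, and all names (`bag_of_motifs`, `train_boosted_stumps`, `predict_proba`) and the toy sequences are invented for illustration. Summed split gain per motif is used as a rough stand-in for "most predictive motifs".

```python
import math
from collections import defaultdict

# HYPOTHETICAL motif set: exact consensus strings stand in for TF motifs.
# A real pipeline would scan position weight matrices from a motif database.
MOTIFS = {"GATA": "GATA", "EBOX": "CACGTG", "ETS": "GGAA"}

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def count_site(seq, site):
    """Count overlapping exact matches of `site` in `seq`."""
    n, i = 0, seq.find(site)
    while i != -1:
        n += 1
        i = seq.find(site, i + 1)
    return n


def bag_of_motifs(seq, motifs=MOTIFS):
    """Unordered per-motif counts on both strands; position and order are discarded."""
    seq = seq.upper()
    feats = {}
    for name, site in motifs.items():
        c = count_site(seq, site)
        rc = revcomp(site)
        if rc != site:  # a palindromic site would otherwise be counted twice
            c += count_site(seq, rc)
        feats[name] = c
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, r, h, names):
    """Depth-1 regression tree on residuals r; Newton-step leaf values for log-loss."""
    mean_r = sum(r) / len(r)
    base_sse = sum((ri - mean_r) ** 2 for ri in r)
    best = None
    for f in names:
        for t in sorted({x[f] for x in X}):
            left = [i for i, x in enumerate(X) if x[f] <= t]
            right = [i for i, x in enumerate(X) if x[f] > t]
            if not left or not right:
                continue
            sse = 0.0
            for idx in (left, right):
                m = sum(r[i] for i in idx) / len(idx)
                sse += sum((r[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, t, left, right)
    if best is None:
        return None
    sse, f, t, left, right = best
    lv, rv = (sum(r[i] for i in idx) / max(sum(h[i] for i in idx), 1e-12)
              for idx in (left, right))
    return f, t, lv, rv, base_sse - sse


def train_boosted_stumps(X, y, n_rounds=20, lr=0.3):
    """Gradient boosting for binary log-loss with stumps; returns (base, stumps, gain)."""
    names = sorted(X[0])
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    base = math.log(p0 / (1 - p0))  # start from the log-odds of the class prior
    F = [base] * len(X)
    stumps, gain = [], defaultdict(float)
    for _ in range(n_rounds):
        p = [sigmoid(z) for z in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log-loss
        h = [pi * (1 - pi) for pi in p]        # second derivative of log-loss
        fit = fit_stump(X, r, h, names)
        if fit is None:
            break
        f, t, lv, rv, g = fit
        gain[f] += g  # accumulate split gain per motif
        stumps.append((f, t, lr * lv, lr * rv))
        F = [z + (lr * lv if x[f] <= t else lr * rv) for z, x in zip(F, X)]
    return base, stumps, dict(gain)


def predict_proba(model, x):
    base, stumps, _ = model
    return sigmoid(base + sum(lv if x[f] <= t else rv for f, t, lv, rv in stumps))


if __name__ == "__main__":
    # Invented toy data: "positive" elements carry GATA sites, "negative" ones do not.
    pos = ["TTGATAAGGATAC", "AGATAGCTTATCA", "CGATATTTTGATAG"]
    neg = ["ACACGTGTT", "CCCGGGAAT", "CCCCTTTTT"]
    X = [bag_of_motifs(s) for s in pos + neg]
    model = train_boosted_stumps(X, [1] * len(pos) + [0] * len(neg))
    print(predict_proba(model, bag_of_motifs("AAGATAACC")))
    print(max(model[2], key=model[2].get))
```

Because each sequence collapses to a short vector of counts, the trees only see how many sites of each motif are present, never where they sit; that is the trade-off the bag-of-motifs representation makes for simplicity and direct interpretability.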