PUBLICATION
OpenSpliceAI provides an efficient modular implementation of SpliceAI enabling easy retraining across nonhuman species
- Authors
- Chao, K.H., Mao, A., Liu, A., Salzberg, S.L., Pertea, M.
- ID
- ZDB-PUB-251031-3
- Date
- 2025
- Source
- eLife 14 (Other)
- Keywords
- A. thaliana, PyTorch, Splice site prediction, SpliceAI, Transfer learning, Arabidopsis thaliana, computational biology, deep learning, honeybee, human, mouse, splice junctions, systems biology, zebrafish
- MeSH Terms
- Animals
- Computational Biology*/methods
- Deep Learning*
- Humans
- RNA Splicing*
- Sequence Analysis, DNA*/methods
- Software*
- PubMed
- 41165728
Citation
Chao, K.H., Mao, A., Liu, A., Salzberg, S.L., Pertea, M. (2025) OpenSpliceAI provides an efficient modular implementation of SpliceAI enabling easy retraining across nonhuman species. eLife. 14.
Abstract
The SpliceAI deep learning system is currently one of the most accurate methods for identifying splicing signals directly from DNA sequences. However, its utility is limited by its reliance on older software frameworks and human-centric training data. Here, we introduce OpenSpliceAI, a trainable, open-source version of SpliceAI implemented in PyTorch to address these challenges. OpenSpliceAI supports both training from scratch and transfer learning, enabling seamless retraining on species-specific datasets and mitigating human-centric biases. Our experiments show that it achieves faster processing speeds and lower memory usage than the original SpliceAI code, allowing large-scale analyses of extensive genomic regions on a single GPU. Additionally, OpenSpliceAI's flexible architecture makes for easier integration with established machine learning ecosystems, simplifying the development of custom splicing models for different species and applications. We demonstrate that OpenSpliceAI's output is highly concordant with SpliceAI. In silico mutagenesis analyses confirm that both models rely on similar sequence features, and calibration experiments demonstrate similar score probability estimates.