PUBLICATION
XGBoost Framework with Feature Selection for the Prediction of RNA N5-methylcytosine sites
- Authors
- Abbas, Z., Rehman, M.U., Tayara, H., Zou, Q., Chong, K.T.
- ID
- ZDB-PUB-230605-34
- Date
- 2023
- Source
- Molecular therapy : the journal of the American Society of Gene Therapy 31(8): 2543-2551 (Journal)
- Registered Authors
- Keywords
- none
- MeSH Terms
- Animals
- Base Sequence
- Drosophila melanogaster*/genetics
- Mice
- RNA*/genetics
- PubMed
- 37271991
Citation
Abbas, Z., Rehman, M.U., Tayara, H., Zou, Q., Chong, K.T. (2023) XGBoost Framework with Feature Selection for the Prediction of RNA N5-methylcytosine sites. Molecular therapy : the journal of the American Society of Gene Therapy. 31(8):2543-2551.
Abstract
5-methylcytosine (m5C) is a critical post-transcriptional modification that is widely present in various kinds of RNA and is crucial to fundamental biological processes. By correctly identifying m5C methylation sites on RNA, clinicians can more clearly understand the precise function of these sites in different biological processes. Owing to their effectiveness and affordability, computational methods for identifying methylation sites in various species have received greater attention over the last few years. To precisely identify RNA m5C sites in five species (Homo sapiens, Arabidopsis thaliana, Mus musculus, Drosophila melanogaster, and Danio rerio), we propose a more effective and accurate model named m5C-pred. To create m5C-pred, we combined five distinct feature encoding techniques to extract features from the RNA sequence, used SHAP (SHapley Additive exPlanations) to select the best features among them, and then used XGBoost as the classifier. We applied the Optuna optimization framework to quickly and efficiently determine the best hyperparameters. Finally, we evaluated the proposed model on independent test datasets and compared the results with those of previous methods. Our approach, m5C-pred, is anticipated to be useful for accurately identifying m5C sites, outperforming the currently available state-of-the-art techniques.
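The feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the five encodings, so the two used here (one-hot and dinucleotide frequency) are assumptions chosen only because they are common for RNA sequences; the function names and the example sequence are hypothetical. The SHAP selection, XGBoost classification, and Optuna tuning steps are not shown.

```python
from itertools import product

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Binary encoding: four indicator values per sequence position."""
    return [1.0 if base == n else 0.0 for base in seq for n in NUCLEOTIDES]


def kmer_frequencies(seq, k=2):
    """Normalized counts of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases
            counts[kmer] += 1
    total = max(windows, 1)  # avoid division by zero for very short input
    return [counts[m] / total for m in kmers]


def encode(seq):
    """Concatenate the (assumed) encodings into one feature vector."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    return one_hot(seq) + kmer_frequencies(seq, k=2)


# Hypothetical 9-nt window; real inputs would be centered on a candidate cytosine.
features = encode("GACUCAGUC")
```

Concatenating several encodings in this way produces a wide feature vector, which is the situation in which a selection step such as the SHAP-based ranking mentioned in the abstract becomes useful before training the classifier.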