PUBLICATION

XGBoost Framework with Feature Selection for the Prediction of RNA N5-methylcytosine sites

Authors
Abbas, Z., Rehman, M.U., Tayara, H., Zou, Q., Chong, K.T.
ID
ZDB-PUB-230605-34
Date
2023
Source
Molecular therapy : the journal of the American Society of Gene Therapy   31(8): 2543-2551 (Journal)
Registered Authors
Keywords
none
MeSH Terms
  • Animals
  • Base Sequence
  • Drosophila melanogaster*/genetics
  • Mice
  • RNA*/genetics
PubMed
37271991
Abstract
5-methylcytosine (m5C) is a critical post-transcriptional modification that is widely present in various kinds of RNA and is crucial to fundamental biological processes. By correctly identifying m5C methylation sites on RNA, clinicians can more clearly understand the precise function of these sites in different biological processes. Owing to their effectiveness and affordability, computational methods for identifying methylation sites in various species have received growing attention over the last few years. To precisely identify RNA m5C sites in five species (Homo sapiens, Arabidopsis thaliana, Mus musculus, Drosophila melanogaster, and Danio rerio), we propose a more effective and accurate model named m5C-pred. To build m5C-pred, we combined five distinct feature encoding techniques to extract features from the RNA sequence, used SHAP (SHapley Additive exPlanations) to select the best features among them, and then applied XGBoost as the classifier. We applied the optimization method Optuna to determine the best hyperparameters quickly and efficiently. Finally, we evaluated the proposed model on independent test datasets and compared the results with those of previous methods. Our approach, m5C-pred, is anticipated to be useful for accurately identifying m5C sites, outperforming currently available state-of-the-art techniques.
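The feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the five encoding techniques, so the two shown here (one-hot encoding and k-mer frequency) are assumptions chosen only because they are common for RNA sequence windows. The function names `one_hot`, `kmer_freq`, and `encode` are hypothetical and are not taken from m5C-pred. In a pipeline like the one described, the concatenated vector would then be passed on to feature selection and a classifier.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides; T is mapped to U below


def _normalize(seq):
    """Upper-case the sequence and treat DNA-style T as U."""
    return seq.upper().replace("T", "U")


def one_hot(seq):
    """Flattened one-hot encoding: four binary values per position.

    Characters outside ACGU (for example N) become all zeros.
    """
    vec = []
    for base in _normalize(seq):
        vec.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return vec


def kmer_freq(seq, k=2):
    """Relative frequency of every possible k-mer over ACGU."""
    seq = _normalize(seq)
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    n_windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[km] / n_windows for km in kmers]


def encode(seq, k=2):
    """Concatenate the two illustrative encodings into one feature vector.

    Assumption: the encodings in the source are combined by concatenation.
    """
    return one_hot(seq) + kmer_freq(seq, k)


if __name__ == "__main__":
    features = encode("ACGUC")
    print(len(features))
```

For a window of length L and k = 2, this sketch yields 4L + 16 features per sequence. Real site-prediction datasets normally use fixed-length windows centred on the candidate cytosine, so every sample produces a vector of the same length.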