iterb-PPse: Identification of transcriptional terminators in bacterial by incorporating nucleotide properties into PseKNC

PLoS One. 2020 May 15;15(5):e0228479. doi: 10.1371/journal.pone.0228479. eCollection 2020.

Abstract

A terminator is a DNA sequence that gives RNA polymerase the signal to terminate transcription. Identifying terminators correctly can improve genome annotation and, more importantly, has considerable application value in disease diagnosis and therapy. However, accurate prediction methods are lacking and urgently needed. We therefore propose "iterb-PPse", a terminator prediction method that incorporates 47 nucleotide properties into PseKNC-Ⅰ and PseKNC-Ⅱ and uses Extreme Gradient Boosting to predict terminators in Escherichia coli and Bacillus subtilis. In combination with earlier methods, we employed three new feature extraction methods (K-pwm, Base-content, and Nucleotidepro) to formulate the raw samples. A two-step method was applied to select features. When identifying terminators from the optimized features, we compared five single models and 16 ensemble models. The accuracy of our method on the benchmark dataset reached 99.88% in 100 repetitions of five-fold cross-validation, higher than that of the existing state-of-the-art predictor iTerm-PseKNC. Its prediction accuracy on two independent datasets reached 94.24% and 99.45%, respectively. For the convenience of users, we developed a software tool based on "iterb-PPse" with the same name. The software and source code of "iterb-PPse" are openly available at https://github.com/Sarahyouzi/iterb-PPse.
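As a rough illustration of the kind of sequence-derived features the abstract refers to, the sketch below computes single-base fractions and k-mer frequencies for a DNA sequence. It is not the authors' implementation: the abstract does not define Base-content, so treating it as the fraction of each of A, C, G, and T is an assumption, and the function names (`base_content`, `kmer_frequencies`, `featurize`) are hypothetical. The K-pwm, Nucleotidepro, and PseKNC features, the two-step feature selection, and the Extreme Gradient Boosting classifier are not reproduced here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def base_content(seq):
    """Fraction of A, C, G, T in seq (assumed meaning of 'Base-content')."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    # Characters outside ACGT (e.g. N) are counted in the length but not
    # reported, so the fractions may sum to less than 1 for such input.
    return [counts[b] / len(seq) for b in ALPHABET]


def kmer_frequencies(seq, k=2):
    """Frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    total = len(seq) - k + 1  # number of overlapping windows
    if total <= 0:
        raise ValueError("sequence shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts[m] / total for m in kmers]


def featurize(seq, k=2):
    """Concatenate both feature groups into one fixed-length vector."""
    return base_content(seq) + kmer_frequencies(seq, k)
```

With `k=2`, `featurize` returns a vector of length 4 + 16 = 20 for any sequence of at least two bases. Vectors of this kind could then be passed to a classifier of the reader's choice.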

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacillus subtilis
  • DNA, Bacterial / chemistry
  • DNA, Bacterial / genetics
  • Escherichia coli
  • RNA, Bacterial / chemistry
  • RNA, Bacterial / genetics
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Rho Factor / metabolism
  • Sequence Analysis, DNA / methods*
  • Software*
  • Terminator Regions, Genetic*
  • Transcription Termination, Genetic

Substances

  • DNA, Bacterial
  • RNA, Bacterial
  • RNA, Messenger
  • Rho Factor

Grants and funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61762026, 61462018) to YXF, the Guangxi Natural Science Foundation (Grant Nos. 2017GXNSFAA198278, 2016GXNSFAA380043) to YXF, the Innovation Project of GUET Graduate Education (Grant Nos. 2018YJCX47, 2019YCXS056) to YXF, the Guangxi Colleges and Universities Key Laboratory of Intelligent Processing of Computer Images and Graphics (Grant No. GIIP201502) to YXF, and the Guangxi Key Laboratory of Trusted Software (Grant No. kx201403) to YXF. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. There was no additional external funding received for this study.