PUBLICATION

Repeatability in protein sequences

Authors
Kamel, M., Mier, P., Tari, A., Andrade-Navarro, M.A.
ID
ZDB-PUB-190814-15
Date
2019
Source
Journal of structural biology 208(2): 86-91 (Journal)
Keywords
amino acid short tandem repeats, computational detection of sequence repeats, homorepeats, low complexity regions, repeatability, web tool
MeSH Terms
  • Algorithms
  • Amino Acid Sequence
  • Databases, Protein
  • Evolution, Molecular
  • Humans
  • Proteins/chemistry*
  • Repetitive Sequences, Amino Acid
  • Sequence Alignment
  • Sequence Analysis, Protein/methods*
PubMed
31408700
Abstract
Low complexity regions (LCRs) in protein sequences have special properties that are very different from those of globular proteins. The rules that define secondary structure elements do not apply when the distribution of amino acids becomes biased. While there is a tendency towards structural disorder in LCRs, various examples, particularly homorepeats of single amino acids, suggest that very short repeats could adopt structures that are very difficult to predict. These structures are possibly variable and dependent on the context of intra- or inter-molecular interactions. In general, short repeats in LCRs can induce structure. This could explain the observation that very short (non-perfect) repeats are widespread and that many of them define regions with a function in protein interactions. For these reasons, we have developed an algorithm to quickly analyze local repeatability along protein sequences, that is, how close a protein fragment is to a perfect repeat. Using this algorithm, we found that the proteins of the yeast Saccharomyces cerevisiae are depleted in short repeats (approximate or not) of odd length, while human proteins are not; that the fish Danio rerio has many proteins with repeats of length two; and that the plant Arabidopsis thaliana has an unusually large number of repeats of length seven. Our method (REpeatability Scanner, RES, accessible at http://cbdm-01.zdv.uni-mainz.de/~munoz/res/) makes it possible to find regions with approximate short repeats in protein sequences, and helps to characterize the variable use of LCRs and compositional bias in different organisms.
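The abstract defines repeatability as how close a protein fragment is to a perfect repeat, but it does not give the scoring function used by RES. As a minimal sketch, assuming a simple score (the fraction of positions i in a window where residue i equals residue i + p for a candidate period p), a local repeatability scanner could look like the following. The names `repeatability` and `scan`, and the default window, period and threshold values, are hypothetical illustrations and are not taken from RES.

```python
"""Hypothetical sketch of a local repeatability scan.

ASSUMPTION: this simple period-matching score is an illustration only;
it is not the published RES algorithm.
"""


def repeatability(fragment: str, period: int) -> float:
    """Return the fraction of positions i for which fragment[i] equals
    fragment[i + period]. A perfect repeat of that period scores 1.0;
    each substitution lowers the score."""
    if period < 1:
        raise ValueError("period must be >= 1")
    n = len(fragment) - period  # number of comparable residue pairs
    if n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if fragment[i] == fragment[i + period])
    return matches / n


def scan(sequence: str, window: int = 14, max_period: int = 7,
         threshold: float = 0.8) -> list[tuple[int, int, float]]:
    """Slide a fixed-size window along the sequence and, for each window,
    keep the best-scoring period if its score reaches the threshold.

    Returns (window_start, best_period, score) tuples. Ties are resolved in
    favour of the shortest period, because max() keeps the first maximum.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        frag = sequence[start:start + window]
        best_period, best_score = max(
            ((p, repeatability(frag, p)) for p in range(1, max_period + 1)),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            hits.append((start, best_period, best_score))
    return hits


if __name__ == "__main__":
    # A toy sequence with an approximate dipeptide repeat (one substitution).
    toy = "MSDEK" + "PEPEPEPAPEPEPE" + "LKRW"
    for start, period, score in scan(toy):
        print(start, period, round(score, 2))
```

With this score, approximate repeats fall between 0 and 1, and a homorepeat such as QQQQ is reported as period 1 rather than 2 or 3. Counting the reported periods across a proteome would be one way to compare odd- and even-length repeats between organisms, although this sketch does not claim to reproduce how the results above were obtained.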