A workflow for studying enigmatic non-autonomous transposable elements across bacteria
RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes
Recommendation: posted 02 February 2023, validated 07 February 2023
Douglas, G. (2023) A workflow for studying enigmatic non-autonomous transposable elements across bacteria. Peer Community in Genomics, 100166. https://doi.org/10.24072/pci.genomics.100166
Repetitive extragenic palindromic sequences (REPs) are common repetitive elements in bacterial genomes (Gilson et al., 1984; Stern et al., 1984). In 2011, Bertels and Rainey found that REPs are overrepresented in pairs of inverted repeats, which likely form hairpin structures, that they referred to as “REP doublets forming hairpins” (REPINs). Based on bioinformatics analyses, they argued that REPINs are likely selfish elements that evolved from REPs flanking particular transposases (Bertels and Rainey, 2011). These transposases, so-called REP-associated tyrosine transposases (RAYTs), were known to be highly associated with the REP content in a genome and to have characteristic upstream and downstream flanking REPs (Nunvar et al., 2010). The flanking REPs likely enable RAYT transposition, and their horizontal replication is physically linked to this process. In contrast, Bertels and Rainey hypothesized that REPINs are selfish elements that are highly replicated due to the similarity in arrangement to these RAYT-flanking REPs, but independent of RAYT transposition and generally with no impact on bacterial fitness (Bertels and Rainey, 2011).
This last point was especially contentious, as REPINs are highly conserved within species (Bertels and Rainey, 2023), which is unusual for non-beneficial bacterial DNA (Mira et al., 2001). Bertels and Rainey have since refined their argument to be that REPINs must provide benefits to host cells, but that there are nonetheless signatures of intragenomic conflict in genomes associated with these elements (Bertels and Rainey, 2023). These signatures reflect the divergent levels of selection driving REPIN distribution: selection at the level of each DNA element and selection on each individual bacterium. I found this observation particularly interesting as my colleague and I recently argued that these divergent levels of selection, and the interaction between them, are key to understanding bacterial pangenome diversity (Douglas and Shapiro, 2021). REPINs could be an excellent system for investigating these levels of selection across bacteria more generally.
The problem is that REPINs have not been widely characterized in bacterial genomes, partially because no bioinformatic workflow has been available for this purpose. To address this problem, Fortmann-Grote et al. (2023) developed RAREFAN, which is a web server for identifying RAYTs and associated REPINs in a set of input genomes. The authors showcase their tool by applying it to 49 Stenotrophomonas maltophilia genomes and providing examples of how to identify and assess RAYT-REPIN hits. The workflow requires several manual steps, but nonetheless represents a straightforward and standardized approach. Overall, this workflow should enable RAYTs and REPINs to be identified across diverse bacterial species, which will facilitate further investigation into the mechanisms driving their maintenance and spread.
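To make the REPIN definition above concrete (two REPs in inverted orientation, close enough to form a hairpin), here is a minimal Python sketch. It is not the RAREFAN algorithm. The function names, the `max_gap` parameter, and the toy sequences are hypothetical choices for illustration, and it uses exact string matching only, whereas real REP sequences vary between copies.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_all(genome: str, motif: str) -> list:
    """Return the 0-based start positions of every exact match of motif."""
    positions = []
    start = genome.find(motif)
    while start != -1:
        positions.append(start)
        start = genome.find(motif, start + 1)
    return positions


def find_inverted_pairs(genome: str, rep: str, max_gap: int = 130) -> list:
    """Pair each REP copy with an inverted copy at most max_gap bases away.

    max_gap is an illustrative value, not a documented RAREFAN default.
    Returns sorted (left_start, right_start) tuples.
    """
    genome = genome.upper()
    rep = rep.upper()
    forward = find_all(genome, rep)
    inverted = find_all(genome, reverse_complement(rep))
    pairs = set()
    for f in forward:
        for r in inverted:
            left, right = sorted((f, r))
            # Number of bases between the end of the left copy
            # and the start of the right copy.
            gap = right - (left + len(rep))
            if 0 <= gap <= max_gap:
                pairs.add((left, right))
    return sorted(pairs)


# Toy example: one REP copy, a 4-base spacer, then its reverse complement.
toy_genome = "TT" + "GGCAT" + "AAAA" + "ATGCC" + "TT"
print(find_inverted_pairs(toy_genome, "GGCAT", max_gap=10))
```

A tool meant for real genomes would also need to tolerate mismatches between REP copies and to work at genome scale, neither of which this sketch attempts.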
Bertels F, Rainey PB (2023) Ancient Darwinian replicators nested within eubacterial genomes. BioEssays, 45, 2200085. https://doi.org/10.1002/bies.202200085
Bertels F, Rainey PB (2011) Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria. PLOS Genetics, 7, e1002132. https://doi.org/10.1371/journal.pgen.1002132
Douglas GM, Shapiro BJ (2021) Genic Selection Within Prokaryotic Pangenomes. Genome Biology and Evolution, 13, evab234. https://doi.org/10.1093/gbe/evab234
Fortmann-Grote C, Irmer J von, Bertels F (2023) RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes. bioRxiv, 2022.05.22.493013, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.05.22.493013
Gilson E, Clément J-M, Brutlag D, Hofnung M (1984) A family of dispersed repetitive extragenic palindromic DNA sequences in E. coli. The EMBO Journal, 3, 1417–1421. https://doi.org/10.1002/j.1460-2075.1984.tb01986.x
Mira A, Ochman H, Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends in Genetics, 17, 589–596. https://doi.org/10.1016/S0168-9525(01)02447-7
Nunvar J, Huckova T, Licha I (2010) Identification and characterization of repetitive extragenic palindromes (REP)-associated tyrosine transposases: implications for REP evolution and dynamics in bacterial genomes. BMC Genomics, 11, 44. https://doi.org/10.1186/1471-2164-11-44
Stern MJ, Ames GF-L, Smith NH, Clare Robinson E, Higgins CF (1984) Repetitive extragenic palindromic sequences: A major component of the bacterial genome. Cell, 37, 1015–1026. https://doi.org/10.1016/0092-8674(84)90436-7
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.
Our work was funded by the Max Planck Society.
Evaluation round #2
DOI or URL of the preprint: https://www.biorxiv.org/content/10.1101/2022.05.22.493013v3
Version of the preprint: 3
Author's Reply, 26 Jan 2023
Decision by Gavin Douglas, posted 29 Dec 2022, validated 03 Jan 2023
Hi Dr. Bertels and colleagues,
Both reviewers assessed your changes and agree that the manuscript is greatly improved. They have raised a few remaining points that warrant some further minor changes and clarifications.
In addition, please see my own minor comments below, which are primarily typo and phrasing fixes.
All the best,
The license information for the source of Figure 1 (Bertels, Rainey, 2022) should be indicated somewhere in the text. There are usually requirements for how to redistribute/modify items from another work. E.g., if this is under a creative commons license, then you should state what license version it corresponds to and give a link to that license.
The title should be changed to “RAREFAN: A webservice”… rather than “RAREFAN: a webservice…”
L67-72 I think many readers will be curious to know what the other RAYT families are associated with, if not REPINs. Since they are defined as “REP-associated” (in their name) I think this deserves at least a quick mention in a sentence or two.
Currently at least one parameter value in Figure 2 does not match the figure legend (distance between inverted sequences being 200bp vs 130bp). The authors should make sure that the values reported represent the current default values and that old (or otherwise conflicting) values are not mismatched between the figure and legend, to avoid reader confusion.
Also in Figure 2 – I recommend that the figure legend be simplified, as many of the details are already provided in the methods and are not really pertinent to interpreting the plot and the take-home messages. I will leave that to the authors’ discretion. However, I do strongly recommend that the references in this legend be removed, as these should be mentioned in the appropriate section of the methods instead (references are generally uncommon in figure legends).
I was unable to access results under run ID a2ijpkk6.
The authors should clarify the protocol on linking RAYTs to REPINs. Is it generally expected for at least one REP to be within 200bp of the corresponding RAYT? Since the RAYTs act in trans as proteins, there does not seem to be any reason why this would necessarily be true, so I think a little additional explanation would be helpful.
Should use past tense when discussing specific results. So on L338 for instance, it should be “RAREFAN detected three populations when S. maltophilia Sm54 was selected as the reference strain”.
The authors should use “p-value”, “P-value”, or “p value”, but not “p-Value”, which is the current usage in the text.
L30 – “providing” was actually grammatically correct, and so the revised change to “provide” should be undone.
L48 – “REP sequences is” should be “REP sequences are”
L53 – I suggest “not mobile anymore” be reworded to “immobile” or “no longer mobile”
L58 – “associated to” should be “associated with”
L64 – I suggest “very special” be changed to “unique”
L77 – I think the year estimate should be clarified. Presumably some RAYT/REPIN groups may have been present in a lineage for less than a million years (or at least this is possible!). So I would re-word to say that “they have been evolving in single bacterial lineages for up to millions of, or perhaps even one billion, years.”
L102 – I think “Yet,” should be removed, or perhaps replaced with “Unfortunately,”
L104 – “ins and outs” should be replaced with less colloquial language, such as “details” or “detailed features”
L105 – “the genome” should be “a genome”
L106 – “analyzed next” should be “then analyzed”
L106-107 – “If they are exclusively” should be something clearer like “If these sequences are exclusively”
L107-108 – I would put commas on each side of this sentence fragment: “and present in only one or two loci in the genome”
Figure 2 legend “the” should be re-added in front of “seed sequence”.
Implementation section of methods – python, java, flask, and shiny should all be capitalized
Regarding “Query RAYT” bullet point in implementation methods: above this is described as optional. The authors should clarify the procedure when this protein sequence is not provided, as is currently done for the Tree file option.
L180 – “(n-1)” should be “[n-1]”
L277 – “Especially” should be “This is especially true”
Figure 6 legend – here “group” is capitalized in some but not all cases. In this legend (and in the relevant section of the main text, where this also varies), the authors should consistently write “group” capitalized or lowercase in all instances.
L431-432 – “Genbank” should be “GenBank” and I think it would be clearer to say “creating a RAREFAN Galaxy workflow” rather than “integrating RAREFAN into workflows such as Galaxy”, as Galaxy is a means of making workflows available online for easy use, rather than referring to a specific workflow.
Reviewed by Sophie Abby, 23 Dec 2022
Reviewed by anonymous reviewer 1, 07 Dec 2022
Evaluation round #1
DOI or URL of the preprint: https://www.biorxiv.org/content/10.1101/2022.05.22.493013v2
Author's Reply, 15 Nov 2022
Decision by Gavin Douglas, posted 21 Jul 2022
Two reviewers have now finished their reports and they have highlighted numerous points that should be addressed. The main critique appears to be that further clarification is needed, both in terms of the motivation for annotating these elements in particular and regarding various technical details of your approach. The second reviewer also highlighted several practical issues (as well as discrepancies in the results themselves) that they ran into when trying to run the tool, which I found especially concerning.
I think all of the points that were raised are constructive and should help to improve the manuscript substantially. I look forward to seeing the next version!