Latest recommendations

01 May 2024

Evolution of ion channels in cetaceans: A natural experiment in the tree of life

Positive selection acted upon cetacean ion channels during the aquatic transition

Recommended by Gavin Douglas based on reviews by 2 anonymous reviewers

The transition of cetaceans (whales, dolphins, and porpoises) from terrestrial to aquatic lifestyles is a striking example of natural selection driving major phenotypic changes (Figure 1). For instance, cetaceans have evolved the ability to withstand high pressure and to store oxygen for long periods, among other adaptations (Das et al. 2023). Many phenotypic changes, such as shifts in organ structure, have been well characterized through fossils (Thewissen et al. 2009). Although such phenotypic transitions are now well understood, we have only a partial understanding of the underlying genetic mechanisms. Scanning for signatures of adaptation in genes related to phenotypes of interest is one approach to better understand these mechanisms. This was the focus of the work by Uribe and colleagues (2024), who tested for such signatures across cetacean protein-coding genes.


Cetacean fossils

Figure 1: The skeletons of Ambulocetus (an early whale; top) and Pakicetus (the earliest known cetacean, which lived about 50 million years ago; bottom). Copyright: J. G. M. Thewissen. Displayed here with permission from the copyright holder.


The authors were specifically interested in investigating the evolution of ion channels, as these proteins play fundamental roles in physiological processes. An important aspect of their work was to develop a bioinformatic pipeline to identify orthologous ion channel genes across a set of genomes. After applying their bioinformatic workflow to 18 mammalian species (including nine cetaceans), they conducted tests to find out whether these genes showed signatures of positive selection in the cetacean lineage. For many ion channel genes, elevated ratios of non-synonymous to synonymous substitution rates were detected (for at least a subset of sites, and not necessarily the entire coding region of the genes). The genes concerned were enriched for several functions, including heart and nervous system-related phenotypes.
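To make the quantity behind these tests concrete, here is a minimal Python sketch of the ratio of non-synonymous to synonymous substitution rates (often written ω or dN/dS). It assumes the counts of differences and sites have already been obtained for a pair of aligned coding sequences, and it applies a simple Jukes-Cantor correction. This is a textbook-style approximation chosen for illustration, not the site- or branch-level models used in the preprint, and the function names are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def omega(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return dN/dS from pre-computed counts (hypothetical helper).

    Values above 1 are commonly read as a signal of positive selection,
    values near 1 as neutrality, and values below 1 as purifying selection.
    """
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


if __name__ == "__main__":
    # Illustrative counts only, not data from the preprint.
    print(round(omega(30, 300, 5, 100), 2))
```

Because selection often acts on only a few codons, a gene-wide ratio like this can stay below 1 even when a subset of sites is under positive selection, which is why the elevated ratios above are reported for a subset of sites rather than necessarily for the entire coding region.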

One top gene hit among the putatively selected genes was SCN5A, which encodes a sodium channel expressed in the heart. Interestingly, the authors noted a specific amino acid replacement, which is associated with sensitivity to the toxin tetrodotoxin in other lineages. This substitution appears to have occurred in the common ancestor of toothed whales, and then was reversed in the ancestor of bottlenose dolphins. The authors describe known bottlenose dolphin interactions with toxin-producing pufferfish that could result in high tetrodotoxin exposure, and thus perhaps stronger selection for tetrodotoxin resistance. Although this observation is intriguing, the authors emphasize that it requires experimental confirmation.

The authors also recapitulated the previously described observation (Yim et al. 2014; Huelsmann et al. 2019) that cetaceans have fewer protein-coding genes compared to terrestrial mammals, on average. This signal has previously been hypothesized to partially reflect adaptive gene loss. For example, specific gene loss events likely decreased the risk of developing blood clots while diving (Huelsmann et al. 2019). Uribe and colleagues also considered overall gene turnover rate, which encompasses gene copy number variation across lineages, and found the cetacean gene turnover rate to be three times higher than that of terrestrial mammals. Finally, they found that cetaceans have a higher proportion of ion channel genes (relative to all protein-coding genes in a genome) compared to terrestrial mammals. 

Similar investigations of the relative non-synonymous to synonymous substitution rates across cetacean and terrestrial mammal orthologs have been conducted previously, but these have primarily focused on dolphins as the sole cetacean representative (McGowen et al. 2012; Nery et al. 2013; Sun et al. 2013). These projects have also been conducted across a large proportion of orthologous genes, rather than a subset with a particular function. Performing proteome-wide investigations can be valuable in that they summarize the genome-wide signal, but can suffer from a high multiple testing burden. More generally, investigating a more targeted question, such as the extent of positive selection acting on ion channels in this case, or on genes potentially linked to cetaceans’ increased brain sizes (McGowen et al. 2011) or hypoxia tolerance (Tian et al. 2016), can be easier to interpret, as opposed to summarizing broader signals. However, these smaller-scale studies can also experience a high multiple testing burden, especially as similar tests are conducted across numerous studies, which often is not accounted for (Ioannidis 2005). In addition, integrating signals across the entire genome will ultimately be needed given that many genetic changes undoubtedly underlie cetaceans’ phenotypic diversification. As highlighted by the fact that past genome-wide analyses have produced some differing biological interpretations (McGowen et al. 2012; Nery et al. 2013; Sun et al. 2013), this is not a trivial undertaking. 
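As an illustration of how a multiple testing burden is commonly handled within a single study, the sketch below implements the Benjamini-Hochberg false discovery rate adjustment in plain Python. It is offered only as an example of the general idea. The text above does not say which correction, if any, the cited studies applied, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative p-values, e.g. one per tested gene.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A correction like this addresses only the tests run within one analysis. It does not account for similar tests repeated across separate studies, which is the concern raised above.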

Nonetheless, the work performed in this preprint, and in related research, is valuable for (at least) three reasons. First, although it is a challenging task, a better understanding of the genetic basis of cetacean phenotypes could have benefits for many aspects of cetacean biology, including conservation efforts. In addition, the remarkable phenotypic shifts in cetaceans make the question of what genetic mechanisms underlie these changes intrinsically interesting to a wide audience. Last, since the cetacean fossil record is especially well-documented (Thewissen et al. 2009), cetaceans represent an appealing system to validate and further develop statistical methods for inferring adaptation from genetic data. Uribe and colleagues’ (2024) analyses provide useful insights relevant to each of these points, and have generated intriguing hypotheses for further investigation.


Das K, Sköld H, Lorenz A, Parmentier E. 2023. Who are the marine mammals? In: “Marine Mammals: A Deep Dive into the World of Science”. Brennecke D, Knickmeier K, Pawliczka I, Siebert U, Wahlberg M (editors). Springer, Cham. p. 1–14.

Huelsmann M, Hecker N, Springer MS, Gatesy J, Sharma V, Hiller M. 2019. Genes lost during the transition from land to water in cetaceans highlight genomic changes associated with aquatic adaptations. Science Advances. 5(9):eaaw6671.

Ioannidis JPA. 2005. Why most published research findings are false. PLOS Medicine. 2(8):e124.

McGowen MR, Montgomery SH, Clark C, Gatesy J. 2011. Phylogeny and adaptive evolution of the brain-development gene microcephalin (MCPH1) in cetaceans. BMC Evolutionary Biology. 11(1):98.

McGowen MR, Grossman LI, Wildman DE. 2012. Dolphin genome provides evidence for adaptive evolution of nervous system genes and a molecular rate slowdown. Proceedings of the Royal Society B: Biological Sciences. 279(1743):3643–3651.

Nery MF, González DJ, Opazo JC. 2013. How to make a dolphin: molecular signature of positive selection in cetacean genome. PLOS ONE. 8(6):e65491.

Sun Y-B, Zhou W-P, Liu H-Q, Irwin DM, Shen Y-Y, Zhang Y-P. 2013. Genome-wide scans for candidate genes involved in the aquatic adaptation of dolphins. Genome Biology and Evolution. 5(1):130–139.

Tian R, Wang Z, Niu X, Zhou K, Xu S, Yang G. 2016. Evolutionary genetics of hypoxia tolerance in cetaceans during diving. Genome Biology and Evolution. 8(3):827–839.

Thewissen JGM, Cooper LN, George JC, Bajpai S. 2009. From land to water: the origin of whales, dolphins, and porpoises. Evolution: Education and Outreach. 2(2):272–288.

Uribe C, Nery M, Zavala K, Mardones G, Riadi G, Opazo J. 2024. Evolution of ion channels in cetaceans: A natural experiment in the tree of life. bioRxiv, ver. 8 peer-reviewed and recommended by Peer Community in Genomics.

Yim H-S, Cho YS, Guang X, Kang SG, Jeong J-Y, Cha S-S, Oh H-M, Lee J-H, Yang EC, Kwon KK, et al. 2014. Minke whale genome and aquatic adaptation in cetaceans. Nature Genetics. 46(1):88–92.


Title: Evolution of ion channels in cetaceans: A natural experiment in the tree of life
Authors: Cristóbal Uribe, Mariana F. Nery, Kattina Zavala, Gonzalo A. Mardones, Gonzalo Riadi & Juan C. Opazo
Abstract: Cetaceans could be seen as a natural experiment within the tree of life in which a mammalian lineage changed from terrestrial to aquatic habitats. This shift involved extensive phenotypic modifications, which represent an opportunity to explore...
Thematic fields: Evolutionary genomics
Recommender: Gavin Douglas
Submission date: 2023-07-04 20:53:46

20 Jul 2021

Genetic mapping of sex and self-incompatibility determinants in the androdioecious plant Phillyrea angustifolia

Identification of distinct YX-like loci for sex determination and self-incompatibility in an androdioecious shrub

Recommended by Tatiana Giraud based on reviews by 2 anonymous reviewers

A wide variety of systems have evolved to control mating compatibility in sexual organisms. Their genetic determinism and the factors controlling their evolution represent fascinating questions in evolutionary biology and genomics. The plant Phillyrea angustifolia (Oleaceae family) represents an exciting model organism, as it displays two distinct and rare mating compatibility systems [1]: 1) males and hermaphrodites co-occur in populations of this shrub (a rare system called androdioecy), while the evolution and maintenance of purely hermaphroditic plants or mixtures of females and hermaphrodites (a system called gynodioecy) are easier to explain [2]; 2) a homomorphic diallelic self-incompatibility system acts in hermaphrodites, while such systems are usually multi-allelic, as rare alleles are advantageous, being compatible with all other alleles. Previous analyses of crosses brought some interesting answers to these puzzles, showing that males benefit from the ability to mate with all hermaphrodites regardless of their allele at the self-incompatibility locus, and suggesting that both sex and self-incompatibility are determined by XY-like genetic systems, i.e. each with a dominant allele; homozygotes for a single allele and heterozygotes therefore co-occur in natural populations at both sex and self-incompatibility loci [3].

Here, Carré et al. used genotyping-by-sequencing to build a genome linkage map of P. angustifolia [4]. The elegant and original use of a probabilistic model of segregating alleles (implemented in the SEX-DETector method) made it possible to identify both the sex and self-incompatibility loci [4], although this tool was initially developed for detecting sex-linked genes in species with strictly separated sexes (dioecy) [5]. Carré et al. [4] confirmed that the sex and self-incompatibility loci are located in two distinct linkage groups and correspond to XY-like systems. A comparison with the genome of the closely related olive tree indicated that their self-incompatibility systems were homologous. Such an XY-like system represents a rare genetic determination mechanism for self-incompatibility and has also been recently found to control mating types in oomycetes [6].

This study [4] paves the way for identifying the genes controlling the sex and self-incompatibility phenotypes and for understanding why and how self-incompatibility is only expressed in hermaphrodites and not in males. It will also be fascinating to study more finely the degree and extent of genomic differentiation at these two loci and to assess whether recombination suppression has extended stepwise away from the sex and self-incompatibility loci, as can be expected under some hypotheses, such as the sheltering of deleterious alleles near permanently heterozygous alleles [7]. Furthermore, the co-occurrence in P. angustifolia of sex and mating types can contribute to our understanding of the factors controlling their evolution [8].


[1] Saumitou-Laprade P, Vernet P, Vassiliadis C, Hoareau Y, Magny G de, Dommée B, Lepart J (2010) A Self-Incompatibility System Explains High Male Frequencies in an Androdioecious Plant. Science, 327, 1648–1650.

[2] Pannell JR, Voillemot M (2015) Plant Mating Systems: Female Sterility in the Driver’s Seat. Current Biology, 25, R511–R514.

[3] Billiard S, Husse L, Lepercq P, Godé C, Bourceaux A, Lepart J, Vernet P, Saumitou-Laprade P (2015) Selfish male-determining element favors the transition from hermaphroditism to androdioecy. Evolution, 69, 683–693.

[4] Carre A, Gallina S, Santoni S, Vernet P, Gode C, Castric V, Saumitou-Laprade P (2021) Genetic mapping of sex and self-incompatibility determinants in the androdioecious plant Phillyrea angustifolia. bioRxiv, 2021.04.15.439943, ver. 7 peer-reviewed and recommended by Peer Community in Genomics.

[5] Muyle A, Käfer J, Zemp N, Mousset S, Picard F, Marais GA (2016) SEX-DETector: A Probabilistic Approach to Study Sex Chromosomes in Non-Model Organisms. Genome Biology and Evolution, 8, 2530–2543.

[6] Dussert Y, Legrand L, Mazet ID, Couture C, Piron M-C, Serre R-F, Bouchez O, Mestre P, Toffolatti SL, Giraud T, Delmotte F (2020) Identification of the First Oomycete Mating-type Locus Sequence in the Grapevine Downy Mildew Pathogen, Plasmopara viticola. Current Biology, 30, 3897-3907.e4.

[7] Jay P, Tezenas E, Giraud T (2021) A deleterious mutation-sheltering theory for the evolution of sex chromosomes and supergenes. bioRxiv, 2021.05.17.444504.

[8] Billiard S, López-Villavicencio M, Devier B, Hood ME, Fairhead C, Giraud T (2011) Having sex, yes, but with whom? Inferences from fungi on the evolution of anisogamy and mating types. Biological Reviews, 86, 421–442.

Title: Genetic mapping of sex and self-incompatibility determinants in the androdioecious plant Phillyrea angustifolia
Authors: Amelie Carre, Sophie Gallina, Sylvain Santoni, Philippe Vernet, Cecile Gode, Vincent Castric, Pierre Saumitou-Laprade
Abstract: The diversity of mating and sexual systems in angiosperms is spectacular, but the factors driving their evolution remain poorly understood. In plants of the Oleaceae family, an unusual self-incompatibility (SI) syst...
Thematic fields: Evolutionary genomics, Plants
Recommender: Tatiana Giraud
Submission date: 2021-05-04 10:37:26

15 Sep 2022

EukProt: A database of genome-scale predicted proteins across the diversity of eukaryotes

EukProt enables reproducible Eukaryota-wide protein sequence analyses

Recommended by Gavin Douglas based on reviews by 2 anonymous reviewers

 Comparative genomics is a general approach for understanding how genomes differ, which can be considered from many angles. For instance, this approach can delineate how gene content varies across organisms, which can lead to novel hypotheses regarding what those organisms do. It also enables investigations into the sequence-level divergence of orthologous DNA, which can provide insight into how evolutionary forces differentially shape genome content and structure across lineages. 
Such comparisons are often restricted to protein-coding genes, as these are sensible units for assessing putative function and for identifying homologous matches in divergent genomes. Although information is lost by focusing only on the protein-coding portion of genomes, this simplifies analyses and has led to crucial findings in recent years. Perhaps most dramatically, analyses based on hundreds of orthologous proteins across microbial eukaryotes are fundamentally changing our understanding of the eukaryotic tree of life (Burki et al. 2020).
These and other topics are highlighted in a new preprint from Dr. Daniel Richter and colleagues, which describes EukProt (Richter et al. 2022): a database containing protein sets from 993 eukaryotic species. The authors provide a BLAST portal for matching custom sequences against this database, and the entire database is available for download. They also provide a subset of their overall dataset, ‘The Comparative Set’, which contains only high-quality proteomes and is meant to maximize phylogenetic diversity.
There are two major advantages of EukProt:
1. It will enable researchers to quickly compare proteomes and perform phylogenomic analyses, without needing the skills or the time commitment to aggregate and process these data. The authors make it clear that acquiring the raw protein sets was non-trivial, as they were distributed across a wide variety of online repositories (some of which are no longer accessible!).
2. Analyses based on this database will be more reproducible and easily compared across studies than those based on custom-made databases for individual studies. This is because the EukProt authors followed FAIR principles (Wilkinson et al. 2016) when building their database, which is a set of guidelines for enhancing data reusability. So, for instance, each proteome has a unique identifier in EukProt, and all species are annotated in a unified taxonomic framework, which will aid in standardizing comparisons across studies.
The authors make it clear that there is still work to be done. For example, there is an uneven representation of proteomes across different eukaryotic lineages, which can only be addressed by further characterization of poorly studied lineages. In addition, the authors note that it would ultimately be best for the EukProt database to be integrated into an existing large-scale repository, like NCBI, which would help ensure that important eukaryotic diversity was not ignored. Nonetheless, EukProt represents an excellent example of how reproducible bioinformatics resources should be designed and should prove to be an extremely useful resource for the field.

Burki F, Roger AJ, Brown MW, Simpson AGB (2020) The New Tree of Eukaryotes. Trends in Ecology & Evolution, 35, 43–55.

Richter DJ, Berney C, Strassert JFH, Poh Y-P, Herman EK, Muñoz-Gómez SA, Wideman JG, Burki F, Vargas C de (2022) EukProt: A database of genome-scale predicted proteins across the diversity of eukaryotes. bioRxiv, 2020.06.30.180687, ver. 5 peer-reviewed and recommended by Peer Community in Genomics.

Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJG, Groth P, Goble C, Grethe JS, Heringa J, ’t Hoen PAC, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone S-A, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B (2016) The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.

Title: EukProt: A database of genome-scale predicted proteins across the diversity of eukaryotes
Authors: Daniel J. Richter, Cédric Berney, Jürgen F. H. Strassert, Yu-Ping Poh, Emily K. Herman, Sergio A. Muñoz-Gómez, Jeremy G. Wideman, Fabien Burki, Colomban de Vargas
Abstract: EukProt is a database of published and publicly available predicted protein sets selected to represent the breadth of eukaryotic diversity, currently including 993 species from all major supergroups as well as orpha...
Thematic fields: Bioinformatics, Evolutionary genomics
Recommender: Gavin Douglas
Submission date: 2022-06-08 14:19:28

13 Jul 2022

Karyorelict ciliates use an ambiguous genetic code with context-dependent stop/sense codons

An accident frozen in time: the ambiguous stop/sense genetic code of karyorelict ciliates

Recommended by Iker Irisarri based on reviews by Vittorio Boscaro and 2 anonymous reviewers

Several variations of the “universal” genetic code are known. Among the most striking are those where a codon can encode either an amino acid or a stop signal depending on the context. Such ambiguous codes are known to have evolved in eukaryotes multiple times independently, particularly in ciliates – eight different codes have so far been discovered (1). We generally view such genetic codes as rare ‘variants’ of the standard code restricted to single species or strains, but this might as well reflect a lack of study of closely related species. In this study, Seah and co-authors (2) explore the possibility of codon reassignment in karyorelict ciliates closely related to Parduczia sp., which has been shown to contain an ambiguous genetic code (1). Here, single-cell transcriptomics are used, along with similar available data, to explore the possibility of codon reassignment across the diversity of Karyorelictea (four out of the six recognized families). Codon reassignments were inferred from their frequencies within conserved Pfam (3) protein domains, whereas stop codons were inferred from full-length transcripts with intact 3’-UTRs.

Results show the reassignment of the UAA and UAG stop codons to code for glutamine (Q) and the reassignment of the UGA stop codon to tryptophan (W). This occurs only within the coding sequences, whereas the end of translation is marked by UGA as the main stop codon, and to a lesser extent by UAA. In agreement with a previously proposed model that explains the functioning of ambiguous codes (1,4), the authors observe a depletion of in-frame UGAs before the UGA codon that indicates the stop, thus avoiding premature termination of translation. The inferred codon reassignments occur in all studied karyorelicts, including the previously studied Parduczia sp. Despite the overall clear picture, some questions remain. Data for two out of six main karyorelict lineages are so far absent and the available data for Cryptopharyngidae were inconclusive; the phylogenetic affinities of Cryptopharyngidae have also been questioned (5). This indicates the need for further study of this interesting group of organisms. As nicely discussed by the authors, experimental evidence could further strengthen the conclusions of this paper, including ribosome profiling, mass spectrometry – as done for Condylostoma (1) – or even direct genetic manipulation.
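The context dependence described above can be illustrated with a small Python sketch that translates a coding sequence under the reassignments reported here: UAA and UAG read as glutamine, UGA read as tryptophan, and UGA or UAA read as a stop only when it is the final codon. Treating "final codon of the supplied sequence" as the stop context is a deliberate simplification made for this example. The proposed model ties termination to proximity to the transcript end rather than to a fixed position, and all names in the code are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks the three standard stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Sense readings for the ambiguous karyorelict code described in the text.
KARYORELICT_SENSE = dict(STANDARD, UAA="Q", UAG="Q", UGA="W")
# Codons that act as terminators when found at the end of the coding sequence.
TERMINATORS = {"UGA", "UAA"}


def translate_karyorelict(cds):
    """Translate a coding sequence, reading stop codons as sense codons
    everywhere except at the final codon (simplifying assumption)."""
    rna = cds.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    protein = []
    for index, codon in enumerate(codons):
        is_last = index == len(codons) - 1
        if is_last and codon in TERMINATORS:
            break  # context-dependent stop at the end of the CDS
        protein.append(KARYORELICT_SENSE[codon])
    return "".join(protein)


if __name__ == "__main__":
    # In-frame UAA, UGA and UAG are translated; the terminal UGA is a stop.
    print(translate_karyorelict("AUGUAAUGAUAGUGA"))
```

Under the standard code the same sequence would terminate immediately after the start codon, which shows why applying the wrong translation table to karyorelict transcripts would truncate most predicted proteins.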

The uniformity of the ambiguous genetic code across karyorelicts might at first seem dull, but when viewed in a phylogenetic context the character distribution strongly suggests that this genetic code has an ancient origin in the karyorelict ancestor ~455 Ma in the Proterozoic (6). This ambiguous code is also not a rarity of some obscure species, but it is shared by ciliates that are very diverse and ecologically important. The origin of the karyorelict code is also intriguing. Adaptive arguments suggest that it could confer robustness to mutations causing premature stop codons. However, we lack evidence for ambiguous codes being linked to specific habitats or lifestyles that could account for it. Instead, the authors favor the neutral view of an ancient “frozen accident”, fixed stochastically simply because it did not pose a significant selective disadvantage. Once a stop codon is reassigned to an amino acid, it is increasingly difficult to revert this without the deleterious effect of prematurely terminating translation. In the end, the origin of the genetic code itself is thought to be a frozen accident too (7).


1. Swart EC, Serra V, Petroni G, Nowacki M. Genetic codes with no dedicated stop codon: Context-dependent translation termination. Cell 2016;166: 691–702.

2. Seah BKB, Singh A, Swart EC (2022) Karyorelict ciliates use an ambiguous genetic code with context-dependent stop/sense codons. bioRxiv, 2022.04.12.488043. ver. 4 peer-reviewed and recommended by Peer Community in Genomics.

3. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. Pfam: The protein families database in 2021, Nuc Acids Res 2020;49: D412-D419.

4. Alkalaeva E, Mikhailova T. Reassigning stop codons via translation termination: How a few eukaryotes broke the dogma. Bioessays. 2017;39.

5. Xu Y, Li J, Song W, Warren A. Phylogeny and establishment of a new ciliate family, Wilbertomorphidae fam. nov. (Ciliophora, Karyorelictea), a highly specialized taxon represented by Wilbertomorpha colpoda gen. nov., spec. nov. J Eukaryot Microbiol. 2013;60: 480–489.

6. Fernandes NM, Schrago CG. A multigene timescale and diversification dynamics of Ciliophora evolution. Mol Phylogenet Evol. 2019;139: 106521.

7. Crick FH. The origin of the genetic code. J Mol Biol. 1968;38: 367–379.

Title: Karyorelict ciliates use an ambiguous genetic code with context-dependent stop/sense codons
Authors: Brandon Kwee Boon Seah, Aditi Singh, Estienne Carl Swart
Abstract: In ambiguous stop/sense genetic codes, the stop codon(s) not only terminate translation but can also encode amino acids. Such codes have evolved at least four times in eukaryotes, twice among ciliates (Condylost...
Thematic fields: Bioinformatics, Evolutionary genomics
Recommender: Iker Irisarri
Submission date: 2022-05-02 11:06:10

24 Sep 2020

A rapid and simple method for assessing and representing genome sequence relatedness

A quick alternative method for resolving bacterial taxonomy using short identical DNA sequences in genomes or metagenomes

Recommended by B. Jesse Shapiro based on reviews by Gavin Douglas and 1 anonymous reviewer

The bacterial species problem can be summarized as follows: bacteria recombine too little, and yet too much (Shapiro 2019).
Too little in the sense that recombination is not obligately coupled with reproduction, as in sexual eukaryotes. So the Biological Species Concept (BSC) of reproductive isolation does not strictly apply to clonally reproducing organisms like bacteria. Too much in the sense that genetic exchange can occur promiscuously across species (or even Domains), potentially obscuring species boundaries.
In parallel to such theoretical considerations, several research groups have taken more pragmatic approaches to defining bacterial species based on sequence similarity cutoffs, such as genome-wide average nucleotide identity (ANI). At a cutoff above 95% ANI, genomes are considered to come from the same species. While this cutoff may appear arbitrary, a discontinuity around 95% in the distribution of ANI values has been argued to provide a 'natural' cutoff (Jain et al. 2018). This discontinuity has been criticized as being an artefact of various biases in genome databases (Murray, Gao, and Wu 2020), but appears to be a general feature of relatively unbiased metagenome-assembled genomes as well (Olm et al. 2020). The 95% cutoff has been suggested to represent a barrier to homologous recombination (Olm et al. 2020), although clusters of genetic exchange consistent with BSC-like species are observed at much finer identity cutoffs (Shapiro 2019; Arevalo et al. 2019).
Although 95% ANI is the most widely used genomic standard for species delimitation, it is by no means the only plausible approach. In particular, tracts of identical DNA provide evidence for recent genetic exchange, which in turn helps define BSC-like clusters of genomes (Arevalo et al. 2019). In this spirit, Briand et al. (2020) introduce a genome-clustering method based on the number of shared identical DNA sequences of length k (or k-mers). Using a test dataset of Pseudomonas genomes, they find that 95% ANI corresponds to approximately 50% of shared 15-mers. Applying this cutoff yields 350 Pseudomonas species, whereas the current taxonomy only includes 207 recognized species. To determine whether splitting the genus into a greater number of species is at all useful, they compare their new classification scheme to the traditional one in terms of the ability to taxonomically classify metagenomic sequencing reads from three Pseudomonas-rich environments. In all cases, the new scheme (termed KI-S for "Kinship relationships Identification with Shared k-mers") yielded a higher number of classified reads, with an average improvement of 1.4-fold. This is important because increasing the number of genome sequences in a reference database – without consistent taxonomic annotation of these genomes – paradoxically leads to fewer classified metagenomic reads. Thus a rapid, automated taxonomy such as the one proposed here offers an opportunity to more fully harness the information from metagenomes.
KI-S is also fast to run, so it is feasible to test several values of k and quickly visualize the clustering using an interactive, zoomable circle-packing display (that resembles a cross-section of a densely packed, three-dimensional dendrogram). This interface allows the rapid flagging of misidentified species, or understudied species with few sequenced representatives as targets for future study. Hopefully these initial Pseudomonas results will inspire future studies to apply the method to additional taxa, and to further characterize the relationship between ANI and shared identical k-mers. Ultimately, I hope that such investigations will resolve the issue of whether or not there is a 'natural' discontinuity for bacterial species, and what evolutionary forces maintain this cutoff.
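The idea of clustering genomes by shared k-mers can be sketched in a few lines of Python. The example below collects the distinct k-mers of two sequences and reports the percentage shared, then applies the 50% cutoff for 15-mers mentioned above. Several choices here are assumptions made for illustration and are not taken from Briand et al. (2020): the denominator (the smaller of the two k-mer sets), the use of the forward strand only, and all function names.

```python
def kmers(sequence, k):
    """Return the set of distinct k-mers on the forward strand of a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_percent(seq_a, seq_b, k=15):
    """Percentage of distinct k-mers shared between two sequences.

    The denominator is the smaller k-mer set, which is an assumption
    made for this sketch rather than a documented definition.
    """
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    if not kmers_a or not kmers_b:
        raise ValueError("both sequences must be at least k bases long")
    return 100 * len(kmers_a & kmers_b) / min(len(kmers_a), len(kmers_b))


def same_cluster(seq_a, seq_b, k=15, threshold=50.0):
    """Apply a shared k-mer cutoff, defaulting to 50% of shared 15-mers."""
    return shared_kmer_percent(seq_a, seq_b, k) >= threshold


if __name__ == "__main__":
    # Toy sequences with a small k so the example runs instantly.
    print(shared_kmer_percent("ACGTACGTAC", "ACGTTTTT", k=4))
```

Real genomes would call for canonical k-mers (so that both strands are counted) and a memory-efficient k-mer counter rather than Python sets.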


Arevalo P, VanInsberghe D, Elsherbini J, Gore J, Polz MF (2019) A Reverse Ecology Approach Based on a Biological Definition of Microbial Populations. Cell, 178, 820-834.e14.
Briand M, Bouzid M, Hunault G, Legeay M, Saux MF-L, Barret M (2020) A rapid and simple method for assessing and representing genome sequence relatedness. bioRxiv, 569640, ver. 5 peer-reviewed and recommended by PCI Genomics.
Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S (2018) High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications, 9, 5114.
Murray CS, Gao Y, Wu M (2020) There is no evidence of a universal genetic boundary among microbial species. bioRxiv, 2020.07.27.223511.
Olm MR, Crits-Christoph A, Diamond S, Lavy A, Carnevali PBM, Banfield JF (2020) Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries. mSystems, 5.
Shapiro BJ (2019) What Microbial Population Genomics Has Taught Us About Speciation. In: Population Genomics: Microorganisms Population Genomics. (eds Polz MF, Rajora OP), pp. 31–47. Springer International Publishing, Cham.

Title: A rapid and simple method for assessing and representing genome sequence relatedness
Authors: M Briand, M Bouzid, G Hunault, M Legeay, M Fischer-Le Saux, M Barret
Abstract: Coherent genomic groups are frequently used as a proxy for bacterial species delineation through computation of overall genome relatedness indices (OGRI). Average nucleotide identity (ANI) is a widely employed method for estimating relatedness ...
Thematic fields: Bioinformatics, Metagenomics
Recommender: B. Jesse Shapiro
Reviewers: Gavin Douglas
Submission date: 2019-11-07 16:37:56

07 Feb 2023

RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes

A workflow for studying enigmatic non-autonomous transposable elements across bacteria

Recommended based on reviews by Sophie Abby and 1 anonymous reviewer

Repetitive extragenic palindromic sequences (REPs) are common repetitive elements in bacterial genomes (Gilson et al., 1984; Stern et al., 1984). In 2011, Bertels and Rainey found that REPs are overrepresented in pairs of inverted repeats, which likely form hairpin structures, that they referred to as “REP doublets forming hairpins” (REPINs). Based on bioinformatics analyses, they argued that REPINs are likely selfish elements that evolved from REPs flanking particular transposases (Bertels and Rainey, 2011). These transposases, so-called REP-associated tyrosine transposases (RAYTs), were known to be highly associated with the REP content in a genome and to have characteristic upstream and downstream flanking REPs (Nunvar et al., 2010). The flanking REPs likely enable RAYT transposition, and their horizontal replication is physically linked to this process. In contrast, Bertels and Rainey hypothesized that REPINs are selfish elements that are highly replicated due to the similarity in arrangement to these RAYT-flanking REPs, but independent of RAYT transposition and generally with no impact on bacterial fitness (Bertels and Rainey, 2011).
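To illustrate what "pairs of inverted repeats" means at the sequence level, the following Python sketch looks for a REP motif followed, within a short spacer, by its reverse complement on the same strand, which is the arrangement that can fold into a hairpin. It is only a toy search using exact matches. It is not the algorithm implemented in RAREFAN, and the motif, the spacer limit, and the function names are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def find_all(genome, motif):
    """Return the start positions of all (possibly overlapping) exact matches."""
    positions = []
    start = genome.find(motif)
    while start != -1:
        positions.append(start)
        start = genome.find(motif, start + 1)
    return positions


def find_inverted_pairs(genome, rep, max_spacer=30):
    """Find a REP motif followed by its reverse complement within max_spacer bases.

    Returns (forward_start, inverted_start) tuples. Only the
    forward-then-inverted orientation is reported in this sketch.
    """
    genome, rep = genome.upper(), rep.upper()
    forward_hits = find_all(genome, rep)
    inverted_hits = find_all(genome, reverse_complement(rep))
    pairs = []
    for forward in forward_hits:
        for inverted in inverted_hits:
            spacer = inverted - (forward + len(rep))
            if 0 <= spacer <= max_spacer:
                pairs.append((forward, inverted))
    return pairs


if __name__ == "__main__":
    # Toy genome: motif, 4-base spacer, then the motif's reverse complement.
    toy_genome = "AAAAGGCTAACCCCTTAGCCAAAA"
    print(find_inverted_pairs(toy_genome, "GGCTAA"))
```

A practical search would need to tolerate mismatches between the two halves, since copies of these elements are not expected to be identical.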

This last point was especially contentious, as REPINs are highly conserved within species (Bertels and Rainey, 2023), which is unusual for non-beneficial bacterial DNA (Mira et al., 2001). Bertels and Rainey have since refined their argument, proposing that REPINs must provide benefits to host cells, but that there are nonetheless signatures of intragenomic conflict in genomes associated with these elements (Bertels and Rainey, 2023). These signatures reflect the divergent levels of selection driving REPIN distribution: selection at the level of each DNA element and selection on each individual bacterium. I found this observation particularly interesting because my colleague and I recently argued that these divergent levels of selection, and the interaction between them, are key to understanding bacterial pangenome diversity (Douglas and Shapiro, 2021). REPINs could be an excellent system for investigating these levels of selection across bacteria more generally.

The problem is that REPINs have not been widely characterized in bacterial genomes, partially because no bioinformatic workflow has been available for this purpose. To address this problem, Fortmann-Grote et al. (2023) developed RAREFAN, which is a web server for identifying RAYTs and associated REPINs in a set of input genomes. The authors showcase their tool by applying it to 49 Stenotrophomonas maltophilia genomes and providing examples of how to identify and assess RAYT-REPIN hits. The workflow requires several manual steps, but nonetheless represents a straightforward and standardized approach. Overall, this workflow should enable RAYTs and REPINs to be identified across diverse bacterial species, which will facilitate further investigation into the mechanisms driving their maintenance and spread.


Bertels F, Rainey PB (2023) Ancient Darwinian replicators nested within eubacterial genomes. BioEssays, 45, 2200085.

Bertels F, Rainey PB (2011) Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria. PLOS Genetics, 7, e1002132.

Douglas GM, Shapiro BJ (2021) Genic Selection Within Prokaryotic Pangenomes. Genome Biology and Evolution, 13, evab234.

Fortmann-Grote C, Irmer J von, Bertels F (2023) RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes. bioRxiv, 2022.05.22.493013, ver. 4 peer-reviewed and recommended by Peer Community in Genomics.

Gilson E, Clément JM, Brutlag D, Hofnung M (1984) A family of dispersed repetitive extragenic palindromic DNA sequences in E. coli. The EMBO Journal, 3, 1417–1421.

Mira A, Ochman H, Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends in Genetics, 17, 589–596.

Nunvar J, Huckova T, Licha I (2010) Identification and characterization of repetitive extragenic palindromes (REP)-associated tyrosine transposases: implications for REP evolution and dynamics in bacterial genomes. BMC Genomics, 11, 44.

Stern MJ, Ames GF-L, Smith NH, Clare Robinson E, Higgins CF (1984) Repetitive extragenic palindromic sequences: A major component of the bacterial genome. Cell, 37, 1015–1026.

RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes | Frederic Bertels, Julia von Irmer, Carsten Fortmann-Grote | Compared to eukaryotes, repetitive sequences are rare in bacterial genomes and usually do not persist for long. Yet, there is at least one class of persistent prokaryotic mobile genetic elements: REPINs. REPINs are ... | Thematic fields: Bacteria and archaea, Bioinformatics, Evolutionary genomics, Viruses and transposable elements | Recommender: Gavin Douglas | Submission date: 2022-06-07 08:21:34
06 Apr 2021

Evidence for shared ancestry between Actinobacteria and Firmicutes bacteriophages

Viruses of bacteria: phages evolution across phylum boundaries

Recommended by Denis Tagu based on reviews by 3 anonymous reviewers

Bacteria and phages have coexisted and coevolved for a long time. Phages are bacteria-infecting viruses, with a symbiotic status sensu lato, meaning they can be pathogenic, commensal or mutualistic. Thus, the association between bacteria and phages has probably played a key role in the high adaptability of bacteria to most, if not all, of Earth’s ecosystems, including other living organisms (such as eukaryotes), and in regulating bacterial community size (for instance during bacterial blooms).

As genetic entities, phages are subject to mutation and natural selection, which change their DNA sequences. Therefore, comparative genomic analyses of contemporary phages can be useful to understand their evolutionary dynamics. International initiatives such as SEA-PHAGES have started to tackle the history of phage-bacteria interactions and to describe the dynamics of the co-evolution between bacterial hosts and their associated viruses. Indeed, understanding this cross-talk has many potential implications for health and agriculture, among others.

The work of Koert et al. (2021) deals with one of the largest groups of bacteria (Actinobacteria), which are Gram-positive bacteria mainly found in soil and water. Some soil-borne Actinobacteria develop filamentous structures reminiscent of the mycelium of eukaryotic fungi. In this study, the authors focused on the Streptomyces clade, a large genus of Actinobacteria colonized by phages known for their high level of genetic diversity.

The authors tested the hypothesis that large exchanges of genetic material occurred between Streptomyces and diverse phages associated with bacterial hosts. Using public datasets, their comparative phylogenomic analyses identified a new cluster among Actinobacteria-infecting phages closely related to phages of Firmicutes. Moreover, the GC content and codon-usage biases of this group of phages of Actinobacteria are similar to those of Firmicutes.
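As an illustration of the kind of summary statistic mentioned above, the following Python sketch computes the GC content of a DNA sequence. It is a generic calculation under simple assumptions (only A, C, G and T are counted; ambiguous bases such as N are ignored) and is not taken from Koert et al.; the function name and the toy sequences are hypothetical.

```python
def gc_content(sequence):
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return gc_count / len(bases)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real phage or host genomes.
    toy_phage = "ATGCGCGTTAACGGCC"
    toy_host = "ATGAAATTTACGTTAA"
    print(f"toy phage GC: {gc_content(toy_phage):.2f}")
    print(f"toy host GC:  {gc_content(toy_host):.2f}")
```

Comparing such values between a phage and candidate hosts gives a rough indication of whether the phage genome composition resembles that of a given host group.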

This work demonstrates for the first time the transfer of a bacteriophage lineage from one bacterial phylum to another. The results presented here suggest that the described transfer is probably recent, since several genomic characteristics of the phages are not fully adapted to their new hosts. However, the frequency of such transfer events remains an open question. If frequent, such exchanges would mean that pools of bacteriophages are regularly fueled by genetic material coming from external sources, which would have important implications for the co-evolutionary dynamics of phages and bacteria.


Koert, M., López-Pérez, J., Mattson, C., Caruso, S. and Erill, I. (2021) Evidence for shared ancestry between Actinobacteria and Firmicutes bacteriophages. bioRxiv, 842583, version 5 peer-reviewed and recommended by Peer Community in Genomics. doi: 

Evidence for shared ancestry between Actinobacteria and Firmicutes bacteriophages | Matthew Koert, Júlia López-Pérez, Courtney Mattson, Steven M. Caruso, Ivan Erill | Bacteriophages typically infect a small set of related bacterial strains. The transfer of bacteriophages between more distant clades of bacteria has often been postulated, but remains mostly unaddressed. In this work we leverage the sequencing ... | Thematic fields: Evolutionary genomics | Recommender: Denis Tagu | Submission date: 2019-12-10 15:26:31
15 Jan 2024

The genome sequence of the Montseny horsehair worm, Gordionus montsenyensis sp. nov., a key resource to investigate Ecdysozoa evolution

Embarking on a novel journey in Metazoa evolution through the pioneering sequencing of a key underrepresented lineage

Recommended by Juan C. Opazo based on reviews by Gonzalo Riadi and 2 anonymous reviewers

Whole genome sequences are revolutionizing our understanding across various biological fields. They not only shed light on the evolution of genetic material but also uncover the genetic basis of phenotypic diversity. The sequencing of underrepresented lineages, such as the one presented in this study, is critically important for filling significant gaps in our understanding of Metazoa evolution. Despite the wealth of genome sequences in public databases, some lineages across the Tree of Life remain underrepresented or absent. This research represents a significant step towards addressing this imbalance, contributing to the collective knowledge of the global scientific community.

In this genome note, as part of the European Reference Genome Atlas pilot effort to generate reference genomes for European biodiversity (Mc Cartney et al. 2023), Klara Eleftheriadi and colleagues (Eleftheriadi et al. 2023) make a significant effort to add a genome sequence for an underrepresented group in the animal Tree of Life. More specifically, they present a taxonomic description and chromosome-level genome assembly of a newly described species of horsehair worm (Gordionus montsenyensis). Their sequencing methodology gave rise to an assembly of 396 scaffolds totaling 288 Mb, with an N50 value of 64.4 Mb, where 97% of this assembly is grouped into five pseudochromosomes. The nuclear genome annotation predicted 10,320 protein-coding genes, and they also assembled the circular mitochondrial genome into a 15-kilobase sequence.
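For readers unfamiliar with the N50 statistic quoted above, here is a minimal Python sketch, assuming the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name `n50` and the toy lengths are hypothetical and are not taken from the Gordionus montsenyensis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # hypothetical scaffold lengths
    print(n50(toy_lengths))
```

A high N50 relative to the total assembly size, as reported here, indicates that most of the sequence sits in a few very long scaffolds.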

The selection of a species representing the phylum Nematomorpha, a group of parasitic organisms belonging to the Ecdysozoa lineage, is well justified, since there is currently only one publicly available genome for this animal phylum (Cunha et al. 2023). Interestingly, that article shows, among other things, that the species analyzed has lost ∼30% of the universal Metazoan genes. Efforts like the one performed by Eleftheriadi and colleagues are necessary to gain more insight into, for example, the evolution of this massive gene loss in this group of animals.


Cunha, T. J., de Medeiros, B. A. S, Lord, A., Sørensen, M. V., and Giribet, G. (2023). Rampant Loss of Universal Metazoan Genes Revealed by a Chromosome-Level Genome Assembly of the Parasitic Nematomorpha. Current Biology, 33 (16): 3514–21.e4.

Eleftheriadi, K., Guiglielmoni, N., Salces-Ortiz, J., Vargas-Chavez, C., Martínez-Redondo, G. I., Gut, M., Flot, J.-F., Schmidt-Rhaesa, A., and Fernández, R. (2023). The Genome Sequence of the Montseny Horsehair worm, Gordionus montsenyensis sp. nov., a Key Resource to Investigate Ecdysozoa Evolution. bioRxiv, ver. 3 peer-reviewed and recommended by Peer Community in Genomics.

Mc Cartney, A. M., Formenti, G., Mouton, A., De Panis, D., Marins, L. S., Leitão, H. G., Diedericks, G., et al. (2023). The European Reference Genome Atlas: Piloting a Decentralised Approach to Equitable Biodiversity Genomics. bioRxiv.

The genome sequence of the Montseny horsehair worm, Gordionus montsenyensis sp. nov., a key resource to investigate Ecdysozoa evolution | Eleftheriadi Klara, Guiglielmoni Nadège, Salces-Ortiz Judit, Vargas-Chávez Carlos, Martínez-Redondo Gemma I, Gut Marta, Flot Jean François, Schmidt-Rhaesa Andreas, Fernández Rosa | Nematomorpha, also known as Gordiacea or Gordian worms, are a phylum of parasitic organisms that belong to the Ecdysozoa, a clade of invertebrate animals characterized by molting. They are one of the less scientifically studied animal phyla, an... | Thematic fields: ERGA Pilot | Recommender: Juan C. Opazo | Submission date: 2023-06-29 10:31:36
27 Apr 2021

Uncovering transposable element variants and their potential adaptive impact in urban populations of the malaria vector Anopheles coluzzii

Anopheles coluzzii, a new system to study how transposable elements may foster adaptation to urban environments

Recommended by Anne Roulin based on reviews by Yann Bourgeois and 1 anonymous reviewer

Transposable elements (TEs) are mobile DNA sequences that can increase their copy number and move from one location to another within the genome [1]. Because of their transposition dynamics, TEs constitute a significant fraction of eukaryotic genomes. TEs are also known to play an important functional role, and a wealth of studies has now reported how TEs may influence single host traits [e.g. 2–4]. Given that TEs are more likely than classical point mutations to cause extreme changes in gene expression and phenotypes, they might be especially prone to produce the raw diversity necessary for individuals to respond to challenging environments [5,6], such as those found in urban areas.
In their study [7], Vargas et al. establish the foundation to investigate how TEs may help Anopheles coluzzii, one of the primary vectors of human malaria in sub-Saharan Africa, adapt to urban environments. To cover natural breeding sites in major Central African cities, they made use of the previously available An. coluzzii genome from Yaoundé (Cameroon) and used long-read technology to sequence six additional genomes originating from Douala (Cameroon) and Libreville (Gabon). The de novo annotation of TEs in these genomes revealed 64 new anopheline TE families and allowed the identification of seven active families. As a first step towards characterizing the potential role of TEs in the adaptation of An. coluzzii to urban environments, they further analyzed the distribution of TEs across the seven genomes. By doing so, they identified a significant number of polymorphic or fixed TE insertions located in the vicinity of genes involved in insecticide resistance and immune response.
The availability of seven An. coluzzii genomes allowed the authors to explore how TE diversity may affect genes functionally relevant for adaptation to urban environments and provides grounds for further functional validation studies. More and more studies have demonstrated the impact of TEs on adaptation and, as such, the work of Vargas et al. contributes to fostering our understanding of the link between TEs and gain of function in a species facing strong anthropogenic pressures.
[1] Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH (2007) A unified classification system for eukaryotic transposable elements. Nature Reviews Genetics, 8, 973–982.    
[2] van’t Hof AE, Campagne P, Rigden DJ, Yung CJ, Lingley J, Quail MA, Hall N, Darby AC, Saccheri IJ (2016) The industrial melanism mutation in British peppered moths is a transposable element. Nature, 534, 102–105.    
[3] González J, Karasov TL, Messer PW, Petrov DA (2010) Genome-wide patterns of adaptation to temperate environments associated with transposable elements in Drosophila. PLOS Genetics, 6, e1000905.  
[4] Lisch D (2013) How important are transposons for plant evolution? Nature Reviews Genetics, 14, 49–61.    
[5] Bonchev G, Parisod C (2013) Transposable elements and microevolutionary changes in natural populations. Molecular Ecology Resources, 13, 765–775.  
[6] Casacuberta E, González J (2013) The impact of transposable elements in environmental adaptation. Molecular Ecology, 22, 1503–1517.    
[7] Vargas-Chavez C, Pendy NML, Nsango SE, Aguilera L, Ayala D, González J (2021) Uncovering transposable element variants and their potential adaptive impact in urban populations of the malaria vector Anopheles coluzzii. bioRxiv, 2020.11.22.393231, ver. 3 peer-reviewed and recommended by Peer Community in Genomics.


Uncovering transposable element variants and their potential adaptive impact in urban populations of the malaria vector Anopheles coluzzii | Carlos Vargas-Chavez, Neil Michel Longo Pendy, Sandrine E. Nsango, Laura Aguilera, Diego Ayala, and Josefa González | Background: Anopheles coluzzii is one of the primary vectors of human malaria in sub-Saharan Africa. Recently, it has colonized the main cities of Central Africa threatening vecto... | Thematic fields: Evolutionary genomics | Recommender: Anne Roulin | Submission date: 2020-12-02 14:58:47
06 May 2022

A deep dive into genome assemblies of non-vertebrate animals

Diving, and even digging, into the wild jungle of annotation pathways for non-vertebrate animals

Recommended by Francois Sabot based on reviews by Yann Bourgeois, Cécile Monat, Valentina Peona and Benjamin Istace

In their paper, Guiglielmoni et al. propose we pick up our snorkels and fins and take "A deep dive into genome assemblies of non-vertebrate animals" (1). Indeed, while numerous assembly-related tools have been developed and tested for human genomes (or at least vertebrates such as mice), very few have been tested on non-vertebrate animals so far. Moreover, most benchmarks are aimed at raw assembly tools, and very few offer a guide from raw reads to an almost finished assembly, including quality control and phasing.

This huge and exhaustive review starts with an overview of the current sequencing technologies, followed by the theory of the different approaches for assembly and their implementation. For each approach, the authors present some of the most representative tools, as well as the limits of the approach.

The authors additionally present all the steps required to obtain an almost complete assembly at chromosome scale, with all the different technologies currently available for scaffolding, QC, and phasing, and the way these tools can be applied to non-vertebrate animals. Finally, they offer some useful advice on the choice of the different approaches (but not always tools, see below), and advocate for a robust genome database with all information on the way the assembly was obtained.

This review is currently very complete and is a very good starting point for any student or scientist interested in working on genome assembly, for either model or non-model organisms. However, the authors do not provide a list of recommended tools or a benchmark of them. Why? Because such a recommendation may be obsolete in less than a year. Indeed, with the explosion of third-generation sequencing technologies, assembly tools (for the different steps) are constantly evolving, and their relative performance shifts on a monthly basis. In addition, some tools are really efficient at the time of a review or of an article, but are not developed further later on, and thus will not evolve with the technology. We have all seen this with wonderful tools such as Chiron (2) or TopHat (3), which were very promising but could not be developed further because the project stopped, the contract of the postdoc in charge of the development ended, or the developer decided to switch to another paradigm. Such advice would, therefore, need to be constantly updated.

Thus, the manuscript from Guiglielmoni et al. will remain almost timeless (at least until the next sequencing revolution), and since the authors advocate for a more informative genome database, I think we should consider a rolling benchmarking system (tools, genomes, and sequence datasets) that keeps the performance assessment of tools up to date and proposes the best set of assembly tools for a given type of genome.


1. Guiglielmoni N, Rivera-Vicéns R, Koszul R, Flot J-F (2022) A Deep Dive into Genome Assemblies of Non-vertebrate Animals. Preprints, 2021110170, ver. 3 peer-reviewed and recommended by Peer Community in Genomics.

2. Teng H, Cao MD, Hall MB, Duarte T, Wang S, Coin LJM (2018) Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning. GigaScience, 7, giy037.

3. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, 25, 1105–1111.

A deep dive into genome assemblies of non-vertebrate animals | Nadège Guiglielmoni, Ramón Rivera-Vicéns, Romain Koszul, Jean-François Flot | Non-vertebrate species represent about ∼95% of known metazoan (animal) diversity. They remain to this day relatively unexplored genetically, but understanding their genome structure and function is pivotal for expan... | Thematic fields: Bioinformatics, Evolutionary genomics | Recommender: Francois Sabot | Reviewers: Valentina Peona, Benjamin Istace, Cécile Monat, Yann Bourgeois | Submission date: 2021-11-10 17:47:31