
Latest recommendations

07 Feb 2023

RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes

A workflow for studying enigmatic non-autonomous transposable elements across bacteria

Recommended by Gavin Douglas based on reviews by Sophie Abby and 1 anonymous reviewer

Repetitive extragenic palindromic sequences (REPs) are common repetitive elements in bacterial genomes (Gilson et al., 1984; Stern et al., 1984). In 2011, Bertels and Rainey found that REPs are overrepresented in pairs of inverted repeats that likely form hairpin structures, which they referred to as “REP doublets forming hairpins” (REPINs). Based on bioinformatics analyses, they argued that REPINs are likely selfish elements that evolved from REPs flanking particular transposases (Bertels and Rainey, 2011). These transposases, so-called REP-associated tyrosine transposases (RAYTs), were known to be highly associated with the REP content in a genome and to have characteristic upstream and downstream flanking REPs (Nunvar et al., 2010). The flanking REPs likely enable RAYT transposition, and their horizontal replication is physically linked to this process. In contrast, Bertels and Rainey hypothesized that REPINs are selfish elements that are highly replicated due to their similarity in arrangement to these RAYT-flanking REPs, but independently of RAYT transposition and generally with no impact on bacterial fitness (Bertels and Rainey, 2011).
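To make the idea of a REP doublet in inverted orientation concrete, here is a minimal Python sketch. It is not the RAREFAN algorithm; the function names, the example seed sequence, and the `max_spacer` parameter are all hypothetical and chosen only for illustration. It simply looks for an exact seed sequence followed, within a short spacer, by its reverse complement, which is the arrangement that could fold into a hairpin.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def find_inverted_pairs(genome: str, seed: str, max_spacer: int = 30):
    """Return (seed_start, partner_start, spacer_length) tuples.

    A pair is reported when an exact copy of `seed` is followed, after at
    most `max_spacer` bases, by the reverse complement of `seed`.
    Illustrative only: real REP copies are degenerate, so an exact-match
    search like this one would miss diverged copies.
    """
    partner = reverse_complement(seed)
    k = len(seed)
    pairs = []
    start = genome.find(seed)
    while start != -1:
        window_start = start + k
        # The partner must begin no later than window_start + max_spacer.
        window_end = window_start + max_spacer + k
        hit = genome.find(partner, window_start, window_end)
        if hit != -1:
            pairs.append((start, hit, hit - window_start))
        start = genome.find(seed, start + 1)
    return pairs


# Toy sequence: seed, 4-base spacer, then the seed's reverse complement.
toy_genome = "TTT" + "GGCAC" + "AAAA" + "GTGCC" + "TTT"
pairs = find_inverted_pairs(toy_genome, "GGCAC")
```

An exact-match search keeps the sketch short; a realistic workflow would need to tolerate mismatches and would start from seed sequences found near RAYT genes.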

This last point was especially contentious, as REPINs are highly conserved within species (Bertels and Rainey, 2023), which is unusual for non-beneficial bacterial DNA (Mira et al., 2001). Bertels and Rainey have since refined their argument to be that REPINs must provide benefits to host cells, but that there are nonetheless signatures of intragenomic conflict in genomes associated with these elements (Bertels and Rainey, 2023). These signatures reflect the divergent levels of selection driving REPIN distribution: selection at the level of each DNA element and selection on each individual bacterium. I found this observation particularly interesting, as my colleague and I recently argued that these divergent levels of selection, and the interaction between them, are key to understanding bacterial pangenome diversity (Douglas and Shapiro, 2021). REPINs could be an excellent system for investigating these levels of selection across bacteria more generally.

The problem is that REPINs have not been widely characterized in bacterial genomes, partially because no bioinformatic workflow has been available for this purpose. To address this problem, Fortmann-Grote et al. (2023) developed RAREFAN, which is a web server for identifying RAYTs and associated REPINs in a set of input genomes. The authors showcase their tool by applying it to 49 Stenotrophomonas maltophilia genomes and providing examples of how to identify and assess RAYT-REPIN hits. The workflow requires several manual steps, but nonetheless represents a straightforward and standardized approach. Overall, this workflow should enable RAYTs and REPINs to be identified across diverse bacterial species, which will facilitate further investigation into the mechanisms driving their maintenance and spread.

References

Bertels F, Rainey PB (2023) Ancient Darwinian replicators nested within eubacterial genomes. BioEssays, 45, 2200085. https://doi.org/10.1002/bies.202200085

Bertels F, Rainey PB (2011) Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria. PLOS Genetics, 7, e1002132. https://doi.org/10.1371/journal.pgen.1002132

Douglas GM, Shapiro BJ (2021) Genic Selection Within Prokaryotic Pangenomes. Genome Biology and Evolution, 13, evab234. https://doi.org/10.1093/gbe/evab234

Fortmann-Grote C, Irmer J von, Bertels F (2023) RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes. bioRxiv, 2022.05.22.493013, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.05.22.493013

Gilson E, Clément J m., Brutlag D, Hofnung M (1984) A family of dispersed repetitive extragenic palindromic DNA sequences in E. coli. The EMBO Journal, 3, 1417–1421. https://doi.org/10.1002/j.1460-2075.1984.tb01986.x

Mira A, Ochman H, Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends in Genetics, 17, 589–596. https://doi.org/10.1016/S0168-9525(01)02447-7

Nunvar J, Huckova T, Licha I (2010) Identification and characterization of repetitive extragenic palindromes (REP)-associated tyrosine transposases: implications for REP evolution and dynamics in bacterial genomes. BMC Genomics, 11, 44. https://doi.org/10.1186/1471-2164-11-44

Stern MJ, Ames GF-L, Smith NH, Clare Robinson E, Higgins CF (1984) Repetitive extragenic palindromic sequences: A major component of the bacterial genome. Cell, 37, 1015–1026. https://doi.org/10.1016/0092-8674(84)90436-7

Title: RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes
Authors: Frederic Bertels, Julia von Irmer, Carsten Fortmann-Grote
Abstract: Compared to eukaryotes, repetitive sequences are rare in bacterial genomes and usually do not persist for long. Yet, there is at least one class of persistent prokaryotic mobile genetic elements: REPINs. REPINs are ...
Thematic fields: Bacteria and archaea, Bioinformatics, Evolutionary genomics, Viruses and transposable elements
Recommender: Gavin Douglas
Submission date: 2022-06-07 08:21:34
13 Nov 2024

Re-annotation of SARS-CoV-2 proteins using an HHpred-based approach opens new opportunities for a better understanding of this virus

Leveraging HHpred with rigorous validation for improved detection of host-virus homologies

Recommended by Jitendra Narayan based on reviews by 2 anonymous reviewers

The assessment by Brézellec (2024) of the quality of HHpred-based SARS-CoV-2 protein annotations against the traditional Pfam annotations is highly justified and valuable. HHpred’s ability to detect remote homologies offers an expanded view of viral protein similarities, potentially uncovering subtle functional mimicries that Pfam may miss due to its sensitivity limitations when dealing with divergent sequences. However, the accuracy and specificity of HHpred results can be compromised by false positives, especially when dealing with complex viral proteins that feature transmembrane or low-complexity regions prone to spurious matches.

To address this, the author made a thoughtful decision to implement a multi-step validation protocol. This approach included establishing progressively lower probability thresholds to capture weaker but biologically plausible hits, and organizing hits into “families” of similarly located alignments to validate the robustness of matches. The author also cross-verified results by running SARS-CoV-2 protein queries against non-human proteomes (plants, fruit flies, bacteria, and archaea), making it possible to distinguish biologically meaningful matches from potentially random alignments. By adding manual verification with InterPro domain annotations, the author took additional steps to ensure that identified similarities were not only statistically significant but also biologically relevant.

This rigorous validation strategy adds a layer of reliability to HHpred results, maximizing sensitivity while maintaining specificity. It yielded biologically intriguing and previously undocumented similarities, such as the Spike-prominin and ORF3a-GPCR similarities, underscoring the quality and depth of the annotation process. These findings point the way toward further experimental validation and illustrate the potential of HHpred to contribute high-quality insights when applied with careful quality control measures.

In summary, the decision to adopt HHpred (Gabler et al. 2020) and enhance its outputs with a robust quality validation process not only improved the depth of SARS-CoV-2 protein annotations but also established a high standard for future viral annotation projects, striking an effective balance between discovery potential and annotation quality. The author has conducted a study that is methodologically rigorous, well-detailed, and highly pertinent to the field. This work stands as a significant contribution to the scientific community, providing resources and insights that are likely to guide future research in this area.

              
References

Brézellec P (2024) Re-annotation of SARS-CoV-2 proteins using an HHpred-based approach opens new opportunities for a better understanding of this virus. bioRxiv, ver. 3 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2023.06.06.543855

Gabler F, Nam S-Z, Till S, Mirdita M, Steinegger M, Söding J, Lupas AN, Alva V (2020) Protein Sequence Analysis Using the MPI Bioinformatics Toolkit. Current Protocols in Bioinformatics, 72, e108. https://doi.org/10.1002/cpbi.108

 

Title: Re-annotation of SARS-CoV-2 proteins using an HHpred-based approach opens new opportunities for a better understanding of this virus
Authors: Pierre Brézellec
Abstract: Since the publication of the genome of SARS-CoV-2 – the causative agent of COVID-19 – in January 2020, many bioinformatic tools have been applied to annotate its proteins. Although efficient methods have been used, such as the identification of...
Thematic fields: Bioinformatics, Evolutionary genomics, Viruses and transposable elements
Recommender: Jitendra Narayan
Submission date: 2023-06-08 10:17:04
02 Apr 2021

Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection

Toward a critical assessment of virus detection in plants

Recommended by Hadi Quesneville based on reviews by Alexander Suh and 1 anonymous reviewer

The advent of High Throughput Sequencing (HTS) over the last decade has revealed a previously unsuspected diversity of viruses, as well as their (sometimes) unexpected presence in some healthy individuals. These results demonstrate that genomics offers a powerful tool for studying viruses at the individual level, allowing an in-depth inventory of those that are infecting an organism. Such approaches make it possible to study viromes with an unprecedented level of detail, both qualitative and quantitative, which opens new avenues for analyses of viruses of humans, animals and plants. Consequently, the diagnostic field is making increasing use of HTS, fueling the need for efficient and reliable bioinformatics tools.

Many such tools have already been developed, but in plant disease diagnostics, validation of the bioinformatics pipelines used for the detection of viruses in HTS datasets is still in its infancy. There is an urgent need to benchmark the different tools and algorithms using well-designed reference datasets generated for this purpose. This is a crucial step to move forward and to improve existing solutions toward well-standardized bioinformatics protocols. This context has led to the creation of the Plant Health Bioinformatics Network (PHBN), a Euphresco network project aiming to build a bioinformatics community working on plant health. One of its objectives is to provide researchers with open-access reference datasets that allow virus detection pipelines to be compared and validated.

In this framework, Tamisier et al. [1] present real, semi-artificial, and completely artificial datasets, each aimed at addressing challenges that could affect virus detection. These datasets comprise real RNA-seq reads from virus-infected plants as well as simulated virus reads. Such work, providing open-access datasets for benchmarking bioinformatics tools, should be encouraged, as these datasets are key to software improvement, as demonstrated by the well-known success story of the protein structure prediction community: its pioneering community-wide effort, called Critical Assessment of protein Structure Prediction (CASP) [2], has been providing research groups since 1994 with an invaluable way to objectively test their structure prediction methods, thereby delivering an independent assessment of state-of-the-art protein-structure modelling tools. Following this success, many other bioinformatics communities developed similar “competitions”, such as RNA-puzzles [3] to predict RNA structures, Critical Assessment of Function Annotation [4] to predict gene functions, Critical Assessment of Prediction of Interactions [5] to predict protein-protein interactions, Assemblathon [6] for genome assembly, etc. These are just a few examples from a long list of successful initiatives. Such efforts enable rigorous assessments of tools and stimulate the developers’ creativity, and they also provide user communities with a state-of-the-art evaluation of available tools.

Inspired by these success stories, the authors propose a “VIROMOCK challenge” [7], asking researchers in the field to test their tools and to provide feedback on each dataset through a repository. This initiative, if widely taken up, will undoubtedly improve the field of virus detection in plants, and probably in many other organisms as well. This will be a major contribution to virology, leading to better diagnostics and, consequently, a better understanding of viral diseases, thus helping to promote human, animal and plant health.

References

[1] Tamisier, L., Haegeman, A., Foucart, Y., Fouillien, N., Al Rwahnih, M., Buzkan, N., Candresse, T., Chiumenti, M., De Jonghe, K., Lefebvre, M., Margaria, P., Reynard, J.-S., Stevens, K., Kutnjak, D. and Massart, S. (2021) Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection. Zenodo, 4273791, version 4 peer-reviewed and recommended by Peer Community in Genomics. doi: https://doi.org/10.5281/zenodo.4273791

[2] Critical Assessment of protein Structure Prediction (CASP) - https://en.wikipedia.org/wiki/CASP

[3] RNA-puzzles - https://www.rnapuzzles.org

[4] Critical Assessment of Function Annotation (CAFA) - https://en.wikipedia.org/wiki/Critical_Assessment_of_Function_Annotation

[5] Critical Assessment of Prediction of Interactions (CAPRI) - https://en.wikipedia.org/wiki/Critical_Assessment_of_Prediction_of_Interactions

[6] Assemblathon - https://assemblathon.org

[7] VIROMOCK challenge - https://gitlab.com/ilvo/VIROMOCKchallenge

Title: Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection
Authors: Lucie Tamisier, Annelies Haegeman, Yoika Foucart, Nicolas Fouillien, Maher Al Rwahnih, Nihal Buzkan, Thierry Candresse, Michela Chiumenti, Kris De Jonghe, Marie Lefebvre, Paolo Margaria, Jean Sébastien Reynard, Kristian Stevens, Denis Kutnjak, Séb...
Abstract: The widespread use of High-Throughput Sequencing (HTS) for detection of plant viruses and sequencing of plant virus genomes has led to the generation of large amounts of data and of bioinformatics challenges to process them. Many bioinformatics...
Thematic fields: Bioinformatics, Plants, Viruses and transposable elements
Recommender: Hadi Quesneville
Submission date: 2020-11-27 14:31:47
26 Feb 2025

Sequencing, de novo assembly of Ludwigia plastomes, and comparative analysis within the Onagraceae family

Onagre, monster, invasion and genetics

Recommended by Francois Sabot based on reviews by 2 anonymous reviewers

The first time I heard of “onagres” in French was when I was a teenager, through the books of Pierre Bordage, where they are fantastic monsters, and through historical games, where they are Roman siege weapons (onagers). At the time, I was far from imagining that “onagre” also refers to a very large flowering plant family, as it is the French term for evening primroses.

In this family, the genus Ludwigia comprises species that are invasive in aquatic environments (resembling in that way the ancient armies that used onagers to invade cities), degrading ecosystems already weakened by human activities. To counteract this phenomenon, it is of high importance to understand the propagation of these species. However, knowledge about their genetics and diversity is very scarce, and thus tracking their dispersal using genetic information is complicated, and in fact almost impossible.

Barloy-Hubler et al. (2024) present in this manuscript a new set of chloroplast genomes from two of these species, Ludwigia grandiflora subsp. hexapetala and Ludwigia peploides subsp. montevidensis, and compare them to the published chloroplast genome of Ludwigia octovalvis. They explored the possibility of assembling these genomes relying solely on short reads and showed that long reads were necessary to obtain an almost complete assembly of these plastid genomes. In addition, through this approach, they detected two haplotypes in Ludwigia grandiflora subsp. hexapetala, compared to one in a short-read assembly. This highlights the need for long-read data to assess the structure and diversity of chloroplast genomes. The authors were also able to clarify the phylogeny of the genus Ludwigia. Finally, they identified multiple potential single nucleotide polymorphisms and simple sequence repeats for future evaluation of the diversity and dispersal of these invasive species.

This analysis, while appearing more technical than biological at first glance, is in fact of high importance for understanding the ecology and preservation of fragile ecosystems, such as the European watersheds. Indeed, new scientific results and insights are often linked to a reevaluation of previously analyzed data or samples with new technologies, and this paper is a clever example of exactly that.

                                

References

Barloy-Hubler F, Le Gac A-L, Boury C, Guichoux E, Barloy D (2024) Sequencing, de novo assembly of Ludwigia plastomes, and comparative analysis within the Onagraceae family. bioRxiv, ver. 5 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2023.10.20.563230

Bordage, P (1993) Les Guerriers du Silence, L'Atalante, ISBN 9782905158697

 

Title: Sequencing, de novo assembly of Ludwigia plastomes, and comparative analysis within the Onagraceae family
Authors: F Barloy-Hubler, A-L Le Gac, C Boury, E Guichoux, D Barloy
Abstract: The Onagraceae family, which belongs to the order Myrtales, consists of approximately 657 species and 17 genera. This family includes the genus Ludwigia L., which is comprised of 82 species. In this study, we focused on the two aquatic...
Thematic fields: Bioinformatics, Plants
Recommender: Francois Sabot
Submission date: 2023-12-12 18:05:20
10 Jul 2023

SNP discovery by exome capture and resequencing in a pea genetic resource collection

The value of a large Pisum SNP dataset

Recommended by Wanapinun Nawae based on reviews by Rui Borges and 1 anonymous reviewer

One important goal of modern genetics is to establish functional associations between genotype and phenotype. Single nucleotide polymorphisms (SNPs) are numerous and widely distributed in the genome and can be obtained from nucleic acid sequencing (1). SNPs allow for the investigation of genetic diversity, which is critical for increasing crop resilience to the challenges posed by global climate change. The associations between SNPs and phenotypes can be captured in genome-wide association studies. SNPs can also be used in combination with machine learning, which is becoming more popular for predicting complex phenotypic traits like yield and biotic and abiotic stress tolerance from genotypic data (2). The availability of many SNP datasets is important in machine learning predictions because this approach requires big data to build a comprehensive model of the association between genotype and phenotype.

Aubert and colleagues have studied, as part of the PeaMUST project, the genetic diversity of 240 Pisum accessions (3). They sequenced exome-enriched genomic libraries, a technique that enables the identification of high-density, high-quality SNPs at a low cost (4). This technique involves capturing and sequencing only the exonic regions of the genome, which are the protein-coding regions. A total of 2,285,342 SNPs were obtained in this study. Analysis of these SNPs against the genome annotation of one of the studied pea accessions (5) identified a number of SNPs that could have an impact on gene activity. Additional analyses revealed 647,220 SNPs that were unique to individual pea accessions, which might contribute to the fitness and diversity of accessions in different habitats. Phylogenetic and clustering analyses demonstrated that the SNPs could distinguish Pisum germplasms based on their agronomic and evolutionary histories. These results highlight the power of selected SNPs as markers for identifying Pisum individuals.
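As an illustration of what “unique to individual accessions” means computationally, the following Python sketch counts SNPs carried by exactly one accession. It is a hypothetical toy example, not the authors' pipeline: the function name, the accession labels, and the `chrom:position` SNP identifiers are all assumptions made for illustration.

```python
from collections import Counter


def private_snps(genotypes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each accession to the SNPs that no other accession carries.

    `genotypes` maps an accession name to the set of SNP identifiers at
    which that accession carries the alternate allele.
    """
    # Count in how many accessions each SNP is observed.
    carrier_counts = Counter(
        snp for snps in genotypes.values() for snp in snps
    )
    # A SNP is private when exactly one accession carries it.
    return {
        accession: {snp for snp in snps if carrier_counts[snp] == 1}
        for accession, snps in genotypes.items()
    }


# Toy data with made-up accession names and positions.
toy_genotypes = {
    "acc1": {"chr1:100", "chr1:200"},
    "acc2": {"chr1:200", "chr2:50"},
    "acc3": {"chr1:200"},
}
private = private_snps(toy_genotypes)
n_private = sum(len(snps) for snps in private.values())
```

In practice the genotype sets would be derived from a variant call file after quality filtering, and missing genotypes would need explicit handling, which this sketch ignores.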

Overall, this study found high-quality SNPs that are meaningful in a biological context. This dataset was derived from a large set of germplasm and is thus particularly useful for studying genotype-phenotype associations, as well as the diversity within Pisum species. These SNPs could also be used in breeding programs to develop new pea varieties that are resilient to abiotic and biotic stressors.  

References


1. Fallah M, Jean M, Boucher St-Amour VT, O’Donoughue L, Belzile F. The construction of a high-density consensus genetic map for soybean based on SNP markers derived from genotyping-by-sequencing. Genome. 2022 Aug;65(8):413–25. https://doi.org/10.1139/gen-2021-005

2. Gill M, Anderson R, Hu H, Bennamoun M, Petereit J, Valliyodan B, et al. Machine learning models outperform deep learning models, provide interpretation and facilitate feature selection for soybean trait prediction. BMC Plant Biology. 2022 Apr 8;22(1):180. https://doi.org/10.1186/s12870-022-03559-z

3. Aubert G, Kreplak J, Leveugle M, Duborjal H, Klein A, Boucherot K, et al. SNP discovery by exome capture and resequencing in a pea genetic resource collection. bioRxiv, ver. 4, peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.08.03.502586

4. Warr A, Robert C, Hume D, Archibald A, Deeb N, Watson M. Exome sequencing: current and future perspectives. G3 Genes|Genomes|Genetics. 2015 Aug 1;5(8):1543–50. https://doi.org/10.1534/g3.115.018564

5. Kreplak J, Madoui MA, Cápal P, Novák P, Labadie K, Aubert G, et al. A reference genome for pea provides insight into legume genome evolution. Nat Genet. 2019 Sep;51(9):1411–22. https://doi.org/10.1038/s41588-019-0480-1

Title: SNP discovery by exome capture and resequencing in a pea genetic resource collection
Authors: G. Aubert, J. Kreplak, M. Leveugle, H. Duborjal, A. Klein, K. Boucherot, E. Vieille, M. Chabert-Martinello, C. Cruaud, V. Bourion, I. Lejeune-Hénaut, M.L. Pilet-Nayel, Y. Bouchenak-Khelladi, N. Francillonne, N. Tayeh, J.P. Pichon, N. Rivière, J. B...
Abstract: Background & Summary. In addition to being the model plant used by Mendel to establish genetic laws, pea (Pisum sativum L., 2n=14) is a major pulse c...
Thematic fields: Plants, Population genomics
Recommender: Wanapinun Nawae
Submission date: 2022-11-29 09:29:06
08 Nov 2022

Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks

How to best call the somatic mosaic tree?

Recommended by Nicolas Bierne based on reviews by 2 anonymous reviewers

Any multicellular organism is a molecular mosaic, with somatic mutations accumulating between cell lineages. Big, long-lived trees have fed this image of a somatic mosaic tree, both through the observation of spectacular phenotypic mosaics and because somatic mutations can potentially be passed on to gametes in plants (reviewed in Schoen and Schultz 2019). The lower cost of genome sequencing now offers the opportunity to tackle the issue and identify somatic mutations in trees.

However, when it comes to characterizing this somatic mosaic from genome sequences, things become much more difficult than one might first expect. What separates cell lineages: ontogeny, the number of cell divisions, or time? How should clonal cell populations be sampled? How are somatic mutations distributed in the population of cells of an organ or an organ sample? Should they be fixed heterozygotes in the sample of cells sequenced, or polymorphic? Do we indeed expect somatic mutations to be fixed? How should we identify and count somatic mutations?

To date, the detection of somatic mutations has mostly been done with a single variant caller in a given study, and we have little perspective on whether different callers provide similar or different results. Some studies have used standard SNP callers that assume a somatic mutation is fixed at the heterozygous state in the sample of cells, with an expected allele coverage ratio of 0.5, and fewer have used cancer callers, which are designed to detect mutations present in only a fraction of the cells in the sample. However, standard SNP callers detect mutations that deviate from a balanced allelic coverage, and different cancer callers can have different characteristics that should affect their outcomes.

To tackle these issues, Schmitt et al. (2022) conducted an extensive simulation analysis to compare different variant callers. Then, they reanalyzed two large published datasets on pedunculate oak, Quercus robur. The analysis of in silico somatic mutations allowed the authors to evaluate the performance of different variant callers as a function of the allelic fraction of somatic mutations and the sequencing depth. They found that one of the seven callers provided better and more robust calls across a broad set of allelic fractions and sequencing depths. The reanalysis of published oak datasets with the cancer caller that performed best in the in silico analysis allowed them to identify numerous low-frequency mutations that were missed in the original studies.
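The dependence of detection on allelic fraction and sequencing depth can be illustrated with a simple binomial model. The sketch below is not taken from Schmitt et al. and does not reproduce any variant caller; it assumes, hypothetically, that each read independently carries the mutant allele with probability equal to the allelic fraction, that there are no sequencing errors, and that a mutation is only considered detectable when at least `min_alt_reads` reads support it.

```python
from math import comb


def prob_at_least_alt_reads(
    depth: int, allelic_fraction: float, min_alt_reads: int
) -> float:
    """Probability of seeing at least `min_alt_reads` mutant reads.

    Assumes the number of mutant reads follows a binomial distribution
    with `depth` trials and success probability `allelic_fraction`.
    """
    # Probability of observing fewer than min_alt_reads mutant reads.
    p_fewer = sum(
        comb(depth, k)
        * allelic_fraction**k
        * (1.0 - allelic_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Compare a fixed heterozygous mutation (fraction 0.5) with a
# low-frequency mutation (fraction 0.05) at two sequencing depths.
for depth in (30, 200):
    for fraction in (0.5, 0.05):
        p = prob_at_least_alt_reads(depth, fraction, min_alt_reads=3)
        print(f"depth={depth} fraction={fraction} P(detectable)={p:.3f}")
```

Under these simplifying assumptions, low-fraction mutations are rarely supported by enough reads at modest depth, which is one intuition for why callers tuned to a 0.5 allele ratio and callers designed for subclonal mutations can give different results.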

I recommend the study of Schmitt et al. (2022) first because it shows the benefit of using cancer callers in the study of somatic mutations, whatever the allelic fraction you are ultimately interested in. You can select fixed heterozygotes if this is your ultimate target, but cancer callers additionally give you a valuable overview of the allelic fractions of somatic mutations in your sample, and most do as well as SNP callers for fixed heterozygous mutations. In addition, Schmitt et al. (2022) provide the pipelines needed to investigate in silico data that correspond to a given study design, encouraging researchers to compare different variant callers rather than arbitrarily going with only one. We can anticipate that the study of somatic mutations in non-model species will increasingly attract attention now that multiple tissues of the same individual can be sequenced at low cost, and the study of Schmitt et al. (2022) paves the way for questioning and choosing the best variant caller for the question one wants to address.

References

Schoen DJ, Schultz ST (2019) Somatic Mutation and Evolution in Plants. Annual Review of Ecology, Evolution, and Systematics, 50, 49–73. https://doi.org/10.1146/annurev-ecolsys-110218-024955

Schmitt S, Leroy T, Heuertz M, Tysklind N (2022) Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks. bioRxiv, 2021.10.11.462798. ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2021.10.11.462798

Title: Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks
Authors: Sylvain Schmitt, Thibault Leroy, Myriam Heuertz, Niklas Tysklind
Abstract: 1. Mutation, the source of genetic diversity, is the raw material of evolution; however, the mutation process remains understudied, especially in plants. Using both a simulation and reanalysis framework, we set out ...
Thematic fields: Bioinformatics, Plants
Recommender: Nicolas Bierne
Reviewers: Anonymous, Anonymous
Submission date: 2022-04-28 13:24:19
22 Jan 2025

Spatio-temporal diversity and genetic architecture of pyrantel resistance in Cylicocyclus nassatus, the most abundant horse parasite

Genomic and transcriptomic insights into the genetic basis of anthelmintic resistance in a cyathostomin parasitic nematode

Recommended by Nicolas Pollet based on reviews by 2 anonymous reviewers

Parasitic worms infect billions of animals worldwide. While parasitism is now considered a context-dependent relation along a symbiosis continuum, most of these parasitic worms, also known as helminths, can cause diseases that have a significant impact (Hopkins et al. 2017; Selzer and Epe 2021). In livestock, these impacts have a high economic cost, and therefore prophylactic drugs are widely used (Selzer and Epe 2021). Consequently, drug resistance has become increasingly common across all parasites, and concerns about drug effects on non-target organisms have been raised (de Souza and Guimarães 2022). This is why understanding, at the genetic and molecular level, the relationship between parasitic worms and their animal hosts and the diseases they cause is high on the agenda of parasitologists (Doyle 2022). The development of genomics resources plays a pivotal role in this agenda and is at the origin of Sallé and colleagues' article (2025).

The most common intestinal parasites in equids are helminths of the cyathostomin nematode complex. These are the primary parasitic cause of death in young horses and also exhibit a reduced sensitivity to anthelmintic drugs. Therefore, Sallé and colleagues embarked on the arduous journey of building an annotated reference genome of the nematode Cylicocyclus nassatus. They used cutting-edge molecular genetics methods to amplify and sequence the genome of a single individual and obtained chromosome-level contiguity using Hi-C technology for six chromosomes and an assembly of 514.7 Mbp. Remarkably, transposable elements occupy more than half of the C. nassatus genome and may have led to an increase in genome size in this nematode. In parallel, the authors built a gene catalogue using transcriptomic data, reaching a BUSCO gene completion score of 94.1% with 22,718 protein-coding genes. They quantified allele frequencies based on the resequencing of nine populations, including an ancient Egyptian worm from the 19th century, indicating a recent loss of genetic diversity in European cyathostomin even if geographical sampling was limited. They also analysed transcriptomic differences between sexes and found differences linked with drug treatment. While there may be confounding effects due to global differences between sexes that could explain this finding, these results will likely fuel future transcriptomic analyses investigating the response to antiparasitic drugs.

The Cylicocyclus nassatus genome assembly obtained will be invaluable for studying nematode genome evolution and analysing the genetic and molecular basis of drug resistance in these parasites.

             

References

Doyle SR (2022) Improving helminth genome resources in the post-genomic era. Trends in Parasitology, 38, 831–840. https://doi.org/10.1016/j.pt.2022.06.002

Hopkins SR, Wojdak JM, Belden LK (2017) Defensive symbionts mediate host–parasite interactions at multiple scales. Trends in Parasitology, 33, 53–64. https://doi.org/10.1016/j.pt.2016.10.003

Sallé G, Courtot É, Cabau C, Parrinello H, Serreau D, Reigner F, Gesbert A, Jacquinot L, Lenhof O, Aimé A, Picandet V, Kuzmina T, Holovachov O, Bellaw J, Nielsen MK, Samson-Himmelstjerna G von, Valière S, Gislard M, Lluch J, Kuchly C, Klopp C (2024) Spatio-temporal diversity and genetic architecture of pyrantel resistance in Cylicocyclus nassatus, the most abundant horse parasite. bioRxiv, ver. 2 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2023.07.19.549683

Selzer PM, Epe C (2021) Antiparasitics in animal health: quo vadis? Trends in Parasitology, 37, 77–89. https://doi.org/10.1016/j.pt.2020.09.004

de Souza RB, Guimarães JR (2022) Effects of avermectins on the environment based on its toxicity to plants and soil invertebrates–a review. Water, Air, and Soil Pollution, 233, 259. https://doi.org/10.1007/s11270-022-05744-0

 

Spatio-temporal diversity and genetic architecture of pyrantel resistance in *Cylicocyclus nassatus*, the most abundant horse parasite
Authors: Guillaume Sallé, Élise Courtot, Cédric Cabau, Hugues Parrinello, Delphine Serreau, Fabrice Reigner, Amandine Gesbert, Lauriane Jacquinot, Océane Lenhof, Annabelle Aimé, Valérie Picandet, Tetiana Kuzmina, Oleksandr Holovachov, Jennifer Bellaw, Mart...
Abstract: Cyathostomins are a complex of 50 intestinal parasite species infecting horses and wild equids. The massive administration of modern anthelmintic drugs has increased their relative abundance in horse helminth communities and selected drug-resis...
Thematic fields: Terrestrial invertebrates
Recommender: Nicolas Pollet
Reviewers: Jane Hodgkinson, Anonymous
Submission date: 2023-07-27 20:45:09
03 Jul 2024

T7 DNA polymerase treatment improves quantitative sequencing of both double-stranded and single-stranded DNA viruses

Improving the sequencing of single-stranded DNA viruses: Another brick for building Earth's complete virome encyclopedia

Recommended by Sebastien Massart based on reviews by Philippe Roumagnac and 3 anonymous reviewers

The wide adoption of high-throughput sequencing technologies has uncovered an astonishing diversity of viruses in most biosphere habitats. Among them, single-stranded DNA viruses are prevalent, infecting diverse hosts from all three domains of life (Malathi and Renuka Devi, 2019), with some species being highly pathogenic to animals or plants.

Sequencing of single-stranded DNA viruses requires a specific approach that usually leads to their over-representation compared to double-stranded DNA. The article by Billaud et al. (2024) addresses this challenge. It presents a novel and efficient method for converting single-stranded DNA to double-stranded DNA using T7 DNA polymerase before high-throughput virome sequencing. It compares this new method with the Phi29 polymerase method, demonstrating its advantages in the representation and accuracy of viral DNA content in well-defined synthetic phage mixtures and complex human virome samples from stool. This T7 DNA polymerase treatment significantly improved the richness and abundance of the Microviridae fraction in their samples, suggesting a more comprehensive representation of viral diversity.

The article presents a compelling case for testing and adopting the T7 DNA polymerase methodology in preparing virome samples for shotgun sequencing. This novel approach, supported by comparative analysis with existing methodologies, represents a valuable contribution to metagenomics for characterizing virome diversity.

                       

References

Billaud M, Theodorou I, Lamy-Besnier Q, Shah SA, Lecointe F, De Sordi L, De Paepe M, Petit M-A (2024) T7 DNA polymerase treatment improves quantitative sequencing of both double-stranded and single-stranded DNA viruses. bioRxiv, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.12.12.520144

Malathi VG, Renuka Devi P (2019) ssDNA viruses: key players in global virome. VirusDisease, 30, 3–12. https://doi.org/10.1007/s13337-019-00519-4

 

T7 DNA polymerase treatment improves quantitative sequencing of both double-stranded and single-stranded DNA viruses
Authors: Maud Billaud, Ilias Theodorou, Quentin Lamy-Besnier, Shiraz Shah, François Lecointe, Luisa De Sordi, Marianne De Paepe, Marie-Agnès Petit
Abstract: Background: Bulk microbiome, as well as virome-enriched shotgun sequencing only reveals the double-stranded DNA (dsDNA) content of a given sample, unless specific treatments are applied. However, genomes of viruses often consist of a circular s...
Thematic fields: Viruses and transposable elements
Recommender: Sebastien Massart
Submission date: 2023-12-20 16:50:00
07 Sep 2023

The demographic history of the wild crop relative Brachypodium distachyon is shaped by distinct past and present ecological niches

Natural variation and adaptation in Brachypodium distachyon

Recommended by Josep Casacuberta based on reviews by Thibault Leroy and 1 anonymous reviewer

Identifying the genetic factors that allow plant adaptation is a major scientific question that is particularly relevant in the face of the climate change that we are already experiencing. To address this, it is essential to have genetic information on a high number of accessions (i.e., plants registered with unique accession numbers) growing under contrasting environmental conditions. A substantial number of studies already address these issues in the plant Arabidopsis thaliana, but there is a need to expand these analyses to species that play key roles in wild ecosystems and are closely related to important crops, as is the case for grasses.

The work of Minadakis, Roulin and co-workers (1) presents a Brachypodium distachyon panel of 332 fully sequenced accessions that covers the whole species distribution across a wide range of bioclimatic conditions, which will be an invaluable tool to fill this gap. In addition, the authors use these data to start analyzing the population structure and demographic history of this plant, suggesting that the species experienced a shift of its distribution following the Last Glacial Maximum, which may have forced the species into new habitats. The authors also model the niches occupied by B. distachyon, analyze the genetic clades found in each of them, and start analyzing the adaptive loci that may have allowed the species’ expansion into different bioclimatic areas.

In addition to the importance of the resources made available by the authors for the scientific community, the analyses presented are well done and carefully discussed, and they highlight the potential of these new resources to investigate the genetic bases of plant adaptation. 

References

1. Nikolaos Minadakis, Hefin Williams, Robert Horvath, Danka Caković, Christoph Stritt, Michael Thieme, Yann Bourgeois, Anne C. Roulin. The demographic history of the wild crop relative Brachypodium distachyon is shaped by distinct past and present ecological niches. bioRxiv, 2023.06.01.543285, ver. 5 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2023.06.01.543285

The demographic history of the wild crop relative *Brachypodium distachyon* is shaped by distinct past and present ecological niches
Authors: Nikolaos Minadakis, Hefin Williams, Robert Horvath, Danka Caković, Christoph Stritt, Michael Thieme, Yann Bourgeois, Anne C. Roulin
Abstract: Closely related to economically important crops, the grass *Brachypodium distachyon* has been originally established as a pivotal species for grass genomics but more recently flourished as a model for develop...
Thematic fields: Evolutionary genomics, Functional genomics, Plants, Population genomics
Recommender: Josep Casacuberta
Submission date: 2023-06-14 15:28:30
11 May 2024

The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics

Informed Choices, Cohesive Future: Decisions and Recommendations for ERGA

Recommended by Jitendra Narayan based on reviews by Justin Ideozu and Eric Crandall

The European Reference Genome Atlas (ERGA) (Mc Cartney et al., 2024; Mazzoni et al., 2023) demonstrates the collaborative spirit and intellectual abilities of researchers from 33 European countries. This ambitious project, which is part of the Earth BioGenome Project (Lewin et al., 2018) Phase II, has embarked on an unprecedented mission: to decipher the genetic makeup of 150,000 species over a span of four years. At the heart of ERGA is a decentralized pilot infrastructure specifically built to assist the production of high-quality reference genomes. This structure acts as a scaffold for the massive task of genome sequencing, providing the necessary framework to manage the complexity of genomic research. The research paper under consideration offers a comprehensive narrative of ERGA's evolution, outlining both successes and challenges encountered along the way.

One of the most significant issues addressed in the manuscript is the equitable distribution of resources and expertise among participating laboratories and countries. In a project of this magnitude, it is critical to leverage the pooled talents and capacities of researchers from across Europe. ERGA's pan-European network promotes communication and collaboration, creating an environment in which knowledge flows freely and barriers are overcome. This adoption of strong coordination and communication tactics will be essential to ERGA's success. Scientific collaboration depends on efficient communication channels because they allow researchers to share resources, collaborate on new initiatives, and exchange ideas. Through a diverse range of gatherings, courses, and virtual discussion boards, ERGA fosters an environment of transparency and cooperation among members, enabling scientists to overcome challenges and make significant discoveries. ERGA's emphasis on training and knowledge-transfer programmes is a pillar of its strategy. Understanding the importance of capacity development, ERGA invests in providing researchers with the knowledge and abilities necessary for effectively navigating the complicated terrain of genomic research. A wide range of subjects are covered in training programmes (Larivière et al., 2024), from sample preparation and collection to data processing methods and sequencing technology. Through the development of a group of highly qualified experts, ERGA creates the foundation for continued advancement and creativity in the genomics sector.

This manuscript also covers in detail the technological workflows and sequencing techniques used in ERGA's pilot infrastructure. With the aid of cutting-edge sequencing technologies based on both long-read and short-read sequencing, the consortium is working to unravel the complex structure of the genetic code with a level of accuracy and precision never before possible. Quality control methods are in place to guarantee the accuracy of genetic data and prevent mistakes and flaws that can jeopardize the integrity of the findings. Although it focuses on genome sequencing because of its technological complexity, ERGA also remains firmly committed to metadata collection and sample validation. Metadata serves as a critical link between raw genetic data and useful scientific insights, giving necessary context and allowing researchers to draw practical findings from their investigations. Sample validation approaches improve the reliability and reproducibility of the results, giving users confidence in the quality of the genetic data provided by ERGA.

Looking ahead, ERGA envisions its decentralized infrastructure serving as a model for global collaborative research efforts. By embracing diversity, encouraging cooperation, and pushing for open access to data and resources, ERGA hopes to catalyze scientific discovery and generate positive change in the field of biodiversity genomics. ERGA aims to promote a more equitable and sustainable future for all through ongoing interaction with stakeholders, intensive outreach and education activities, and policy change advocacy. In addition to its immediate goals, ERGA considers the long-term implications of its work. As genomic technology progresses, the potential applications of high-quality reference genomes will continue to grow. From informing conservation efforts and illuminating evolutionary histories to revolutionizing healthcare and agriculture, it is likely that ERGA's contributions will have far-reaching consequences for people and the planet as a whole.

Furthermore, ERGA understands the importance of interdisciplinary collaboration in addressing the difficult challenges of the twenty-first century. ERGA aims to integrate genetic research into larger initiatives to promote sustainability and biodiversity conservation by forming relationships with stakeholders from other areas, such as policymakers, conservationists, and indigenous groups. Through shared knowledge and community action, ERGA seeks to create a future in which mankind coexists peacefully with the natural world, guided by a thorough grasp of its genetic legacy and ecological interconnectivity.

Finally, the manuscript exemplifies ERGA's collaborative ambitions and achievements, capturing the spirit of creativity and collaboration that defines this ground-breaking effort. As ERGA continues to push the boundaries of genetic research, it remains dedicated to scientific excellence, inclusivity, and the pursuit of knowledge for the benefit of society. I wholeheartedly recommend the publication of this groundbreaking initiative, offering my enthusiastic endorsement for its valuable contribution to the scientific community.

References
Larivière, D., Abueg, L., Brajuka, N. et al. (2024). Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nature Biotechnology 42, 367-370. https://doi.org/10.1038/s41587-023-02100-3

Lewin, H. A., Robinson, G. E., Kress, W. J., Baker, W. J., Coddington, J., Crandall, K. A., Durbin, R., Edwards, S. V., Forest, F., Gilbert, M. T. P., Goldstein, M. M., Grigoriev, I. V., Hackett, K. J., Haussler, D., Jarvis, E. D., Johnson, W. E., Patrinos, A., Richards, S., Castilla-Rubio, J. C., … Zhang, G. (2018). Earth BioGenome Project: Sequencing life for the future of life. Proceedings of the National Academy of Sciences, 115(17), 4325–4333. https://doi.org/10.1073/pnas.1720115115

Mazzoni, C. J., Ciofi, C., Waterhouse, R. M. (2023). Biodiversity: an atlas of European reference genomes. Nature 619, 252. https://doi.org/10.1038/d41586-023-02229-w

Mc Cartney, A. M., Formenti, G., Mouton, A., Panis, D. de, Marins, L. S., Leitão, H. G., Diedericks, G., Kirangwa, J., Morselli, M., Salces-Ortiz, J., Escudero, N., Iannucci, A., Natali, C., Svardal, H., Fernández, R., Pooter, T. de, Joris, G., Strazisar, M., Wood, J., … Mazzoni, C. J. (2024). The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics. bioRxiv, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2023.09.25.559365

The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics
Authors: Ann M Mc Cartney, Giulio Formenti, Alice Mouton, Claudio Ciofi, Robert M Waterhouse, Camila J Mazzoni, Diego De Panis, Luisa S Schlude Marins, Henrique G Leitao, Genevieve Diedericks, Joseph Kirangwa, Marco Morselli, Judit Salces, Nuria Escudero, ...
Abstract: English: A global genome database of all of Earth's species diversity could be a treasure trove of scientific discoveries. However, regardless of the major advances in genome sequencing technologies, only a tiny fraction of species have genomic...
Thematic fields: Bioinformatics, ERGA Pilot
Recommender: Jitendra Narayan
Reviewers: Justin Ideozu, Eric Crandall
Submission date: 2023-10-01 01:03:58