Submit a preprint

Latest recommendationsrsstwitter

IdTitle * Authors * Abstract * Picture * Thematic fields * RecommenderReviewersSubmission date
13 Jul 2022
article picture

Nucleosome patterns in four plant pathogenic fungi with contrasted genome structures

Genome-wide chromatin and expression datasets of various pathogenic ascomycetes

Recommended by and based on reviews by Ricardo C. Rodríguez de la Vega and 1 anonymous reviewer

Plant pathogenic fungi represent serious economic threats. These organisms are rapidly adaptable, with plastic genomes containing many variable regions and evolving rapidly. It is, therefore, useful to characterize their genetic regulation in order to improve their control. One of the steps to do this is to obtain omics data that link their DNA structure and gene expression. 
In this paper, Clairet et al. (2022) studied the nucleosome positioning and gene expression of four plant pathogenic ascomycete species (Leptosphaeria maculans, Leptosphaeria maculans 'lepidii', Fusarium graminearum, Botrytis cinerea). The genomes of these species contain different compositions of transposable elements (from 4 to 30%), and present an equally variable compartmentalization. The authors established MNAse-seq and RNA-seq maps of these genomes in axenic cultures. Thanks to an ad-hoc tool allowing the visualization of MNA-seq data in combination with other "omics" data, they were able to compare the maps of the different species between them and to study different types of correlation. This tool, called MSTS for "MNase-Seq Tool Suite", allows for example to perform limited analyses on certain genetic subsets in an ergonomic way. 
In the fungi studied, nucleosomes are positioned every 161 to 172 bp, with intra-genome variations such as AT-rich regions but, surprisingly, particularly dense nucleosomes in the Lmb genome. The authors discuss the differences between these organisms with respect to this nucleosome density, the expression profile, and the structure and transposon composition of the different genomes. These data and insights thus represent interesting resources for researchers interested in the evolution of ascomycete genomes and their adaptation. For this, and for the development of the MSTS tool, we recommend this preprint.


Clairet C, Lapalu N, Simon A, Soyer JL, Viaud M, Zehraoui E, Dalmais B, Fudal I, Ponts N (2022) Nucleosome patterns in four plant pathogenic fungi with contrasted genome structures. bioRxiv, 2021.04.16.439968, ver. 4 peer-reviewed and recommended by Peer Community in Genomics.

Nucleosome patterns in four plant pathogenic fungi with contrasted genome structuresColin Clairet, Nicolas Lapalu, Adeline Simon, Jessica L. Soyer, Muriel Viaud, Enric Zehraoui, Berengere Dalmais, Isabelle Fudal, Nadia Ponts<p style="text-align: justify;">Fungal pathogens represent a serious threat towards agriculture, health, and environment. Control of fungal diseases on crops necessitates a global understanding of fungal pathogenicity determinants and their expres...Epigenomics, FungiSébastien Bloyer2021-04-17 10:32:41 View
24 Feb 2023
article picture

Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study

How to interpret the inference of recombination landscapes on methods based on linkage disequilibrium?

Recommended by based on reviews by 2 anonymous reviewers

Data interpretation depends on previously established and validated tools, designed for a specific type of data. These methods, however, are usually based on simple models with validity subject to a set of theoretical parameterized conditions and data types. Accordingly, the tool developers provide the potential users with guidelines for data interpretations within the tools’ limitation. Nevertheless, once the methodology is accepted by the community, it is employed in a large variety of empirical studies outside of the method’s original scope or that typically depart from the standard models used for its design, thus potentially leading to the wrong interpretation of the results.

Numerous empirical studies inferred recombination rates across genomes, detecting hotspots of recombination and comparing related species (e.g., Shanfelter et al. 2019, Spence and Song 2019). These studies used indirect methodologies based on the signals that recombination left in the genome, such as linkage disequilibrium and the patterns of haplotype segregation (e.g.,Chan et al. 2012). The conclusions from these analyses have been used, for example, to interpret the evolution of the chromosomal structure or the evolution of recombination among closely related species.

Indirect methods have the advantage of collecting a large quantity of recombination events, and thus have a better resolution than direct methods (which only detect the few recombination events occurring at that time). On the other hand, indirect methods are affected by many different evolutionary events, such as demographic changes and selection. Indeed, the inference of recombination levels across the genome has not been studied accurately in non-standard conditions. Linkage disequilibrium is affected by several factors that can modify the recombination inference, such as demographic history, events of selection, population size, and mutation rate, but is also related to the size of the studied sample, and other technical parameters defined for each specific methodology.

Raynaud et al (2023) analyzed the reliability of the recombination rate inference when considering the violation of several standard assumptions (evolutionary and methodological) in one of the most popular families of methods based on LDhat (McVean et al. 2004), specifically its improved version, LDhelmet (Chan et al. 2012). These methods cover around 70 % of the studies that infer recombination rates. The authors used recombination maps, obtained from empirical studies on humans, and included hotspots, to perform a detailed simulation study of the capacity of this methodology to correctly infer the pattern of recombination and the location of these hotspots. Correlations between the real, and inferred values from simulations were obtained, as well as several rates, such as the true positive and false discovery rate to detect hotspots.

The authors of this work send a message of caution to researchers that are applying this methodology to interpret data from the inference of recombination landscapes and the location of hotspots. The inference of recombination landscapes and hotspots can differ considerably even in standard model conditions. In addition, demographic processes, like bottleneck or admixture, but also the level of population size and mutation rates, can substantially affect the estimation accuracy of the level of recombination and the location of hotspots. Indeed, the inference of the location of hotspots in simulated data with the same landscape, can be very imprecise when standard assumptions are violated or not considered. These effects may lead to incorrect interpretations, for example about the conservation of recombination maps between closely related species. Finally, Raynaud et al (2023) included a useful guide with advice on how to obtain accurate recombination estimations with methods based on linkage disequilibrium, also emphasizing the limitations of such approaches.


Chan AH, Jenkins PA, Song YS (2012) Genome-Wide Fine-Scale Recombination Rate Variation in Drosophila melanogaster. PLOS Genetics, 8, e1003090.

McVean GAT, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P (2004) The Fine-Scale Structure of Recombination Rate Variation in the Human Genome. Science, 304, 581–584.

Raynaud M, Gagnaire P-A, Galtier N (2023) Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study. bioRxiv, 2022.03.30.486352, ver. 2 peer-reviewed and recommended by Peer Community in Genomics.

Spence JP, Song YS (2019) Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Science Advances, 5, eaaw9206.

Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation studyMarie Raynaud, Pierre-Alexandre Gagnaire, Nicolas Galtier<p style="text-align: justify;">Knowledge of recombination rate variation along the genome provides important insights into genome and phenotypic evolution. Population genomic approaches offer an attractive way to infer the population-scaled recom...Bioinformatics, Evolutionary genomics, Population genomicsSebastian E. Ramos-Onsins2022-04-05 14:59:14 View
25 Nov 2022
article picture

Phenotypic and transcriptomic analyses reveal major differences between apple and pear scab nonhost resistance

Apples and pears: two closely related species with differences in scab nonhost resistance

Recommended by based on reviews by 3 anonymous reviewers

Nonhost resistance is a common form of disease resistance exhibited by plants against microorganisms that are pathogenic to other plant species [1]. Apples and pears are two closely related species belonging to Rosaceae family, both affected by scab disease caused by fungal pathogens in the Venturia genus. These pathogens appear to be highly host-specific. While apples are nonhosts for Venturia pyrina, pears are nonhosts for Venturia inaequalis. To date, the molecular bases of scab nonhost resistance in apple and pear have not been elucidated.

This preprint by Vergne, et al (2022) [2] analyzed nonhost resistance symptoms in apple/V. pyrina and pear/V. inaequalis interactions as well as their transcriptomic responses. Interestingly, the author demonstrated that the nonhost apple/V. pyrina interaction was almost symptomless while hypersensitive reactions were observed for pear/V. inaequalis interaction. The transcriptomic analyses also revealed a number of differentially expressed genes (DEGs) that corresponded to the severity of the interactions, with very few DEGs observed during the apple/V. pyrina interaction and a much higher number of DEGs during the pear/V. inaequalis interaction.

This type of reciprocal host-pathogen interaction study is valuable in gaining new insights into how plants interact with microorganisms that are potential pathogens in related species. A few processes appeared to be involved in the pear resistance against the nonhost pathogen V. inaequalis at the transcriptomic level, such as stomata closure, modification of cell wall and production of secondary metabolites as well as phenylpropanoids. Based on the transcriptomics changes during the nonhost interaction, the author compared the responses to those of host-pathogen interactions and revealed some interesting findings. They proposed a series of cascading effects in pear induced by the presence of V. inaequalis, which I believe helps shed some light on the basic mechanism for nonhost resistance.

I am recommending this study because it provides valuable information that will strengthen our understanding of nonhost resistance in the Rosaceae family and other plant species. The knowledge gained here may be applied to genetically engineer plants for a broader resistance against a number of pathogens in the future.​


1. Senthil-Kumar M, Mysore KS (2013) Nonhost Resistance Against Bacterial Pathogens: Retrospectives and Prospects. Annual Review of Phytopathology, 51, 407–427.

2. Vergne E, Chevreau E, Ravon E, Gaillard S, Pelletier S, Bahut M, Perchepied L (2022) Phenotypic and transcriptomic analyses reveal major differences between apple and pear scab nonhost resistance. bioRxiv, 2021.06.01.446506, ver. 4 peer-reviewed and recommended by Peer Community in Genomics.

Phenotypic and transcriptomic analyses reveal major differences between apple and pear scab nonhost resistanceE. Vergne, E. Chevreau, E. Ravon, S. Gaillard, S. Pelletier, M. Bahut, L. Perchepied<p style="text-align: justify;"><strong>Background. </strong>Nonhost resistance is the outcome of most plant/pathogen interactions, but it has rarely been described in Rosaceous fruit species. Apple (<em>Malus x domestica</em> Borkh.) have a nonho...Functional genomics, PlantsWirulda Pootakham Jessica Soyer, Anonymous2022-05-13 15:06:08 View
08 Apr 2022
article picture

Phylogenetics in the Genomic Era

“Phylogenetics in the Genomic Era” brings together experts in the field to present a comprehensive synthesis

Recommended by and

E-book: Phylogenetics in the Genomic Era (Scornavacca et al. 2021)

This book was not peer-reviewed by PCI Genomics. It has undergone an internal review by the editors.

Accurate reconstructions of the relationships amongst species and the genes encoded in their genomes are an essential foundation for almost all evolutionary inferences emerging from downstream analyses. Molecular phylogenetics has developed as a field over many decades to build suites of models and methods to reconstruct reliable trees that explain, support, or refute such inferences. The genomic era has brought new challenges and opportunities to the field, opening up new areas of research and algorithm development to take advantage of the accumulating large-scale data. Such ‘big-data’ phylogenetics has come to be known as phylogenomics, which broadly aims to connect molecular and evolutionary biology research to address questions centred on relationships amongst taxa, mechanisms of molecular evolution, and the biological functions of genes and other genomic elements. This book brings together experts in the field to present a comprehensive synthesis of Phylogenetics in the Genomic Era, covering key conceptual and methodological aspects of how to build accurate phylogenies and how to apply them in molecular and evolutionary research. The paragraphs below briefly summarise the five constituent parts of the book, highlighting the key concepts, methods, and applications that each part addresses. Being organised in an accessible style, while presenting details to provide depth where necessary, and including guides describing real-world examples of major phylogenomic tools, this collection represents an invaluable resource, particularly for students and newcomers to the field of phylogenomics.

Part 1: Phylogenetic analyses in the genomic era

Modelling how sequences evolve is a fundamental cornerstone of phylogenetic reconstructions. This part of the book introduces the reader to phylogenetic inference methods and algorithmic optimisations in the contexts of Markov, Maximum Likelihood, and Bayesian models of sequence evolution. The main concepts and theoretical considerations are mapped out for probabilistic Markov models, efficient tree building with Maximum Likelihood methods, and the flexibility and robustness of Bayesian approaches. These are supported with practical examples of phylogenomic applications using the popular tools RAxML and PhyloBayes. By considering theoretical, algorithmic, and practical aspects, these chapters provide readers with a holistic overview of the challenges and recent advances in developing scalable phylogenetic analyses in the genomic era.

Part 2: Data quality, model adequacy

This part focuses on the importance of considering the appropriateness of the evolutionary models used and the accuracy of the underlying molecular and genomic data. Both these aspects can profoundly affect the results when applying current phylogenomic methods to make inferences about complex biological and evolutionary processes. A clear example is presented for methods for building multiple sequence alignments and subsequent filtering approaches that can greatly impact phylogeny inference. The importance of error detection in (meta)barcode sequencing data is also highlighted, with solutions offered by the MACSE_BARCODE pipeline for accurate taxonomic assignments. Orthology datasets are essential markers for phylogenomic inferences, but the overview of concepts and methods presented shows that they too face challenges with respect to model selection and data quality. Finally, an innovative approach using ancestral gene order reconstructions provides new perspectives on how to assess gene tree accuracy for phylogenomic analyses. By emphasising through examples the importance of using appropriate evolutionary models and assessing input data quality, these chapters alert readers to key limitations that the field as a whole strives to address.

Part 3: Resolving phylogenomic conflicts

Conflicting phylogenetic signals are commonplace and may derive from statistical or systematic bias. This part of the book addresses possible causes of conflict, discordance between gene trees and species trees and how processes that lead to such conflicts can be described by phylogenetic models. Furthermore, it provides an overview of various models and methods with examples in phylogenomics including their pros and cons. Outlined in detail is the multispecies coalescent model (MSC) and its applications in phylogenomics. An interesting aspect is that different phylogenetic signals leading to conflict are in fact a key source of information rather than a problem that can – and should – be used to point to events like introgression or hybridisation, highlighting possible future trends in this research area. Last but not least, this part of the book also addresses inferring species trees by concatenating single multiple sequence alignments (gene alignments) versus inferring the species tree based on ensembles of single gene trees pointing out advantages and disadvantages of both approaches. As an important take home message from these chapters, it is recommended to be flexible and identify the most appropriate approach for each dataset to be analysed since this may tremendously differ depending on the dataset, setting, taxa, and phylogenetic level addressed by the researcher.

Part 4: Functional evolutionary genomics

In this part of the book the focus shifts to functional considerations of phylogenomics approaches both in terms of molecular evolution and adaptation and with respect to gene expression. The utility of multi-species analysis is clearly presented in the context of annotating functional genomic elements through quantifying evolutionary constraint and protein-coding potential. An historical perspective on characterising rates of change highlights how phylogenomic datasets help to understand the modes of molecular evolution across the genome, over time, and between lineages. These are contextualised with respect to the specific aim of detecting signatures of adaptation from protein-coding DNA alignments using the example of the MutSelDP-ω∗ model. This is extended with the presentation of the generally rare case of adaptive sequence convergence, where consideration of appropriate models and knowledge of gene functions and phenotypic effects are needed. Constrained or relaxed, selection pressures on sequence or copy-number affect genomic elements in different ways, making the very concept of function difficult to pin down despite it being fundamental to relate the genome to the phenotype and organismal fitness. Here gene expression provides a measurable intermediate, for which the Expression Comparison tool from the Bgee suite allows exploration of expression patterns across multiple animal species taking into account anatomical homology. Overall, phylogenomics applications in functional evolutionary genomics build on a rich theoretical history from molecular analyses where integration with knowledge of gene functions is challenging but critical.

Part 5: Phylogenomic applications

Rather than attempting to review the full extent of applications linked to phylogenomics, this part of the book focuses on providing detailed specific insights into selected examples and methods concerning i) estimating divergence times, and ii) species delimitation in the era of ‘omics’ data. With respect to estimating divergence times, an exemplary overview is provided for fossil data recovered from geological records, either using fossil data as calibration points with an extant-species-inferred phylogeny, or using a fossilised birth-death process as a mechanistic model that accounts for lineage diversification. Included is a tutorial for a joint approach to infer phylogenies and estimate divergence times using the RevBayes software with various models implemented for different applications and datasets incorporating molecular and morphological data. An interesting excursion is outlined focusing on timescale estimates with respect to viral evolution introducing BEAGLE, a high-performance likelihood-calculation platform that can be used on multi-core systems. As a second major subject, species delimitation is addressed since currently the increasing amount of available genomic data enables extensive inferences, for instance about the degree of genetic isolation among species and ancient and recent introgression events. Describing the history of molecular species delimitation up to the current genomic era and presenting widely used computational methods incorporating single- and multi-locus genomic data, pros and cons are addressed. Finally, a proposal for a new method for delimiting species based on empirical criteria is outlined. In the closing chapter of this part of the book, BPP (Bayesian Markov chain Monte Carlo program) for analysing multi-locus sequence data under the multispecies coalescent (MSC) model with and without introgression is introduced, including a tutorial. These examples together provide accessible details on key conceptual and methodological aspects related to the application of phylogenetics in the genomic era.


Scornavacca C, Delsuc F, Galtier N (2021) Phylogenetics in the Genomic Era.

Phylogenetics in the Genomic EraCéline Scornavacca, Frédéric Delsuc, Nicolas Galtier<p style="text-align: justify;">Molecular phylogenetics was born in the middle of the 20th century, when the advent of protein and DNA sequencing offered a novel way to study the evolutionary relationships between living organisms. The first 50 ye...Bacteria and archaea, Bioinformatics, Evolutionary genomics, Functional genomics, Fungi, Plants, Population genomics, Vertebrates, Viruses and transposable elementsRobert Waterhouse2022-03-15 17:43:52 View
07 Feb 2023
article picture

RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes

A workflow for studying enigmatic non-autonomous transposable elements across bacteria

Recommended by ORCID_LOGO based on reviews by Sophie Abby and 1 anonymous reviewer

Repetitive extragenic palindromic sequences (REPs) are common repetitive elements in bacterial genomes (Gilson et al., 1984; Stern et al., 1984). In 2011, Bertels and Rainey identified that REPs are overrepresented in pairs of inverted repeats, which likely form hairpin structures, that they referred to as “REP doublets forming hairpins” (REPINs). Based on bioinformatics analyses, they argued that REPINs are likely selfish elements that evolved from REPs flanking particular transposes (Bertels and Rainey, 2011). These transposases, so-called REP-associated tyrosine transposases (RAYTs), were known to be highly associated with the REP content in a genome and to have characteristic upstream and downstream flanking REPs (Nunvar et al., 2010). The flanking REPs likely enable RAYT transposition, and their horizontal replication is physically linked to this process. In contrast, Bertels and Rainey hypothesized that REPINs are selfish elements that are highly replicated due to the similarity in arrangement to these RAYT-flanking REPs, but independent of RAYT transposition and generally with no impact on bacterial fitness (Bertels and Rainey, 2011).

This last point was especially contentious, as REPINs are highly conserved within species (Bertels and Rainey, 2023), which is unusual for non-beneficial bacterial DNA (Mira et al., 2001). Bertels and Rainey have since refined their argument to be that REPINs must provide benefits to host cells, but that there are nonetheless signatures of intragenomic conflict in genomes associated with these elements (Bertels and Rainey, 2023). These signatures reflect the divergent levels of selections driving REPIN distribution: selection at the level of each DNA element and selection on each individual bacterium. I found this observation particularly interesting as I and my colleague recently argued that these divergent levels of selection, and the interaction between them, is key to understanding bacterial pangenome diversity (Douglas and Shapiro, 2021). REPINs could be an excellent system for investigating these levels of selection across bacteria more generally.

The problem is that REPINs have not been widely characterized in bacterial genomes, partially because no bioinformatic workflow has been available for this purpose. To address this problem, Fortmann-Grote et al. (2023) developed RAREFAN, which is a web server for identifying RAYTs and associated REPINs in a set of input genomes. The authors showcase their tool by applying it to 49 Stenotrophomonas maltophilia genomes and providing examples of how to identify and assess RAYT-REPIN hits. The workflow requires several manual steps, but nonetheless represents a straightforward and standardized approach. Overall, this workflow should enable RAYTs and REPINs to be identified across diverse bacterial species, which will facilitate further investigation into the mechanisms driving their maintenance and spread.


Bertels F, Rainey PB (2023) Ancient Darwinian replicators nested within eubacterial genomes. BioEssays, 45, 2200085.

Bertels F, Rainey PB (2011) Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria. PLOS Genetics, 7, e1002132.

Douglas GM, Shapiro BJ (2021) Genic Selection Within Prokaryotic Pangenomes. Genome Biology and Evolution, 13, evab234.

Fortmann-Grote C, Irmer J von, Bertels F (2023) RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes. bioRxiv, 2022.05.22.493013, ver. 4 peer-reviewed and recommended by Peer Community in Genomics.

Gilson E, Clément J m., Brutlag D, Hofnung M (1984) A family of dispersed repetitive extragenic palindromic DNA sequences in E. coli. The EMBO Journal, 3, 1417–1421.

Mira A, Ochman H, Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends in Genetics, 17, 589–596.

Nunvar J, Huckova T, Licha I (2010) Identification and characterization of repetitive extragenic palindromes (REP)-associated tyrosine transposases: implications for REP evolution and dynamics in bacterial genomes. BMC Genomics, 11, 44.

Stern MJ, Ames GF-L, Smith NH, Clare Robinson E, Higgins CF (1984) Repetitive extragenic palindromic sequences: A major component of the bacterial genome. Cell, 37, 1015–1026.

RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomesFrederic Bertels, Julia von Irmer, Carsten Fortmann-Grote<p style="text-align: justify;">Compared to eukaryotes, repetitive sequences are rare in bacterial genomes and usually do not persist for long. Yet, there is at least one class of persistent prokaryotic mobile genetic elements: REPINs. REPINs are ...Bacteria and archaea, Bioinformatics, Evolutionary genomics, Viruses and transposable elementsGavin Douglas2022-06-07 08:21:34 View
02 Apr 2021
article picture

Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection

Toward a critical assessment of virus detection in plants

Recommended by based on reviews by Alexander Suh and 1 anonymous reviewer

The advent of High Throughput Sequencing (HTS) since the last decade has revealed previously unsuspected diversity of viruses as well as their (sometimes) unexpected presence in some healthy individuals. These results demonstrate that genomics offers a powerful tool for studying viruses at the individual level, allowing an in-depth inventory of those that are infecting an organism. Such approaches make it possible to study viromes with an unprecedented level of detail, both qualitative and quantitative, which opens new venues for analyses of viruses of humans, animals and plants. Consequently, the diagnostic field is using more and more HTS, fueling the need for efficient and reliable bioinformatics tools. 

Many such tools have already been developed, but in plant disease diagnostics, validation of the bioinformatics pipelines used for the detection of viruses in HTS datasets is still in its infancy. There is an urgent need for benchmarking the different tools and algorithms using well-designed reference datasets generated for this purpose. This is a crucial step to move forward and to improve existing solutions toward well-standardized bioinformatics protocols. This context has led to the creation of the Plant Health Bioinformatics Network (PHBN), a Euphresco network project aiming to build a bioinformatics community working on plant health. One of their objectives is to provide researchers with open-access reference datasets allowing to compare and validate virus detection pipelines. 

In this framework, Tamisier et al. [1] present real, semi-artificial, and completely artificial datasets, each aimed at addressing challenges that could affect virus detection. These datasets comprise real RNA-seq reads from virus-infected plants as well as simulated virus reads. Such a work, providing open-access datasets for benchmarking bioinformatics tools, should be encouraged as they are key to software improvement as demonstrated by the well-known success story of the protein structure prediction community: their pioneer community-wide effort, called Critical Assessment of protein Structure Prediction (CASP)[2], has been providing research groups since 1994 with an invaluable way to objectively test their structure prediction methods, thereby delivering an independent assessment of state-of-art protein-structure modelling tools. Following this success, many other bioinformatic community developed similar “competitions”, such as RNA-puzzles [3] to predict RNA structures, Critical Assessment of Function Annotation [4] to predict gene functions, Critical Assessment of Prediction of Interactions [5] to predict protein-protein interactions, Assemblathon [6] for genome assembly, etc. These are just a few examples from a long list of successful initiatives. Such efforts enable rigorous assessments of tools, stimulate the developers’ creativity, but also provide user communities with a state-of-art evaluation of available tools.

Inspired by these success stories, the authors propose a “VIROMOCK challenge” [7], asking researchers in the field to test their tools and to provide feedback on each dataset through a repository. This initiative, if well followed, will undoubtedly improve the field of virus detection in plants, but also probably in many other organisms. This will be a major contribution to the field of viruses, leading to better diagnostics and, consequently, a better understanding of viral diseases, thus participating in promoting human, animal and plant health.   


[1] Tamisier, L., Haegeman, A., Foucart, Y., Fouillien, N., Al Rwahnih, M., Buzkan, N., Candresse, T., Chiumenti, M., De Jonghe, K., Lefebvre, M., Margaria, P., Reynard, J.-S., Stevens, K., Kutnjak, D. and Massart, S. (2021) Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection. Zenodo, 4273791, version 4 peer-reviewed and recommended by Peer community in Genomics. doi:

[2] Critical Assessment of protein Structure Prediction” (CASP) -

[3] RNA-puzzles -

[4] Critical Assessment of Function Annotation (CAFA) -

[5] Critical Assessment of Prediction of Interactions (CAPI) -

[6] Assemblathon -

[7] VIROMOCK challenge -

Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detectionLucie Tamisier, Annelies Haegeman, Yoika Foucart, Nicolas Fouillien, Maher Al Rwahnih, Nihal Buzkan, Thierry Candresse, Michela Chiumenti, Kris De Jonghe, Marie Lefebvre, Paolo Margaria, Jean Sébastien Reynard, Kristian Stevens, Denis Kutnjak, Séb...<p>The widespread use of High-Throughput Sequencing (HTS) for detection of plant viruses and sequencing of plant virus genomes has led to the generation of large amounts of data and of bioinformatics challenges to process them. Many bioinformatics...Bioinformatics, Plants, Viruses and transposable elementsHadi Quesneville2020-11-27 14:31:47 View
10 Jul 2023
article picture

SNP discovery by exome capture and resequencing in a pea genetic resource collection

The value of a large Pisum SNP dataset

Recommended by based on reviews by Rui Borges and 1 anonymous reviewer

One important goal of modern genetics is to establish functional associations between genotype and phenotype. Single nucleotide polymorphisms (SNPs) are numerous and widely distributed in the genome and can be obtained from nucleic acid sequencing (1). SNPs allow for the investigation of genetic diversity, which is critical for increasing crop resilience to the challenges posed by global climate change. The associations between SNPs and phenotypes can be captured in genome-wide association studies. SNPs can also be used in combination with machine learning, which is becoming more popular for predicting complex phenotypic traits like yield and biotic and abiotic stress tolerance from genotypic data (2). The availability of many SNP datasets is important in machine learning predictions because this approach requires big data to build a comprehensive model of the association between genotype and phenotype.

Aubert and colleagues have studied, as part of the PeaMUST project, the genetic diversity of 240 Pisum accessions (3). They sequenced exome-enriched genomic libraries, a technique that enables the identification of high-density, high-quality SNPs at a low cost (4). This technique involves capturing and sequencing only the exonic regions of the genome, which are the protein-coding regions. A total of 2,285,342 SNPs were obtained in this study. The analysis of these SNPs with the annotations of the genome sequence of one of the studied pea accessions (5) identified a number of SNPs that could have an impact on gene activity. Additional analyses revealed 647,220 SNPs that were unique to individual pea accessions, which might contribute to the fitness and diversity of accessions in different habitats. Phylogenetic and clustering analyses demonstrated that the SNPs could distinguish Pisum germplasms based on their agronomic and evolutionary histories. These results point out the power of selected SNPs as markers for identifying Pisum individuals.

Overall, this study found high-quality SNPs that are meaningful in a biological context. This dataset was derived from a large set of germplasm and is thus particularly useful for studying genotype-phenotype associations, as well as the diversity within Pisum species. These SNPs could also be used in breeding programs to develop new pea varieties that are resilient to abiotic and biotic stressors.  


1.         Fallah M, Jean M, Boucher St-Amour VT, O’Donoughue L, Belzile F. The construction of a high-density consensus genetic map for soybean based on SNP markers derived from genotyping-by-sequencing. Genome. 2022 Aug;65(8):413–25.

2.         Gill M, Anderson R, Hu H, Bennamoun M, Petereit J, Valliyodan B, et al. Machine learning models outperform deep learning models, provide interpretation and facilitate feature selection for soybean trait prediction. BMC Plant Biology. 2022 Apr 8;22(1):180.

3.         Aubert G, Kreplak J, Leveugle M, Duborjal H, Klein A, Boucherot K, et al. SNP discovery by exome capture and resequencing in a pea genetic resource collection., biorxiv, ver. 4, peer-reviewed and recommended by Peer Community in Genomics. 

4.         Warr A, Robert C, Hume D, Archibald A, Deeb N, Watson M. Exome sequencing: current and future perspectives. G3 Genes|Genomes|Genetics. 2015 Aug 1;5(8):1543–50.

5.         Kreplak J, Madoui MA, Cápal P, Novák P, Labadie K, Aubert G, et al. A reference genome for pea provides insight into legume genome evolution. Nat Genet. 2019 Sep;51(9):1411–22.

SNP discovery by exome capture and resequencing in a pea genetic resource collectionG. Aubert, J. Kreplak, M. Leveugle, H. Duborjal, A. Klein, K. Boucherot, E. Vieille, M. Chabert-Martinello, C. Cruaud, V. Bourion, I. Lejeune-Hénaut, M.L. Pilet-Nayel, Y. Bouchenak-Khelladi, N. Francillonne, N. Tayeh, J.P. Pichon, N. Rivière, J. B...<p style="text-align: justify;"><strong>Background &amp; Summary</strong></p> <p style="text-align: justify;">In addition to being the model plant used by Mendel to establish genetic laws, pea (<em>Pisum sativum</em> L., 2n=14) is a major pulse c...Plants, Population genomicsWanapinun Nawae2022-11-29 09:29:06 View
08 Nov 2022
article picture

Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks

How to best call the somatic mosaic tree?

Recommended by based on reviews by 2 anonymous reviewers

Any multicellular organism is a molecular mosaic with some somatic mutations accumulated between cell lineages. Big long-lived trees have nourished this imaginary of a somatic mosaic tree, from the observation of spectacular phenotypic mosaics and also because somatic mutations are expected to potentially be passed on to gametes in plants (review in Schoen and Schultz 2019). The lower cost of genome sequencing now offers the opportunity to tackle the issue and identify somatic mutations in trees.

However, when it comes to characterizing this somatic mosaic from genome sequences, things become much more difficult than one would think in the first place. What separates cell lineages ontogenetically, in cell division number, or in time? How to sample clonal cell populations? How do somatic mutations distribute in a population of cells in an organ or an organ sample? Should they be fixed heterozygotes in the sample of cells sequenced or be polymorphic? Do we indeed expect somatic mutations to be fixed? How should we identify and count somatic mutations?

To date, the detection of somatic mutations has mostly been done with a single variant caller in a given study, and we have little perspective on how different callers provide similar or different results. Some studies have used standard SNP callers that assumed a somatic mutation is fixed at the heterozygous state in the sample of cells, with an expected allele coverage ratio of 0.5, and less have used cancer callers, designed to detect mutations in a fraction of the cells in the sample. However, standard SNP callers detect mutations that deviate from a balanced allelic coverage, and different cancer callers can have different characteristics that should affect their outcomes.

In order to tackle these issues, Schmitt et al. (2022) conducted an extensive simulation analysis to compare different variant callers. Then, they reanalyzed two large published datasets on pedunculate oak, Quercus robur.  The analysis of in silico somatic mutations allowed the authors to evaluate the performance of different variant callers as a function of the allelic fraction of somatic mutations and the sequencing depth. They found one of the seven callers to provide better and more robust calls for a broad set of allelic fractions and sequencing depths. The reanalysis of published datasets in oaks with the most effective cancer caller of the in silico analysis allowed them to identify numerous low-frequency mutations that were missed in the original studies.

I recommend the study of Schmitt et al. (2022) first because it shows the benefit of using cancer callers in the study of somatic mutations, whatever the allelic fraction you are interested in at the end. You can select fixed heterozygotes if this is your ultimate target, but cancer callers allow you to have in addition a valuable overview of the allelic fractions of somatic mutations in your sample, and most do as well as SNP callers for fixed heterozygous mutations. In addition, Schmitt et al. (2022) provide the pipelines that allow investigating in silico data that should correspond to a given study design, encouraging to compare different variant callers rather than arbitrarily going with only one. We can anticipate that the study of somatic mutations in non-model species will increasingly attract attention now that multiple tissues of the same individual can be sequenced at low cost, and the study of Schmitt et al. (2022) paves the way for questioning and choosing the best variant caller for the question one wants to address.


Schoen DJ, Schultz ST (2019) Somatic Mutation and Evolution in Plants. Annual Review of Ecology, Evolution, and Systematics, 50, 49–73.

Schmitt S, Leroy T, Heuertz M, Tysklind N (2022) Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks. bioRxiv, 2021.10.11.462798. ver. 4 peer-reviewed and recommended by Peer Community in Genomics.

Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaksSylvain Schmitt, Thibault Leroy, Myriam Heuertz, Niklas Tysklind<p style="text-align: justify;">1. Mutation, the source of genetic diversity, is the raw material of evolution; however, the mutation process remains understudied, especially in plants. Using both a simulation and reanalysis framework, we set out ...Bioinformatics, PlantsNicolas BierneAnonymous, Anonymous2022-04-28 13:24:19 View
03 Jul 2024
article picture

T7 DNA polymerase treatment improves quantitative sequencing of both double-stranded and single-stranded DNA viruses

Improving the sequencing of single-stranded DNA viruses: Another brick for building Earth's complete virome encyclopedia

Recommended by based on reviews by Philippe Roumagnac and 3 anonymous reviewers

The wide adoption of high-throughput sequencing technologies has uncovered an astonishing diversity of viruses in most biosphere habitats. Among them, single-stranded DNA viruses are prevalent, infecting diverse hosts from all three domains of life (Malathi et al. 2014) with some species being highly pathogenic to animals or plants.

Sequencing of single-stranded DNA viruses requires a specific approach that usually leads to their over-representation compared to double-stranded DNA. The article from Billaud et al. (2024) addresses this challenge. It presents a novel and efficient method for converting single-stranded DNA to double-stranded DNA using T7 DNA polymerase before high-throughput virome sequencing. It compares this new method with the Phi29 polymerase method, demonstrating its advantages in the representation and accuracy of viral DNA content in well-defined synthetic phage mixtures and complex human virome samples from the stool. This T7 DNA polymerase treatment significantly improved the richness and abundance of the Microviridae fraction in their samples, suggesting a more comprehensive representation of viral diversity.

The article presents a compelling case for testing and adopting the T7 DNA polymerase methodology in preparing virome samples for shotgun sequencing. This novel approach, supported by comparative analysis with existing methodologies, represents a valuable contribution to metagenomics for characterizing virome diversity.



Billaud M, Theodorou I, Lamy-Besnier Q, Shah SA, Lecointe F, Sordi LD, Paepe MD, Petit M-A (2024) T7 DNA polymerase treatment improves quantitative sequencing of both double-stranded and single-stranded DNA viruses. bioRxiv, ver. 4 peer-reviewed and recommended by Peer Community in Genomics.

Malathi VG, Renuka Devi P. (2019) ssDNA viruses: key players in global virome. Virus disease. 30: 3–12. 


T7 DNA polymerase treatment improves quantitative sequencing of both double-stranded and single-stranded DNA virusesMaud Billaud, Ilias Theodorou, Quentin Lamy-Besnier, Shiraz Shah, François Lecointe, Luisa De Sordi, Marianne De Paepe, Marie-Agnès Petit<p>Background: Bulk microbiome, as well as virome-enriched shotgun sequencing only reveals the double-stranded DNA (dsDNA) content of a given sample, unless specific treatments are applied. However, genomes of viruses often consist of a circular s...Viruses and transposable elementsSebastien Massart2023-12-20 16:50:00 View
07 Sep 2023
article picture

The demographic history of the wild crop relative Brachypodium distachyon is shaped by distinct past and present ecological niches

Natural variation and adaptation in Brachypodium distachyon

Recommended by based on reviews by Thibault Leroy and 1 anonymous reviewer

Identifying the genetic factors that allow plant adaptation is a major scientific question that is particularly relevant in the face of the climate change that we are already experiencing. To address this, it is essential to have genetic information on a high number of accessions (i.e., plants registered with unique accession numbers) growing under contrasting environmental conditions. There is already an important number of studies addressing these issues in the plant Arabidopsis thaliana, but there is a need to expand these analyses to species that play key roles in wild ecosystems and are close to very relevant crops, as is the case of grasses.

The work of Minadakis, Roulin and co-workers (1) presents a Brachypodium distachyon panel of 332 fully sequences accessions that covers the whole species distribution across a wide range of bioclimatic conditions, which will be an invaluable tool to fill this gap. In addition, the authors use this data to start analyzing the population structure and demographic history of this plant, suggesting that the species experienced a shift of its distribution following the Last Glacial Maximum, which may have forced the species into new habitats. The authors also present a modeling of the niches occupied by B. distachyon together with an analysis of the genetic clades found in each of them, and start analyzing the different adaptive loci that may have allowed the species’ expansion into different bioclimatic areas.

In addition to the importance of the resources made available by the authors for the scientific community, the analyses presented are well done and carefully discussed, and they highlight the potential of these new resources to investigate the genetic bases of plant adaptation. 


1. Nikolaos Minadakis, Hefin Williams, Robert Horvath, Danka Caković, Christoph Stritt, Michael Thieme, Yann Bourgeois, Anne C. Roulin. The demographic history of the wild crop relative Brachypodium distachyon is shaped by distinct past and present ecological niches. bioRxiv, 2023.06.01.543285, ver. 5 peer-reviewed and recommended by Peer Community in Genomics.

The demographic history of the wild crop relative *Brachypodium distachyon* is shaped by distinct past and present ecological nichesNikolaos Minadakis, Hefin Williams, Robert Horvath, Danka Caković, Christoph Stritt, Michael Thieme, Yann Bourgeois, Anne C. Roulin<p style="text-align: justify;">Closely related to economically important crops, the grass <em>Brachypodium distachyon</em> has been originally established as a pivotal species for grass genomics but more recently flourished as a model for develop...Evolutionary genomics, Functional genomics, Plants, Population genomicsJosep Casacuberta2023-06-14 15:28:30 View