Latest recommendations
Id | Title | Authors | Abstract | Picture | Thematic fields | Recommender | Reviewers | Submission date | |
---|---|---|---|---|---|---|---|---|---|
13 Mar 2025
Estimating allele frequencies, ancestry proportions and genotype likelihoods in the presence of mapping bias. Torsten Günther, Amy Goldberg, Joshua G. Schraiber. https://doi.org/10.1101/2024.07.01.601500 A novel genotype likelihood-based method to reduce mapping bias in low-coverage and ancient DNA studies. Recommended by Sebastian Ernesto Ramos-Onsins. The study of genomic variability within and between populations, as well as among species, relies on comparative analyses of homologous positions, that is, sites that share a common evolutionary origin. Homology is inferred through sequence similarity (Reeck et al. 1987). However, the ability to detect homologous regions can be compromised when sequence mismatches accumulate due to mutations, especially when analyzing short DNA fragments, as in short-read sequencing (Li et al. 2008). In the genomic era, accurately mapping homologous DNA fragments to a reference genome is essential for obtaining precise estimates of genetic variability and evolutionary inferences (e.g., Li et al. 2008; Ellegren 2014). However, short-read, high-throughput sequencing often introduces mapping bias, disproportionately favoring the reference allele. This bias distorts allele frequency estimates, ancestry proportions, and genotype likelihoods, impacting downstream analyses (e.g., Günther & Nettelblad 2019; Martiniano et al. 2020). Mapping bias is particularly problematic in ancient DNA studies, where post-mortem damage exacerbates sequencing errors. DNA fragmentation limits read length, while deamination, causing G to A and C to U transitions, increases mismatches and further complicates homology identification (Dabney & Pääbo 2013). These degradation processes contribute to the misidentification of true variants, confounding evolutionary inferences.
Various strategies have been developed to mitigate mapping bias, including the commonly used pseudo-haploid approach, which randomly picks a single read at each analyzed position for each individual, thereby retaining a single allele at each polymorphic site (Günther & Nettelblad 2019; Barlow et al. 2020). Günther et al. (2025) introduce a novel method to correct mapping bias using a genotype likelihood-based approach, incorporating a mapping bias ratio to adjust for reference allele overrepresentation. The method specifically targets known single nucleotide polymorphisms (SNPs) because in population genomic analysis of ancient DNA data, low coverage and post-mortem damage often hinder the ability to identify novel SNPs in most individuals. The analysis focuses on DNA fragmentation, assuming that deamination effects are minimal when considering ascertained SNPs. The proposed method was compared against existing approaches, including pseudo-haploid data and standard genotype likelihood-based probabilistic methods. The evaluation was performed using both empirical and simulated data. For empirical data, low-coverage sequencing data from the 1000 Genomes Project (Finnish in Finland, Japanese in Tokyo, Yoruba in Ibadan, Nigeria populations) were analyzed, while for simulated data, ancient DNA-like datasets were generated using msprime (Kelleher et al. 2016), modeling different sequencing depths, divergence times, and reference genome choices. The study assesses the impact of mapping bias on the ratio of reference versus non-reference allele mapping, the accuracy of SNP allele frequency estimates relative to true frequencies, the deviation and variance between estimated and true allele frequencies, population differentiation, and the estimation of admixture proportions using supervised and unsupervised methods, considering both genotype likelihoods and genotype calls. Günther et al. 
(2025) bring to light that all methods analyzed exhibit minor but systematic reference allele bias. The new corrected genotype likelihood method outperforms the standard genotype likelihood approach in correlating with true allele frequencies, although the pseudo-haploid method still provides the most accurate estimates. Mapping bias also affects ancestry estimation, leading to admixture proportion errors of up to 4%, though this effect is smaller than the 10% discrepancy observed across different inference methods. The work performed by Günther et al. (2025) provides a rigorous and innovative evaluation of mapping bias in the context of ascertained SNPs, introducing a probabilistic approach that improves bias correction. Unlike non-probabilistic methods such as pseudo-haploid data, the genotype likelihood framework leverages all sequencing reads for each analyzed SNP, and can incorporate additional bias corrections, enhancing its applicability across different sequencing conditions. While probabilistic approaches offer clear advantages in bias correction, they can be less intuitive to interpret compared to traditional genotype calling methods. This study highlights that mapping bias is pervasive across all methods, influencing evolutionary inferences such as selection signals and population differentiation. Although the improvements in allele frequency recovery may seem modest, the genome-wide impact of mapping bias is significant, especially in ancient DNA studies, making bias correction essential for robust evolutionary analyses.
References Ellegren H. (2014) Genome sequencing and population genomics in non-model organisms. Trends Ecol Evol. 29(1):51-63. https://doi.org/10.1016/j.tree.2013.09.008 Günther T, Nettelblad C. (2019) The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet.15(7):e1008302. https://doi.org/10.1371/journal.pgen.1008302 Günther T., Goldberg A., Schraiber J. G. (2025) Estimating allele frequencies, ancestry proportions and genotype likelihoods in the presence of mapping bias. bioRxiv, ver. 5 peer-reviewed and recommended by PCI Genomics https://doi.org/10.1101/2024.07.01.601500 Kelleher J., Etheridge A. M., McVean G. (2016) Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS computational biology, 12(5):e1004842. https://doi.org/10.1371/journal.pcbi.1004842 Li H, Ruan J, Durbin R. (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18(11):1851-8. https://doi.org/10.1101/gr.078212.108 Reeck GR, de Haën C, Teller DC, Doolittle RF, Fitch WM, Dickerson RE, et al. (1987) "Homology" in proteins and nucleic acids: a terminology muddle and a way out of it. Cell. 50 (5): 667. https://doi.org/10.1016/0092-8674(87)90322-9 | Estimating allele frequencies, ancestry proportions and genotype likelihoods in the presence of mapping bias | Torsten Günther, Amy Goldberg, Joshua G. Schraiber | <p>Population genomic analyses rely on an accurate and unbiased characterization of the genetic composition of the studied population. For short-read, high-throughput sequencing data, mapping sequencing reads to a linear reference genome can bias ... | ![]() | Bioinformatics, Evolutionary genomics, Population genomics | Sebastian Ernesto Ramos-Onsins | 2024-07-02 10:46:19 | View | |
21 Aug 2024
MATEdb2, a collection of high-quality metazoan proteomes across the Animal Tree of Life to speed up phylogenomic studies. Gemma I. Martínez-Redondo, Carlos Vargas-Chávez, Klara Eleftheriadi, Lisandra Benítez-Álvarez, Marçal Vázquez-Valls, Rosa Fernández. https://doi.org/10.1101/2024.02.21.581367 MATEdb2 is a valuable phylogenomics resource across Metazoa. Recommended by Philipp Schiffer. Martínez-Redondo and colleagues (2024) present MATEdb2, which provides the scientific community with Metazoa proteomes that have been predicted and annotated in a standardised way. The authors improved the taxon representation from the earlier MATEdb and their current database has a strong focus on Arthropoda, Annelida, and Mollusca. In particular, for the latter two groups not many high-quality reference genomes are available. Standardisation of the prediction and annotation process in a reproducible pipeline, as integrated in MATEdb2, is of great value, in particular to infer phylogenies as correctly as possible. Thus, I am sure that MATEdb2 will be an excellent go-to resource for phylogenomic studies, as well as for probing the biology of new, obscure species, especially marine ones.
| MATEdb2, a collection of high-quality metazoan proteomes across the Animal Tree of Life to speed up phylogenomic studies | Gemma I. Martínez-Redondo, Carlos Vargas-Chávez, Klara Eleftheriadi, Lisandra Benítez-Álvarez, Marçal Vázquez-Valls, Rosa Fernández | <p>Recent advances in high throughput sequencing have exponentially increased the number of genomic data available for animals (Metazoa) in the last decades, with high-quality chromosome-level genomes being published almost daily. Nevertheless, ge... | ![]() | Arthropods, Bioinformatics, Evolutionary genomics, Marine invertebrates, Terrestrial invertebrates | Philipp Schiffer | 2024-03-04 11:37:21 | View | |
09 Aug 2023
Efficient k-mer based curation of raw sequence data: application in Drosophila suzukii. Gautier Mathieu. https://doi.org/10.1101/2023.04.18.537389 Decontaminating reads, not contigs. Recommended by Nicolas Galtier based on reviews by Marie Cariou and Denis Baurain. Contamination, the presence of foreign DNA sequences in a sample of interest, is currently a major problem in genomics. Because contamination is often unavoidable at the experimental stage, it is increasingly recognized that the processing of high-throughput sequencing data must include a decontamination step. This is usually performed after the many sequence reads have been assembled into a relatively small number of contigs. Dubious contigs are then discarded based on their composition (e.g. GC-content) or because they are highly similar to a known piece of DNA from a foreign species. Here [1], Mathieu Gautier explores a novel strategy consisting of decontaminating reads, not contigs. Why is this promising? Assembly programs and algorithms are complex, and it is not easy to predict, or monitor, how they handle contaminant reads. Ideally, contaminant reads will be assembled into obvious contaminant contigs. However, there might be more complex situations, such as chimeric contigs with alternating genuine and contaminant segments. Decontaminating at the read level, if possible, should eliminate such unfavorable situations where sequence information from contaminant and target samples is intimately intertwined by an assembler. To achieve this aim, Gautier proposes to use methods initially designed for the analysis of metagenomic data. This is pertinent since the decontamination process involves considering a sample as a mixture of different sources of DNA. 
The programs used here, CLARK and CLARK-L, are based on so-called k-mer analysis, meaning that the similarity between a read to annotate and a reference sequence is measured by how many sub-sequences (of length 31 base pairs for CLARK and 27 base pairs for CLARK-L) they share. This is known to be much more efficient than traditional sequence alignment algorithms when it comes to comparing a very large number of (most often unrelated) sequences. This is, therefore, a reference-based approach, in which the reads from a sample are assigned to previously sequenced genomes based on k-mer content. This original approach is here specifically applied to the case of Drosophila suzukii, an invasive pest damaging fruit production in Europe and America. Fortunately, Drosophila is a genus of insects with abundant genomic resources, including high-quality reference genomes in dozens of species. Having calibrated and validated his pipeline using data sets of known origins, Gautier quantifies in each of 258 presumed D. suzukii samples the proportion of reads that likely belong to other species of fruit flies, or to fruit fly-associated microbes. This proportion is close to one in 16 samples, which clearly correspond to mis-labelled individuals. It is non-negligible in another ~10 samples, which do correspond to genuine D. suzukii individuals. Most of these reads of unexpected origin are contaminants and should be filtered out. Interestingly, one D. suzukii sample contains a substantial proportion of reads from the closely related D. subpulchrella, which might instead reflect a recent episode of gene flow between these two species. The approach, therefore, not only serves as a crucial technical step, but also has the potential to reveal biological processes. Gautier's thorough, well-documented work will clearly benefit ongoing and future research on D. suzukii, and Drosophila genomics in general. 
The author and reviewers rightfully note that, like any reference-based approach, this method is heavily dependent on the availability and quality of reference genomes - Drosophila being a favorable case. Building the reference database is a key step, and the interpretation of the output can only be made in the light of its content and gaps, as illustrated by Gautier's careful and detailed discussion of his numerous results. This pioneering study is a striking demonstration of the potential of metagenomic methods for the decontamination of high-throughput sequence data at the read level. The pipeline requires remarkably few computing resources, ensuring low carbon emission. I am looking forward to seeing it applied to a wide range of taxa and samples.
Reference [1] Gautier Mathieu. Efficient k-mer based curation of raw sequence data: application in Drosophila suzukii. bioRxiv, 2023.04.18.537389, ver. 2, peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2023.04.18.537389 | Efficient k-mer based curation of raw sequence data: application in *Drosophila suzukii* | Gautier Mathieu | <p>Several studies have highlighted the presence of contaminated entries in public sequence repositories, calling for special attention to the associated metadata. Here, we propose and evaluate a fast and efficient kmer-based approach to assess th... | ![]() | Bioinformatics, Population genomics | Nicolas Galtier | 2023-04-20 22:05:13 | View | |
13 Nov 2024
Re-annotation of SARS-CoV-2 proteins using an HHpred-based approach opens new opportunities for a better understanding of this virus. Pierre Brézellec. https://doi.org/10.1101/2023.06.06.543855 Leveraging HHpred with rigorous validation for improved detection of host-virus homologies. Recommended by Jitendra Narayan. The assessment by Brézellec (2024) of the quality of HHpred-based SARS-CoV-2 protein annotations against the traditional Pfam annotations is highly justified and valuable. HHpred’s ability to detect remote homologies offers an expanded view of viral protein similarities, potentially uncovering subtle functional mimicries that Pfam may miss due to its sensitivity limitations when dealing with divergent sequences. However, the accuracy and specificity of HHpred results can be compromised by false positives, especially when dealing with complex viral proteins that feature transmembrane or low-complexity regions prone to spurious matches. To address this, the author made a thoughtful decision to implement a multi-step validation protocol. This approach included establishing progressively lower probability thresholds to capture weaker but biologically plausible hits, and organizing hits into “families” of similarly located alignments to validate the robustness of matches. The author also cross-verified results by running SARS-CoV-2 protein queries against non-human proteomes (plants, fruit flies, bacteria, and archaea), making it possible to discern between biologically meaningful matches and potentially random alignments. By adding manual verification with InterPro domain annotations, the author took additional steps to ensure that identified similarities were not only statistically significant but also biologically relevant. This rigorous validation strategy adds a layer of reliability to HHpred results, demonstrating an effective maximization of sensitivity while maintaining specificity. 
This approach yielded biologically intriguing and previously undocumented similarities, such as Spike-prominin and ORF3a-GPCR, underscoring the quality and depth of the annotation process. These findings highlight a pathway for further experimental validation and illustrate the potential of HHpred to contribute high-quality insights when applied with careful quality control measures. In summary, the decision to adopt HHpred (Gabler et al. 2020) and enhance its outputs with a robust quality validation process not only improved the depth of SARS-CoV-2 protein annotations but also established a high standard for future viral annotation projects, striking an effective balance between discovery potential and annotation quality. The author has conducted a study that is methodologically rigorous, well-detailed, and highly pertinent to the field. This work stands as a significant contribution to the scientific community, providing resources and insights that are likely to guide future research in this area. References Brézellec, P (2024) Re-annotation of SARS-CoV-2 proteins using an HHpred-based approach opens new opportunities for a better understanding of this virus. bioRxiv, ver. 3 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2023.06.06.543855 Gabler F, Nam S-Z, Till S, Mirdita M, Steinegger M, Söding J, Lupas AN, Alva V (2020) Protein Sequence Analysis Using the MPI Bioinformatics Toolkit. Current Protocols in Bioinformatics, 72, e108. https://doi.org/10.1002/cpbi.108
| Re-annotation of SARS-CoV-2 proteins using an HHpred-based approach opens new opportunities for a better understanding of this virus | Pierre Brézellec | <p>Since the publication of the genome of SARS-CoV-2 – the causative agent of COVID-19 – in January 2020, many bioinformatic tools have been applied to annotate its proteins. Although efficient methods have been used, such as the identification of... | ![]() | Bioinformatics, Evolutionary genomics, Viruses and transposable elements | Jitendra Narayan | 2023-06-08 10:17:04 | View | |
11 Mar 2021
Gut microbial ecology of Xenopus tadpoles across life stages. Thibault Scalvenzi, Isabelle Clavereau, Mickael Bourge, Nicolas Pollet. https://doi.org/10.1101/2020.05.25.110734 A comprehensive look at Xenopus gut microbiota: effects of feed, developmental stages and parental transmission. Recommended by Wirulda Pootakham based on reviews by Vanessa Marcelino and 1 anonymous reviewer. It is well established that the gut microbiota play an important role in the overall health of their hosts (Jandhyala et al. 2015). To date, there is still only a limited number of studies on the complex microbial communities inhabiting vertebrate digestive systems, especially ones that also explore the functional diversity of the microbial community (Bletz et al. 2016). This preprint by Scalvenzi et al. (2021) reports a comprehensive study on the phylogenetic and metabolic profiles of the Xenopus gut microbiota. The authors describe significant changes in the gut microbiome communities at different developmental stages and demonstrate different microbial community composition across organs. In addition, the study also investigates the impact of diet on the Xenopus tadpole gut microbiome communities as well as how the bacterial communities are transmitted from parents to the next generation. This is one of the first studies that addresses the interactions between gut bacteria and tadpoles during development. The authors observe the dynamics of gut microbiome communities during tadpole growth and metamorphosis. They also explore host-gut microbial community metabolic interactions and demonstrate the capacity of the microbiome to complement the metabolic pathways of the Xenopus genome. Although this study is limited by the use of Xenopus tadpoles in a laboratory, which are probably different from those in nature, I believe it still provides important and valuable information for the research community working on vertebrate microbiota and their interactions with the host. 
References Bletz et al. (2016). Amphibian gut microbiota shifts differentially in community structure but converges on habitat-specific predicted functions. Nature Communications, 7(1), 1-12. doi: https://doi.org/10.1038/ncomms13699 Jandhyala, S. M., Talukdar, R., Subramanyam, C., Vuyyuru, H., Sasikala, M., & Reddy, D. N. (2015). Role of the normal gut microbiota. World journal of gastroenterology: WJG, 21(29), 8787. doi: https://dx.doi.org/10.3748%2Fwjg.v21.i29.8787 Scalvenzi, T., Clavereau, I., Bourge, M. & Pollet, N. (2021) Gut microbial ecology of Xenopus tadpoles across life stages. bioRxiv, 2020.05.25.110734, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2020.05.25.110734 | Gut microbial ecology of Xenopus tadpoles across life stages | Thibault Scalvenzi, Isabelle Clavereau, Mickael Bourge, Nicolas Pollet | <p><strong>Background</strong> The microorganism world living in amphibians is still largely under-represented and under-studied in the literature. Among anuran amphibians, African clawed frogs of the Xenopus genus stand as well-characterized mode... | ![]() | Evolutionary genomics, Metagenomics, Vertebrates | Wirulda Pootakham | 2020-05-25 14:01:19 | View | |
08 Nov 2022
Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks. Sylvain Schmitt, Thibault Leroy, Myriam Heuertz, Niklas Tysklind. https://doi.org/10.1101/2021.10.11.462798 How to best call the somatic mosaic tree? Recommended by Nicolas Bierne based on reviews by 2 anonymous reviewers. Any multicellular organism is a molecular mosaic with some somatic mutations accumulated between cell lineages. Large, long-lived trees have fed this image of a somatic mosaic tree, from the observation of spectacular phenotypic mosaics and also because somatic mutations are expected to potentially be passed on to gametes in plants (review in Schoen and Schultz 2019). The lower cost of genome sequencing now offers the opportunity to tackle the issue and identify somatic mutations in trees. However, when it comes to characterizing this somatic mosaic from genome sequences, things become much more difficult than one would think at first. What separates cell lineages ontogenetically, in cell division number, or in time? How to sample clonal cell populations? How do somatic mutations distribute in a population of cells in an organ or an organ sample? Should they be fixed heterozygotes in the sample of cells sequenced or be polymorphic? Do we indeed expect somatic mutations to be fixed? How should we identify and count somatic mutations? To date, the detection of somatic mutations has mostly been done with a single variant caller in a given study, and we have little perspective on how different callers provide similar or different results. Some studies have used standard SNP callers that assume a somatic mutation is fixed at the heterozygous state in the sample of cells, with an expected allele coverage ratio of 0.5, and fewer have used cancer callers, designed to detect mutations in a fraction of the cells in the sample. 
However, standard SNP callers detect mutations that deviate from a balanced allelic coverage, and different cancer callers can have different characteristics that should affect their outcomes. In order to tackle these issues, Schmitt et al. (2022) conducted an extensive simulation analysis to compare different variant callers. Then, they reanalyzed two large published datasets on pedunculate oak, Quercus robur. The analysis of in silico somatic mutations allowed the authors to evaluate the performance of different variant callers as a function of the allelic fraction of somatic mutations and the sequencing depth. They found one of the seven callers to provide better and more robust calls for a broad set of allelic fractions and sequencing depths. The reanalysis of published datasets in oaks with the most effective cancer caller of the in silico analysis allowed them to identify numerous low-frequency mutations that were missed in the original studies. I recommend the study of Schmitt et al. (2022) first because it shows the benefit of using cancer callers in the study of somatic mutations, whatever allelic fraction you are ultimately interested in. You can select fixed heterozygotes if this is your ultimate target, but cancer callers also give you a valuable overview of the allelic fractions of somatic mutations in your sample, and most do as well as SNP callers for fixed heterozygous mutations. In addition, Schmitt et al. (2022) provide the pipelines that allow investigating in silico data that should correspond to a given study design, encouraging users to compare different variant callers rather than arbitrarily going with only one. We can anticipate that the study of somatic mutations in non-model species will increasingly attract attention now that multiple tissues of the same individual can be sequenced at low cost, and the study of Schmitt et al. 
(2022) paves the way for questioning and choosing the best variant caller for the question one wants to address. References Schoen DJ, Schultz ST (2019) Somatic Mutation and Evolution in Plants. Annual Review of Ecology, Evolution, and Systematics, 50, 49–73. https://doi.org/10.1146/annurev-ecolsys-110218-024955 Schmitt S, Leroy T, Heuertz M, Tysklind N (2022) Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks. bioRxiv, 2021.10.11.462798. ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2021.10.11.462798 | Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks | Sylvain Schmitt, Thibault Leroy, Myriam Heuertz, Niklas Tysklind | <p style="text-align: justify;">1. Mutation, the source of genetic diversity, is the raw material of evolution; however, the mutation process remains understudied, especially in plants. Using both a simulation and reanalysis framework, we set out ... | ![]() | Bioinformatics, Plants | Nicolas Bierne | Anonymous, Anonymous | 2022-04-28 13:24:19 | View |
27 Apr 2021
Uncovering transposable element variants and their potential adaptive impact in urban populations of the malaria vector Anopheles coluzzii. Carlos Vargas-Chavez, Neil Michel Longo Pendy, Sandrine E. Nsango, Laura Aguilera, Diego Ayala, and Josefa González. https://doi.org/10.1101/2020.11.22.393231 Anopheles coluzzii, a new system to study how transposable elements may foster adaptation to urban environments. Recommended by Anne Roulin based on reviews by Yann Bourgeois and 1 anonymous reviewer. Transposable elements (TEs) are mobile DNA sequences that can increase their copy number and move from one location to another within the genome [1]. Because of their transposition dynamics, TEs constitute a significant fraction of eukaryotic genomes. TEs are also known to play an important functional role and a wealth of studies has now reported how TEs may influence single host traits [e.g. 2–4]. Given that TEs are more likely than classical point mutations to cause extreme changes in gene expression and phenotypes, they might therefore be especially prone to produce the raw diversity necessary for individuals to respond to challenging environments [5,6] such as the ones found in urban areas.
| Uncovering transposable element variants and their potential adaptive impact in urban populations of the malaria vector Anopheles coluzzii | Carlos Vargas-Chavez, Neil Michel Longo Pendy, Sandrine E. Nsango, Laura Aguilera, Diego Ayala, and Josefa González | <p style="text-align: justify;">Background</p> <p style="text-align: justify;">Anopheles coluzzii is one of the primary vectors of human malaria in sub-Saharan Africa. Recently, it has colonized the main cities of Central Africa threatening vecto... | ![]() | Evolutionary genomics | Anne Roulin | 2020-12-02 14:58:47 | View | |
23 Mar 2022
Chromosomal rearrangements with stable repertoires of genes and transposable elements in an invasive forest-pathogenic fungus. Arthur Demene, Benoit Laurent, Sandrine Cros-Arteil, Christophe Boury, Cyril Dutech. https://doi.org/10.1101/2021.03.09.434572 Comparative genomics in the chestnut blight fungus Cryphonectria parasitica reveals large chromosomal rearrangements and a stable genome organization. Recommended by Sebastien Duplessis based on reviews by Benjamin Schwessinger and 1 anonymous reviewer. About twenty-five years after the sequencing of the first fungal genome and a dozen years after the first plant pathogenic fungi genomes were sequenced, unprecedented international efforts have led to an impressive collection of genomes available for the community of mycologists in international databases (Goffeau et al. 1996; Dean et al. 2005; Spatafora et al. 2017). For instance, to date, the Joint Genome Institute Mycocosm database has collected more than 2,100 fungal genomes over the fungal tree of life (https://mycocosm.jgi.doe.gov). Such resources are paving the way for comparative genomics, population genomics and phylogenomics to address a large panel of questions regarding the biology and the ecology of fungal species. Early on, population genomics applied to pathogenic fungi revealed a great diversity of genome content and organization and a wide variety of variants and rearrangements (Raffaele and Kamoun 2012; Hartmann 2022). Such plasticity raises questions about how to choose a representative genome to serve as an ideal reference to address pertinent biological questions. Cryphonectria parasitica is a fungal pathogen that is infamous for the devastation of chestnut forests in North America after its accidental introduction more than a century ago (Anagnostakis 1987). Since then, it has been a quarantine species under surveillance in various parts of the world. 
As for other fungi causing diseases on forest trees, the study of adaptation to its host in the forest ecosystem and of its reproduction and dissemination modes is more complex than for crop-targeting pathogens. A first reference genome was published in 2020 for the chestnut blight fungus C. parasitica strain EP155 in the frame of an international project with the DOE JGI (Crouch et al. 2020). Another genome was then sequenced from the French isolate YVO003, which showed a few differences in the assembly suggesting possible rearrangements (Demené et al. 2019). Here the sequencing of a third isolate, ESM015, from the native area of C. parasitica in Japan allows a broader comparative analysis, in particular between native and introduced isolates (Demené et al. 2022). Demené and collaborators report on a new genome sequence using up-to-date long-read sequencing technologies and they provide an improved genome assembly. Comparison with previously published C. parasitica genomes did not reveal dramatic changes in the overall chromosomal landscapes, but large rearrangements could be spotted. Despite these rearrangements, the genome content and organization – i.e. genes and repeats – remain stable, with a limited number of gene gains and losses. As in any fungal plant pathogen genome, the repertoire of candidate effectors predicted among secreted proteins was scrutinized in particular. Such effector genes have previously been reported in other pathogens in repeat-enriched plastic genomic regions with accelerated evolutionary rates under the pressure of the host immune system (Raffaele and Kamoun 2012). Demené and collaborators established a list of priority candidate effectors in the C. parasitica gene catalog likely involved in the interaction with the host plant which will require more attention in future functional studies. Six major inter-chromosomal translocations were detected and are likely associated with double-strand break repair. 
The authors speculate on the possible effects of these translocations on gene organization and the regulation of expression, which may lead to dramatic phenotypic changes in relation to introduction and invasion in new continents, and on their impact on sexual reproduction in this fungus (Demené et al. 2022). I recommend this article not only because it provides an improved assembly of a reference genome for C. parasitica, but also because it increases the diversity of available reference genomes, with a third high-quality assembly. Such an effort in the tree pathology community for a pathogen under surveillance is of particular importance for future progress in post-genomic analysis, e.g. in further population genomic studies (Hartmann 2022). References Anagnostakis SL (1987) Chestnut Blight: The Classical Problem of an Introduced Pathogen. Mycologia, 79, 23–37. https://doi.org/10.2307/3807741 Crouch JA, Dawe A, Aerts A, Barry K, Churchill ACL, Grimwood J, Hillman BI, Milgroom MG, Pangilinan J, Smith M, Salamov A, Schmutz J, Yadav JS, Grigoriev IV, Nuss DL (2020) Genome Sequence of the Chestnut Blight Fungus Cryphonectria parasitica EP155: A Fundamental Resource for an Archetypical Invasive Plant Pathogen. Phytopathology®, 110, 1180–1188. https://doi.org/10.1094/PHYTO-12-19-0478-A Dean RA, Talbot NJ, Ebbole DJ, Farman ML, Mitchell TK, Orbach MJ, Thon M, Kulkarni R, Xu J-R, Pan H, Read ND, Lee Y-H, Carbone I, Brown D, Oh YY, Donofrio N, Jeong JS, Soanes DM, Djonovic S, Kolomiets E, Rehmeyer C, Li W, Harding M, Kim S, Lebrun M-H, Bohnert H, Coughlan S, Butler J, Calvo S, Ma L-J, Nicol R, Purcell S, Nusbaum C, Galagan JE, Birren BW (2005) The genome sequence of the rice blast fungus Magnaporthe grisea. Nature, 434, 980–986. https://doi.org/10.1038/nature03449 Demené A, Laurent B, Cros-Arteil S, Boury C, Dutech C (2022) Chromosomal rearrangements with stable repertoires of genes and transposable elements in an invasive forest-pathogenic fungus. 
bioRxiv, 2021.03.09.434572, ver.6 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2021.03.09.434572 Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG (1996) Life with 6000 Genes. Science, 274, 546–567. https://doi.org/10.1126/science.274.5287.546 Hartmann FE (2022) Using structural variants to understand the ecological and evolutionary dynamics of fungal plant pathogens. New Phytologist, 234, 43–49. https://doi.org/10.1111/nph.17907 Raffaele S, Kamoun S (2012) Genome evolution in filamentous plant pathogens: why bigger can be better. Nature Reviews Microbiology, 10, 417–430. https://doi.org/10.1038/nrmicro2790 Spatafora JW, Aime MC, Grigoriev IV, Martin F, Stajich JE, Blackwell M (2017) The Fungal Tree of Life: from Molecular Systematics to Genome-Scale Phylogenies. Microbiology Spectrum, 5, 5.5.03. https://doi.org/10.1128/microbiolspec.FUNK-0053-2016 | Chromosomal rearrangements with stable repertoires of genes and transposable elements in an invasive forest-pathogenic fungus | Arthur Demene, Benoit Laurent, Sandrine Cros-Arteil, Christophe Boury, Cyril Dutech | <p style="text-align: justify;">Chromosomal rearrangements have been largely described among eukaryotes, and may have important consequences on evolution of species. High genome plasticity has been often reported in Fungi, which may explain their ... | ![]() | Evolutionary genomics, Fungi | Sebastien Duplessis | 2021-03-12 14:18:20 | View | |
07 Sep 2023
The demographic history of the wild crop relative Brachypodium distachyon is shaped by distinct past and present ecological niches. Nikolaos Minadakis, Hefin Williams, Robert Horvath, Danka Caković, Christoph Stritt, Michael Thieme, Yann Bourgeois, Anne C. Roulin. https://doi.org/10.1101/2023.06.01.543285 Natural variation and adaptation in Brachypodium distachyon. Recommended by Josep Casacuberta based on reviews by Thibault Leroy and 1 anonymous reviewer. Identifying the genetic factors that allow plant adaptation is a major scientific question that is particularly relevant in the face of the climate change that we are already experiencing. To address this, it is essential to have genetic information on a large number of accessions (i.e., plants registered with unique accession numbers) growing under contrasting environmental conditions. A substantial number of studies already address these issues in the plant Arabidopsis thaliana, but there is a need to expand these analyses to species that play key roles in wild ecosystems and are closely related to major crops, as is the case for grasses. The work of Minadakis, Roulin and co-workers (1) presents a Brachypodium distachyon panel of 332 fully sequenced accessions that covers the whole species distribution across a wide range of bioclimatic conditions, which will be an invaluable tool to fill this gap. In addition, the authors use these data to start analyzing the population structure and demographic history of this plant, suggesting that the species experienced a shift in its distribution following the Last Glacial Maximum, which may have forced the species into new habitats. The authors also model the niches occupied by B. distachyon, analyze the genetic clades found in each of them, and start analyzing the adaptive loci that may have allowed the species’ expansion into different bioclimatic areas. 
In addition to the importance of the resources made available by the authors for the scientific community, the analyses presented are well done and carefully discussed, and they highlight the potential of these new resources to investigate the genetic bases of plant adaptation. References 1. Nikolaos Minadakis, Hefin Williams, Robert Horvath, Danka Caković, Christoph Stritt, Michael Thieme, Yann Bourgeois, Anne C. Roulin. The demographic history of the wild crop relative Brachypodium distachyon is shaped by distinct past and present ecological niches. bioRxiv, 2023.06.01.543285, ver. 5 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2023.06.01.543285 | The demographic history of the wild crop relative *Brachypodium distachyon* is shaped by distinct past and present ecological niches | Nikolaos Minadakis, Hefin Williams, Robert Horvath, Danka Caković, Christoph Stritt, Michael Thieme, Yann Bourgeois, Anne C. Roulin | <p style="text-align: justify;">Closely related to economically important crops, the grass <em>Brachypodium distachyon</em> has been originally established as a pivotal species for grass genomics but more recently flourished as a model for develop... | ![]() | Evolutionary genomics, Functional genomics, Plants, Population genomics | Josep Casacuberta | 2023-06-14 15:28:30 | View | |
07 Feb 2023
RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes. Frederic Bertels, Julia von Irmer, Carsten Fortmann-Grote. https://doi.org/10.1101/2022.05.22.493013 A workflow for studying enigmatic non-autonomous transposable elements across bacteria. Recommended by Gavin Douglas. Repetitive extragenic palindromic sequences (REPs) are common repetitive elements in bacterial genomes (Gilson et al., 1984; Stern et al., 1984). In 2011, Bertels and Rainey identified that REPs are overrepresented in pairs of inverted repeats, which likely form hairpin structures, that they referred to as “REP doublets forming hairpins” (REPINs). Based on bioinformatics analyses, they argued that REPINs are likely selfish elements that evolved from REPs flanking particular transposases (Bertels and Rainey, 2011). These transposases, so-called REP-associated tyrosine transposases (RAYTs), were known to be highly associated with the REP content in a genome and to have characteristic upstream and downstream flanking REPs (Nunvar et al., 2010). The flanking REPs likely enable RAYT transposition, and their horizontal replication is physically linked to this process. In contrast, Bertels and Rainey hypothesized that REPINs are selfish elements that are highly replicated due to the similarity in arrangement to these RAYT-flanking REPs, but independent of RAYT transposition and generally with no impact on bacterial fitness (Bertels and Rainey, 2011). This last point was especially contentious, as REPINs are highly conserved within species (Bertels and Rainey, 2023), which is unusual for non-beneficial bacterial DNA (Mira et al., 2001). Bertels and Rainey have since refined their argument: REPINs must provide benefits to host cells, but there are nonetheless signatures of intragenomic conflict in genomes associated with these elements (Bertels and Rainey, 2023). 
These signatures reflect the divergent levels of selection driving REPIN distribution: selection at the level of each DNA element and selection on each individual bacterium. I found this observation particularly interesting, as my colleague and I recently argued that these divergent levels of selection, and the interaction between them, are key to understanding bacterial pangenome diversity (Douglas and Shapiro, 2021). REPINs could be an excellent system for investigating these levels of selection across bacteria more generally. The problem is that REPINs have not been widely characterized in bacterial genomes, partially because no bioinformatic workflow has been available for this purpose. To address this problem, Fortmann-Grote et al. (2023) developed RAREFAN, which is a web server for identifying RAYTs and associated REPINs in a set of input genomes. The authors showcase their tool by applying it to 49 Stenotrophomonas maltophilia genomes and providing examples of how to identify and assess RAYT-REPIN hits. The workflow requires several manual steps, but nonetheless represents a straightforward and standardized approach. Overall, this workflow should enable RAYTs and REPINs to be identified across diverse bacterial species, which will facilitate further investigation into the mechanisms driving their maintenance and spread. References Bertels F, Rainey PB (2023) Ancient Darwinian replicators nested within eubacterial genomes. BioEssays, 45, 2200085. https://doi.org/10.1002/bies.202200085 Bertels F, Rainey PB (2011) Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria. PLOS Genetics, 7, e1002132. https://doi.org/10.1371/journal.pgen.1002132 Douglas GM, Shapiro BJ (2021) Genic Selection Within Prokaryotic Pangenomes. Genome Biology and Evolution, 13, evab234. https://doi.org/10.1093/gbe/evab234 Fortmann-Grote C, Irmer J von, Bertels F (2023) RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes. 
bioRxiv, 2022.05.22.493013, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.05.22.493013 Gilson E, Clément J m., Brutlag D, Hofnung M (1984) A family of dispersed repetitive extragenic palindromic DNA sequences in E. coli. The EMBO Journal, 3, 1417–1421. https://doi.org/10.1002/j.1460-2075.1984.tb01986.x Mira A, Ochman H, Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends in Genetics, 17, 589–596. https://doi.org/10.1016/S0168-9525(01)02447-7 Nunvar J, Huckova T, Licha I (2010) Identification and characterization of repetitive extragenic palindromes (REP)-associated tyrosine transposases: implications for REP evolution and dynamics in bacterial genomes. BMC Genomics, 11, 44. https://doi.org/10.1186/1471-2164-11-44 Stern MJ, Ames GF-L, Smith NH, Clare Robinson E, Higgins CF (1984) Repetitive extragenic palindromic sequences: A major component of the bacterial genome. Cell, 37, 1015–1026. https://doi.org/10.1016/0092-8674(84)90436-7 | RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes | Frederic Bertels, Julia von Irmer, Carsten Fortmann-Grote | <p style="text-align: justify;">Compared to eukaryotes, repetitive sequences are rare in bacterial genomes and usually do not persist for long. Yet, there is at least one class of persistent prokaryotic mobile genetic elements: REPINs. REPINs are ... | Bacteria and archaea, Bioinformatics, Evolutionary genomics, Viruses and transposable elements | Gavin Douglas | 2022-06-07 08:21:34 | View |