Latest recommendations

21 May 2025

Particular sequence characteristics induce bias in the detection of polymorphic transposable element insertions

A new simulation pipeline enhances benchmarking of transposon polymorphism detection tools

Recommended by Raúl Castanera based on reviews by Tianxiong Yu and 1 anonymous reviewer

Transposable Elements (TEs) are one of the main sources of genome variability. However, their study in populations has been hampered by the difficulty of properly detecting them using whole-genome re-sequencing data. Despite the expectations generated by the rise of long-read sequencing, it is becoming clear that such technologies will not replace short reads for analyzing large populations in the short term. Detecting Transposon Insertion Polymorphisms (TIPs) from short-read data is a challenging task, because the repetitive nature of TE sequences complicates read mapping. Nevertheless, accurate TIP detection is essential for understanding the evolutionary dynamics of TEs, their regulatory roles and their link with phenotypic variability. In the past 15 years, more than 20 tools have been developed for TIP detection using short-read data, but only a few independent benchmarks have been performed so far (Chen et al. 2023; Nelson et al. 2017; Rishishwar et al. 2017; Vendrell-Mir et al. 2019). Previous benchmarks have used simulated and real data to evaluate tool performance, each with its own set of advantages and limitations. In particular, introducing artificial insertions and simulating genomic short reads may not reflect the nature of real TEs. By contrast, using real TE insertions as benchmarks can introduce bias, since TE annotations are never perfect.

Verneret et al. (2025) introduce an original, alternative approach in which a comprehensive simulation method mimics the most important sequence features of real TEs and non-TE intergenic regions. These simulated data are then combined with true genic sequences, generating a pseudochromosome that can be used for benchmarking TIP detection pipelines. Using this approach, the authors eliminate the bias of TE annotation on real genomes, while preserving most of the characteristics of natural TEs. Using simulated pseudochromosomes for Drosophila melanogaster and Arabidopsis thaliana, Verneret et al. (2025) found that the performance of 14 commonly used TIP-calling tools is highly variable, with only a few performing well, and only at high sequencing depths. In addition, the authors analyzed the sequence features of true-positive and false-positive TIP calls, and found that specific TE sequence characteristics (e.g., length and age) affect the detection of both reference and non-reference TIPs.
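To make the notions of true-positive and false-positive TIP calls concrete, the following is a minimal sketch of how calls from one tool might be scored against the known insertions of a simulated pseudochromosome. It is an illustration under stated assumptions, not the scoring procedure used by Verneret et al. (2025): the `Insertion` record, the `score_calls` function, the 20 bp matching tolerance and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """A TE insertion, either simulated (truth) or reported by a tool (call)."""
    chrom: str
    pos: int
    family: str


def score_calls(truth, calls, tolerance=20):
    """Score calls against known insertions.

    A call counts as a true positive if an as-yet-unmatched true insertion
    of the same family lies on the same chromosome within `tolerance` bp.
    The tolerance value is an assumption made for this sketch.
    """
    unmatched = list(truth)
    tp = 0
    for call in calls:
        hit = next(
            (t for t in unmatched
             if t.chrom == call.chrom
             and t.family == call.family
             and abs(t.pos - call.pos) <= tolerance),
            None,
        )
        if hit is not None:
            unmatched.remove(hit)  # each true insertion can be matched once
            tp += 1
    fp = len(calls) - tp      # calls with no matching true insertion
    fn = len(unmatched)       # true insertions that no call recovered
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Hypothetical example: two simulated insertions, two calls from one tool.
truth = [Insertion("2L", 1000, "roo"), Insertion("2L", 5000, "copia")]
calls = [Insertion("2L", 1010, "roo"), Insertion("2L", 9000, "roo")]
result = score_calls(truth, calls)
```

Because the truth set is simulated, every unmatched call is a genuine false positive rather than a possible annotation gap, which is the advantage the recommendation highlights over benchmarks based on real annotations.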

The approach described by Verneret et al. (2025) is an important contribution to the field for several reasons. First, the results shown in the publication will help the users of such tools make informed decisions before launching their experiments. For more advanced users, it will enable future benchmarks to identify which tools perform best for different species, each with their own sequence characteristics. For software developers, the data released constitute a valuable dataset to test their tools in the same conditions. Finally, the identification of sequence characteristics enriched among false positives and false negatives also gives developers an opportunity to improve the performance of new tools by taking these specificities into account.

References

Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM (2023) Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mobile DNA, 14, 8. https://doi.org/10.1186/s13100-023-00296-4

Nelson MG, Linheiro RS, Bergman CM (2017) McClintock: An integrated pipeline for detecting transposable element insertions in whole-genome shotgun sequencing data. G3: Genes, Genomes, Genetics, 7, 2763–2778. https://doi.org/10.1534/g3.117.043893

Rishishwar L, Mariño-Ramírez L, Jordan IK (2017) Benchmarking computational tools for polymorphic transposable element detection. Briefings in Bioinformatics, 18, 908–918. https://doi.org/10.1093/bib/bbw072

Vendrell-Mir P, Barteri F, Merenciano M, González J, Casacuberta JM, Castanera R (2019) A benchmark of transposon insertion detection tools using real data. Mobile DNA, 10, 53. https://doi.org/10.1186/s13100-019-0197-9

Verneret M, Le VA, Faraut T, Turpin J, Lerat E (2025) Particular sequence characteristics induce bias in the detection of polymorphic transposable element insertions. bioRxiv, ver. 4 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2024.09.25.614865

Title: Particular sequence characteristics induce bias in the detection of polymorphic transposable element insertions
Authors: Marie Verneret, Van Anthony Le, Thomas Faraut, Jocelyn Turpin, Emmanuelle Lerat
Abstract: Transposable elements (TEs) have an important role in genome evolution but are challenging for bioinformatics detection due to their repetitive nature and ability to move and replicate within genomes. New sequencing technologies now enable the ...
Thematic fields: Bioinformatics, Evolutionary genomics, Population genomics, Viruses and transposable elements
Recommender: Raúl Castanera
Submission date: 2024-09-30 08:29:19
24 Feb 2023

Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study

How to interpret the inference of recombination landscapes on methods based on linkage disequilibrium?

Recommended by Sebastian Ernesto Ramos-Onsins based on reviews by 2 anonymous reviewers

Data interpretation depends on previously established and validated tools, designed for a specific type of data. These methods, however, are usually based on simple models whose validity is subject to a set of theoretical parameterized conditions and data types. Accordingly, the tool developers provide potential users with guidelines for interpreting data within the tools’ limitations. Nevertheless, once the methodology is accepted by the community, it is employed in a large variety of empirical studies that fall outside the method’s original scope or that depart from the standard models used for its design, thus potentially leading to the wrong interpretation of the results.

Numerous empirical studies have inferred recombination rates across genomes, detecting hotspots of recombination and comparing related species (e.g., Shanfelter et al. 2019, Spence and Song 2019). These studies used indirect methodologies based on the signals that recombination leaves in the genome, such as linkage disequilibrium and the patterns of haplotype segregation (e.g., Chan et al. 2012). The conclusions from these analyses have been used, for example, to interpret the evolution of the chromosomal structure or the evolution of recombination among closely related species.

Indirect methods have the advantage of collecting a large quantity of recombination events, and thus have a better resolution than direct methods (which only detect the few recombination events occurring at that time). On the other hand, indirect methods are affected by many different evolutionary events, such as demographic changes and selection. Indeed, the inference of recombination levels across the genome has not been studied accurately in non-standard conditions. Linkage disequilibrium is affected by several factors that can modify the recombination inference, such as demographic history, events of selection, population size, and mutation rate, but is also related to the size of the studied sample, and other technical parameters defined for each specific methodology.

Raynaud et al. (2023) analyzed the reliability of recombination rate inference when several standard assumptions (evolutionary and methodological) are violated, focusing on one of the most popular families of methods, based on LDhat (McVean et al. 2004), and specifically on its improved version, LDhelmet (Chan et al. 2012). These methods cover around 70% of the studies that infer recombination rates. The authors used recombination maps that were obtained from empirical studies on humans and that included hotspots to perform a detailed simulation study of the capacity of this methodology to correctly infer the pattern of recombination and the location of these hotspots. They computed correlations between the real and inferred values, as well as several rates, such as the true positive rate and the false discovery rate of hotspot detection.
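The evaluation metrics named above can be illustrated with a short sketch. It assumes that hotspots are identified by the index of the genomic window they fall in and that agreement between real and inferred rates is summarized with a Pearson correlation; both are assumptions made for illustration, and the function names `hotspot_rates` and `pearson` are hypothetical rather than taken from Raynaud et al. (2023).

```python
import math


def hotspot_rates(true_hotspots, inferred_hotspots):
    """Return (true positive rate, false discovery rate) for hotspot calls.

    Hotspots are given as window indices (an assumption of this sketch).
    TPR = TP / (TP + FN); FDR = FP / (TP + FP).
    """
    true_set, inferred_set = set(true_hotspots), set(inferred_hotspots)
    tp = len(true_set & inferred_set)   # simulated hotspots that were detected
    fp = len(inferred_set - true_set)   # detected hotspots that were not simulated
    fn = len(true_set - inferred_set)   # simulated hotspots that were missed
    tpr = tp / (tp + fn) if true_set else 0.0
    fdr = fp / (tp + fp) if inferred_set else 0.0
    return tpr, fdr


def pearson(x, y):
    """Pearson correlation between real and inferred per-window rates."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical example: 3 simulated hotspots, 4 inferred ones, 2 in common.
tpr, fdr = hotspot_rates([2, 5, 9], [2, 5, 7, 11])
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A high true positive rate combined with a high false discovery rate would mean that most real hotspots are found but many reported hotspots are spurious, which is why the two rates are reported together.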

The authors of this work send a message of caution to researchers who apply this methodology to infer recombination landscapes and the location of hotspots. The inferred recombination landscapes and hotspots can differ considerably from the true ones even in standard model conditions. In addition, demographic processes, such as bottlenecks or admixture, as well as population size and mutation rate, can substantially affect the accuracy with which the level of recombination and the location of hotspots are estimated. Indeed, the inference of the location of hotspots in simulated data with the same landscape can be very imprecise when standard assumptions are violated or not considered. These effects may lead to incorrect interpretations, for example about the conservation of recombination maps between closely related species. Finally, Raynaud et al. (2023) included a useful guide with advice on how to obtain accurate recombination estimates with methods based on linkage disequilibrium, also emphasizing the limitations of such approaches.

References

Chan AH, Jenkins PA, Song YS (2012) Genome-Wide Fine-Scale Recombination Rate Variation in Drosophila melanogaster. PLOS Genetics, 8, e1003090. https://doi.org/10.1371/journal.pgen.1003090

McVean GAT, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P (2004) The Fine-Scale Structure of Recombination Rate Variation in the Human Genome. Science, 304, 581–584. https://doi.org/10.1126/science.1092500

Raynaud M, Gagnaire P-A, Galtier N (2023) Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study. bioRxiv, 2022.03.30.486352, ver. 2 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.03.30.486352

Spence JP, Song YS (2019) Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Science Advances, 5, eaaw9206. https://doi.org/10.1126/sciadv.aaw9206

Title: Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study
Authors: Marie Raynaud, Pierre-Alexandre Gagnaire, Nicolas Galtier
Abstract: Knowledge of recombination rate variation along the genome provides important insights into genome and phenotypic evolution. Population genomic approaches offer an attractive way to infer the population-scaled recom...
Thematic fields: Bioinformatics, Evolutionary genomics, Population genomics
Recommender: Sebastian Ernesto Ramos-Onsins
Submission date: 2022-04-05 14:59:14
05 Aug 2024

LukProt: A database of eukaryotic predicted proteins designed for investigations of animal origins

A protein database to study the origin of metazoans

Recommended by Javier del Campo based on reviews by Giacomo Mutti and 2 anonymous reviewers

Sobala (2024) introduces a new, comprehensive, and curated eukaryotic database. It consolidates information from EukProt (Richter et al. 2022) and various other resources to enhance Metazoa representation in existing protein databases. The preprint is of significant interest to the phylogenomics and comparative genomics communities, and I commend the author for their work.

LukProt, the expanded database, significantly increases the taxon sampling within holozoans. It integrates data from the previously assembled EukProt and AniProtDB (Barreira et al. 2021) databases, with additional datasets from early-diverging animal lineages such as ctenophores, sponges, and cnidarians. This effort will undoubtedly be useful for researchers investigating these clades and their origins, as well as for the broader field of comparative genomics.

The author provides both web-portal and command-line versions of the database, making it accessible to users with varying degrees of bioinformatic proficiency. The curation effort is commendable, and I believe the comparative genomics community, especially those interested in animal origins, will find LukProt to be a valuable resource.

           

References

Barreira SN, Nguyen A-D, Fredriksen MT, Wolfsberg TG, Moreland RT, Baxevanis AD (2021) AniProtDB: A collection of consistently generated metazoan proteomes for comparative genomics studies. Molecular Biology and Evolution, 38, 4628–4633. https://doi.org/10.1093/molbev/msab165

Richter DJ, Berney C, Strassert JFH, Poh Y-P, Herman EK, Muñoz-Gómez SA, Wideman JG, Burki F, de Vargas C (2022) EukProt: A database of genome-scale predicted proteins across the diversity of eukaryotes. Peer Community Journal, 2, e56. https://doi.org/10.24072/pcjournal.173

Sobala ŁF (2024) LukProt: A database of eukaryotic predicted proteins designed for investigations of animal origins. bioRxiv, ver. 2 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2024.01.30.577650

Title: LukProt: A database of eukaryotic predicted proteins designed for investigations of animal origins
Authors: Łukasz F. Sobala
Abstract: The origins and early evolution of animals is a subject with many outstanding questions. One problem faced by researchers trying to answer them is the absence of a comprehensive database of sequences from non-bilaterians. Publicly available dat...
Thematic fields: Bioinformatics, Evolutionary genomics, Marine invertebrates
Recommender: Javier del Campo
Reviewers: Anonymous, Giacomo Mutti, Anonymous
Submission date: 2024-02-02 13:04:31
02 Apr 2021

Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection

Toward a critical assessment of virus detection in plants

Recommended by Hadi Quesneville based on reviews by Alexander Suh and 1 anonymous reviewer

The advent of High Throughput Sequencing (HTS) over the last decade has revealed a previously unsuspected diversity of viruses as well as their (sometimes) unexpected presence in some healthy individuals. These results demonstrate that genomics offers a powerful tool for studying viruses at the individual level, allowing an in-depth inventory of those that are infecting an organism. Such approaches make it possible to study viromes with an unprecedented level of detail, both qualitative and quantitative, which opens new avenues for analyses of viruses of humans, animals and plants. Consequently, the diagnostic field is increasingly using HTS, fueling the need for efficient and reliable bioinformatics tools.

Many such tools have already been developed, but in plant disease diagnostics, validation of the bioinformatics pipelines used for the detection of viruses in HTS datasets is still in its infancy. There is an urgent need for benchmarking the different tools and algorithms using well-designed reference datasets generated for this purpose. This is a crucial step to move forward and to improve existing solutions toward well-standardized bioinformatics protocols. This context has led to the creation of the Plant Health Bioinformatics Network (PHBN), a Euphresco network project aiming to build a bioinformatics community working on plant health. One of its objectives is to provide researchers with open-access reference datasets with which to compare and validate virus detection pipelines.

In this framework, Tamisier et al. [1] present real, semi-artificial, and completely artificial datasets, each aimed at addressing challenges that could affect virus detection. These datasets comprise real RNA-seq reads from virus-infected plants as well as simulated virus reads. Such work, providing open-access datasets for benchmarking bioinformatics tools, should be encouraged, as these datasets are key to software improvement. This is demonstrated by the well-known success story of the protein structure prediction community: their pioneering community-wide effort, called Critical Assessment of protein Structure Prediction (CASP) [2], has provided research groups since 1994 with an invaluable way to objectively test their structure prediction methods, thereby delivering an independent assessment of state-of-the-art protein-structure modelling tools. Following this success, many other bioinformatics communities developed similar “competitions”, such as RNA-puzzles [3] to predict RNA structures, Critical Assessment of Function Annotation [4] to predict gene functions, Critical Assessment of Prediction of Interactions [5] to predict protein-protein interactions, and Assemblathon [6] for genome assembly. These are just a few examples from a long list of successful initiatives. Such efforts enable rigorous assessments of tools, stimulate the developers’ creativity, and provide user communities with a state-of-the-art evaluation of available tools.

Inspired by these success stories, the authors propose a “VIROMOCK challenge” [7], asking researchers in the field to test their tools and to provide feedback on each dataset through a repository. This initiative, if widely adopted, will undoubtedly improve the field of virus detection in plants, and probably in many other organisms as well. This will be a major contribution to virology, leading to better diagnostics and, consequently, a better understanding of viral diseases, thus helping to promote human, animal and plant health.

References

[1] Tamisier, L., Haegeman, A., Foucart, Y., Fouillien, N., Al Rwahnih, M., Buzkan, N., Candresse, T., Chiumenti, M., De Jonghe, K., Lefebvre, M., Margaria, P., Reynard, J.-S., Stevens, K., Kutnjak, D. and Massart, S. (2021) Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection. Zenodo, 4273791, version 4 peer-reviewed and recommended by Peer Community in Genomics. doi: https://doi.org/10.5281/zenodo.4273791

[2] Critical Assessment of protein Structure Prediction (CASP) - https://en.wikipedia.org/wiki/CASP

[3] RNA-puzzles - https://www.rnapuzzles.org

[4] Critical Assessment of Function Annotation (CAFA) - https://en.wikipedia.org/wiki/Critical_Assessment_of_Function_Annotation

[5] Critical Assessment of Prediction of Interactions (CAPRI) - https://en.wikipedia.org/wiki/Critical_Assessment_of_Prediction_of_Interactions

[6] Assemblathon - https://assemblathon.org

[7] VIROMOCK challenge - https://gitlab.com/ilvo/VIROMOCKchallenge

Title: Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection
Authors: Lucie Tamisier, Annelies Haegeman, Yoika Foucart, Nicolas Fouillien, Maher Al Rwahnih, Nihal Buzkan, Thierry Candresse, Michela Chiumenti, Kris De Jonghe, Marie Lefebvre, Paolo Margaria, Jean Sébastien Reynard, Kristian Stevens, Denis Kutnjak, Séb...
Abstract: The widespread use of High-Throughput Sequencing (HTS) for detection of plant viruses and sequencing of plant virus genomes has led to the generation of large amounts of data and of bioinformatics challenges to process them. Many bioinformatics...
Thematic fields: Bioinformatics, Plants, Viruses and transposable elements
Recommender: Hadi Quesneville
Submission date: 2020-11-27 14:31:47
23 Aug 2022

A novel lineage of the Capra genus discovered in the Taurus Mountains of Turkey using ancient genomics

Goat ancient DNA analysis unveils a new lineage that may have hybridized with domestic goats

Recommended by Laura Botigué based on reviews by Torsten Günther and 1 anonymous reviewer

The genomic analysis of ancient remains has revolutionized the study of the past over the last decade. On top of the discoveries related to human evolution, plant and animal archaeogenomics has been used to gain new insights into the domestication process and the dispersal of domestic forms.

In this study, Daly and colleagues analyse the genomic data from seven goat specimens from the Epipalaeolithic recovered from the Direkli Cave in the Taurus Mountains in southern Turkey. They also generate new genomic data from Capra lineages across the phylogeny, contributing to the availability of genomic resources for this genus. Analysis of the ancient remains is compared to modern genomic variability and sheds light on the complexity of the Tur wild Capra lineages and their relationship with domestic goats and their wild ancestors.

The authors find that during the Late Pleistocene in the Taurus Mountains, wild goats from the Tur lineage, today restricted to the Caucasus region, were not rare and cohabited with Bezoar, the wild goats that are the ancestors of domestic goats. They identify the Direkli Cave specimens as a lineage separate from the modern West and East Caucasus Tur lineages. Also, analysis of the genomic data and mitochondrial haplotypes reveals hybridization between the Tur and the Bezoar wild lineages. Interestingly, the authors also find an uneven amount of Tur ancestry among Neolithic domestic goats, with European domestic goats showing evidence of this ancient Tur ancestry, whereas Neolithic Iranian domestic goats do not, a pattern that is also observed in some modern European domestic goats.

A modified D statistic, Dex, is developed to examine the contribution of the ancient Tur lineage in domestic goats through time and space. Dex measures the relative degree of allele sharing, derived specifically in a selected genome or group of genomes, and may have some utility in genera with complex admixture histories or admixture from ghost lineages. Results confirm that Neolithic European goats had an excess of allele sharing with this ancient Tur lineage, an excess that is absent both in contemporaneous goats from further east and in modern goats.
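For readers unfamiliar with the statistic that Dex modifies, the sketch below computes the classic ABBA-BABA D statistic, D = (ABBA - BABA) / (ABBA + BABA), from biallelic sites. It is not the Dex statistic of Daly et al. (2022), whose exact definition is given in their preprint; the function name `patterson_d`, the 0/1 allele coding and the toy data are assumptions made for illustration.

```python
def patterson_d(sites):
    """Classic D statistic for populations (P1, P2, P3, outgroup).

    `sites` is an iterable of 4-tuples of alleles, coded 0 = ancestral and
    1 = derived (one sampled allele per population; a simplification).
    ABBA sites: P2 and P3 share the derived allele.
    BABA sites: P1 and P3 share the derived allele.
    """
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if outgroup != 0:
            continue  # the outgroup must carry the ancestral allele
        if (p1, p2, p3) == (0, 1, 1):
            abba += 1
        elif (p1, p2, p3) == (1, 0, 1):
            baba += 1
    if abba + baba == 0:
        return 0.0  # no informative sites
    return (abba - baba) / (abba + baba)


# Toy data: 3 ABBA sites, 1 BABA site, 1 uninformative site.
sites = [(0, 1, 1, 0)] * 3 + [(1, 0, 1, 0)] + [(1, 1, 0, 0)]
d = patterson_d(sites)
```

A positive D indicates excess allele sharing between P2 and P3, and a negative D indicates excess sharing between P1 and P3. In practice, significance is usually assessed with a block jackknife over the genome, which this sketch omits.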

Interspecific gene flow is not uncommon among mammals, but the case of Capra has the additional motivation of understanding the origins of the domestic species. This work uncovers an ancient Tur lineage that is different from the modern ones and is additionally found in another geographic area. Furthermore, evidence shows that this ancient lineage exhibits substantial amounts of allele sharing with the wild ancestor of the domestic goat, but also with the Neolithic Eurasian domestic goats, highlighting the complexity of the domestication process.

This work also has important implications for understanding the effect of over-hunting and habitat disruption during the Anthropocene on the evolution of the Capra genus. The availability of more ancient specimens and better coverage of the modern genomic variability can help quantify the lineages that were lost and identify the causes of their extinction.

This work is limited by the current availability of whole genomes from modern Capra specimens, but it also shows that an effort is needed to obtain more genomic data from ancient goats from different geographic ranges to determine to what extent these lineages contributed to goat domestication.

References

Daly KG, Arbuckle BS, Rossi C, Mattiangeli V, Lawlor PA, Mashkour M, Sauer E, Lesur J, Atici L, Erek CM, Bradley DG (2022) A novel lineage of the Capra genus discovered in the Taurus Mountains of Turkey using ancient genomics. bioRxiv, 2022.04.08.487619, ver. 5 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.04.08.487619

Title: A novel lineage of the Capra genus discovered in the Taurus Mountains of Turkey using ancient genomics
Authors: Kevin G. Daly, Benjamin S. Arbuckle, Conor Rossi, Valeria Mattiangeli, Phoebe A. Lawlor, Marjan Mashkour, Eberhard Sauer, Joséphine Lesur, Levent Atici, Cevdet Merih Erek, Daniel G. Bradley
Abstract: Direkli Cave, located in the Taurus Mountains of southern Turkey, was occupied by Late Epipaleolithic hunters-gatherers for the seasonal hunting and processing of game including large numbers of wild goats. We report genomic data from new and p...
Thematic fields: Evolutionary genomics, Population genomics, Vertebrates
Recommender: Laura Botigué
Submission date: 2022-04-15 12:05:47
18 Jul 2022

CulebrONT: a streamlined long reads multi-assembler pipeline for prokaryotic and eukaryotic genomes

A flexible and reproducible pipeline for long-read assembly and evaluation

Recommended by Raúl Castanera based on reviews by Benjamin Istace and Valentine Murigneux

Third-generation sequencing has revolutionised de novo genome assembly. Thanks to this technology, genome reference sequences have evolved from fragmented drafts to gapless, telomere-to-telomere genome assemblies. Long reads produced by Oxford Nanopore and PacBio technologies can span structural variants and resolve complex repetitive regions such as centromeres, unlocking previously inaccessible genomic information. Nowadays, many research groups can afford to sequence the genome of their working model using long reads. Nevertheless, genome assembly poses a significant computational challenge. Read length, quality, coverage and genomic features such as repeat content can affect assembly contiguity, accuracy, and completeness in almost unpredictable ways. Consequently, there is no best universal software or protocol for this task. Producing a high-quality assembly requires chaining several tools into pipelines and performing extensive comparisons between the assemblies obtained by different tool combinations to decide which one is the best. This task can be extremely challenging, as the number of tools available rises very rapidly, and thorough benchmarks cannot be updated and published at such a fast pace. 

In their paper, Orjuela and collaborators present CulebrONT [1], a universal pipeline that greatly contributes to overcoming these challenges and facilitates long-read genome assembly for all taxonomic groups. CulebrONT incorporates six commonly used assemblers and allows users to perform assembly, circularization (if needed), polishing, and evaluation in a simple framework. One important aspect of CulebrONT is its modularity, which allows the activation or deactivation of specific tools, giving great flexibility to the user. Nevertheless, possibly the best feature of CulebrONT is the opportunity to benchmark the selected tool combinations based on the excellent report generated by the pipeline. This HTML report aggregates the output of several tools for quality evaluation of the assemblies (e.g. BUSCO [2] or QUAST [3]) generated by the different assemblers, in addition to the running time and configuration parameters. Such information is of great help in identifying the best-suited pipeline, as exemplified by the authors using four datasets of different taxonomic origins. Finally, CulebrONT can handle multiple samples in parallel, which makes it a good solution for laboratories that need to produce many assemblies at scale.

References

1. Orjuela J, Comte A, Ravel S, Charriat F, Vi T, Sabot F, Cunnac S (2022) CulebrONT: a streamlined long reads multi-assembler pipeline for prokaryotic and eukaryotic genomes. bioRxiv, 2021.07.19.452922, ver. 5 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2021.07.19.452922

2. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31, 3210–3212. https://doi.org/10.1093/bioinformatics/btv351

3. Gurevich A, Saveliev V, Vyahhi N, Tesler G (2013) QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29, 1072–1075. https://doi.org/10.1093/bioinformatics/btt086

Title: CulebrONT: a streamlined long reads multi-assembler pipeline for prokaryotic and eukaryotic genomes
Authors: Julie Orjuela, Aurore Comte, Sébastien Ravel, Florian Charriat, Tram Vi, Francois Sabot, Sébastien Cunnac
Abstract: Using long reads provides higher contiguity and better genome assemblies. However, producing such high quality sequences from raw reads requires to chain a growing set of tools, and determining the best workflow is ...
Thematic fields: Bioinformatics
Recommender: Raúl Castanera
Reviewers: Valentine Murigneux
Submission date: 2022-02-22 16:21:25
26 Jun 2024

Transposable element expression with variation in sex chromosome number supports a toxic Y effect on human longevity

The number of Y chromosomes is positively associated with transposable element expression in humans, in line with the toxic Y hypothesis

Recommended by Anna-Sophie Fiston-Lavier based on reviews by 3 anonymous reviewers

The study of human longevity has long been a source of fascination for scientists, particularly in relation to the genetic factors that contribute to differences in lifespan between the sexes. One particularly intriguing area of research concerns the Y chromosome and its impact on male longevity. The Y chromosome expresses genes that are essential for male development and reproduction. However, it may also influence various physiological processes and health outcomes. It is therefore of great importance to investigate the impact of the Y chromosome on longevity. This may assist in elucidating the biological mechanisms underlying sex-specific differences in aging and disease susceptibility. As longevity research progresses, the Y chromosome's role presents a promising avenue for elucidating the complex interplay between genetics and aging.

Transposable elements (TEs), often referred to as "jumping genes", are DNA sequences that can move within the genome, potentially causing mutations and genomic instability. In young, healthy cells, various mechanisms, including DNA methylation and histone modifications, suppress TE activity to maintain genomic integrity. However, as individuals age, these regulatory mechanisms may deteriorate, leading to increased TE activity. This dysregulation could contribute to age-related genomic instability, cellular dysfunction, and the onset of diseases such as cancer. Understanding how TE repression changes with age is crucial for uncovering the molecular underpinnings of aging (De Cecco et al. 2013; Van Meter et al. 2014).

The lower recombination rates observed on Y chromosomes result in the accumulation of TE insertions, which in turn leads to an enrichment of TEs and potentially higher TE activity. To ascertain whether the number of Y chromosomes is associated with TE activity in humans, Teoli et al. (2024) studied the TE expression level, as a proxy of TE activity, in several karyotype compositions (i.e., with differing numbers of Y chromosomes). They used transcriptomic data from blood samples collected from 24 individuals (six females 46,XX, six males 46,XY, eight males 47,XXY and four males 47,XYY). Even though they did not observe a significant correlation between the number of Y chromosomes and TE expression, their results suggest an impact of the presence of the Y chromosome on the overall TE expression. The presence of Y chromosomes also affected the type (family) of TE present/expressed. To ensure that the TE expression level was not biased by the expression of a nearby gene through intron retention or pervasive intragenic transcription, the authors also tested whether the TE expression variation observed between the different karyotypes could be explained by the expression of genes (i.e., non-TE genes).

As TE repression mechanisms are known to weaken over time, the authors also tested whether TE repression is weaker in older individuals, which would support a compelling link between genomic stability and aging. They compared TE expression between males and females, hypothesizing that old males should exhibit stronger TE activity than old females. Using blood samples from 45 selected males (46,XY) and 35 selected females (46,XX) of various ages (from 20 to 70) from the Genotype-Tissue Expression (GTEx) project, the authors studied the effect of age on TE expression, grouping the study subjects into 10-year age ranges. Based on these data, they did not find an overall increase of TE expression in old males compared to old females.

Despite the relatively small sample size, which is a consequence of the difficulty of obtaining samples from individuals with sex chromosome aneuploidies, the study is well-designed and innovative. It marks an initial step towards understanding the impact of Y-chromosome ‘toxicity’ on human longevity, and its results are highly intriguing and will be of interest to a broad range of biologists.

References

De Cecco M, Criscione SW, Peckham EJ, Hillenmeyer S, Hamm EA, Manivannan J, Peterson AL, Kreiling JA, Neretti N, Sedivy JM (2013) Genomes of replicatively senescent cells undergo global epigenetic changes leading to gene silencing and activation of transposable elements. Aging Cell, 12, 247–256. https://doi.org/10.1111/acel.12047

Teoli J, Merenciano M, Fablet M, Necsulea A, Siqueira-de-Oliveira D, Brandulas-Cammarata A, Labalme A, Lejeune H, Lemaitre J-F, Gueyffier F, Sanlaville D, Bardel C, Vieira C, Marais GAB, Plotton I (2024) Transposable element expression with variation in sex chromosome number supports a toxic Y effect on human longevity. bioRxiv, ver. 5 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2023.08.03.550779

Van Meter M, Kashyap M, Rezazadeh S, Geneva AJ, Morello TD, Seluanov A, Gorbunova V (2014) SIRT6 represses LINE1 retrotransposons by ribosylating KAP1 but this repression fails with stress and age. Nature Communications, 5, 5011. https://doi.org/10.1038/ncomms6011

Title: Transposable element expression with variation in sex chromosome number supports a toxic Y effect on human longevity
Authors: Jordan Teoli, Miriam Merenciano, Marie Fablet, Anamaria Necsulea, Daniel Siqueira-de-Oliveira, Alessandro Brandulas-Cammarata, Audrey Labalme, Hervé Lejeune, Jean-François Lemaitre, François Gueyffier, Damien Sanlaville, Claire Bardel, Cristina Vi...
Abstract: Why women live longer than men is still an open question in human biology. Sex chromosomes have been proposed to play a role in the observed sex gap in longevity, and the Y male chromosome has been suspected of having a potential toxic genomic ...
Thematic fields: Evolutionary genomics
Recommender: Anna-Sophie Fiston-Lavier
Reviewers: Anonymous, Igor Rogozin, Paul Jay, Anonymous
Submission date: 2023-08-18 15:01:38

20 Nov 2023

Building a Portuguese Coalition for Biodiversity Genomics

The Portuguese genomics community teams up with iconic species to understand the destruction of biodiversity

Recommended by Fernando Racimo based on reviews by Svein-Ole Mikalsen and 1 anonymous reviewer

This manuscript describes the ongoing work and plans of Biogenome Portugal: a new network of researchers in the Portuguese biodiversity genomics community. The aims of this network are to jointly train scientists in ecology and evolution, generate new knowledge and understanding of Portuguese biodiversity, and better engage with the public and with international researchers, so as to advance conservation efforts in the region. In collaboration across disciplines and institutions, they are also contributing to the European Reference Genome Atlas (ERGA): a massive scientific effort, seeking to eventually produce reference-quality genomes for all species in the European continent (Mc Cartney et al. 2023).

The manuscript centers around six iconic and/or severely threatened species, whose range extends across parts of what is today considered Portuguese territory. Via the Portugal chapter of ERGA (ERGA-Portugal), the researchers will generate high-quality genome sequences from these species. The species are the Iberian hare, the Azores laurel, the Black wheatear, the Portuguese crowberry, the Cave ground beetle and the Iberian minnowcarp. In ignorance of human-made political borders, some of these species also occupy large parts of the rest of the Iberian Peninsula, highlighting the importance of transnational collaboration in biodiversity efforts. The researchers extracted samples from members of each of these species, and are building reference genome sequences from them. In some cases, these sequences will also be co-analyzed with additional population genomic data from the same species or genetic data from cohabiting species. The researchers aim to answer a variety of ecological and evolutionary questions using this information, including how the genetic diversity of these species is being affected by the destruction of their habitat, and how they are being forced to adapt as a consequence of the climate emergency.

The authors did a very good job in providing a justification for the choice of pilot species, a thorough methodological overview of current work, and well thought-out plans for future analyses once the genome sequences are available for study. The authors also describe plans for networking and training activities to foster a well-connected Portuguese biodiversity genomics community.

Applying a genomic analysis lens is important for understanding the ever faster process of devastation of our natural world. Governments and corporations around the globe are destroying nature at ever larger scales (Díaz et al. 2019). They are also destabilizing the climatic conditions on which life has existed for thousands of years (Trisos et al. 2020). Thus, genetic diversity is decreasing faster than ever in human history, even when it comes to non-threatened species (Exposito-Alonso et al. 2022), and these decreases are disrupting ecological processes worldwide (Richardson et al. 2023). This, in turn, is threatening the conditions on which the stability of our societies rests (Gardner and Bullock 2021). The efforts of Biogenome Portugal and ERGA-Portugal will go a long way in helping us understand in greater detail how this process is unfolding in Portuguese territories.

References

Díaz, Sandra, et al. "Pervasive human-driven decline of life on Earth points to the need for transformative change." Science 366.6471 (2019): eaax3100. https://doi.org/10.1126/science.aax3100

Exposito-Alonso, Moises, et al. "Genetic diversity loss in the Anthropocene." Science 377.6613 (2022): 1431-1435. https://doi.org/10.1126/science.abn5642

Gardner, Charlie J., and James M. Bullock. "In the climate emergency, conservation must become survival ecology." Frontiers in Conservation Science 2 (2021): 659912. https://doi.org/10.3389/fcosc.2021.659912

Mc Cartney, Ann M., et al. "The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics." bioRxiv (2023): 2023-09, ver. 2 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.32942/X20W3Q

Richardson, Katherine, et al. "Earth beyond six of nine planetary boundaries." Science Advances 9.37 (2023): eadh2458. https://doi.org/10.1126/sciadv.adh2458

Trisos, Christopher H., Cory Merow, and Alex L. Pigot. "The projected timing of abrupt ecological disruption from climate change." Nature 580.7804 (2020): 496-501. https://doi.org/10.1038/s41586-020-2189-9

Title: Building a Portuguese Coalition for Biodiversity Genomics
Authors: João Pedro Marques, Paulo Célio Alves, Isabel R. Amorim, Ricardo J. Lopes, Mónica Moura, Gene Meyers, Manuela Sim-Sim, Carla Sousa-Santos, Maria Judite Alves, Paulo AV Borges, Thomas Brown, Miguel Carneiro, Carlos Carrapato, Luís Ceríaco, Claudio ...
Abstract: The diverse physiography of the Portuguese land and marine territory, spanning from continental Europe to the Atlantic archipelagos, has made it an important repository of biodiversity throughout the Pleistocene gla...
Thematic fields: ERGA, ERGA Pilot
Recommender: Fernando Racimo
Submission date: 2023-07-14 11:24:22

12 Jul 2022

Chromosome-level genome assembly and annotation of two lineages of the ant Cataglyphis hispanica: steppingstones towards genomic studies of hybridogenesis and thermal adaptation in desert ants

A genomic resource for ants, and more

Recommended by Nadia Ponts based on reviews by Isabel Almudi and Nicolas Nègre

The ant species Cataglyphis hispanica is remarkably well adapted to arid habitats of the Iberian Peninsula where two hybridogenetic lineages co-occur, i.e., queens mating with males from the other lineage produce only non-reproductive hybrid workers whereas reproductive males and females are produced by parthenogenesis (Lavanchy and Schwander, 2019). For these two reasons, the genomes of these lineages, Chis1 and Chis2, are potential gold mines to explore the genetic bases of thermal adaptation and the evolution of alternative reproductive modes.

Nowadays, sequencing technology enables assembling all kinds of genomes, provided genomic DNA can be extracted. More difficult to achieve are high-quality assemblies with equally high-quality annotations that are readily available to the community to be used and re-used at will (Byrne et al., 2019; Salzberg, 2019). This challenge was successfully met by Darras and colleagues: the generated resource is fully available to the community, including the scripts and command lines used to obtain the reported results.

In particular, the authors report that lineage Chis2 has 27 chromosomes, versus 26 or 27 for lineage Chis1, with a Robertsonian translocation identified by chromosome conformation capture (Duan et al., 2010, 2012) in the two queens sequenced. Transcript-supported gene annotation provided 11,290 high-quality gene models. In addition, an ant-tailored annotation pipeline identified 56 different families of repetitive elements in both the Chis1 and Chis2 lineages of C. hispanica, covering a little over 15% of the genome. Altogether, the genomes of Chis1 and Chis2 are highly similar and syntenic, with some level of polymorphism that raises questions about the timeline of their evolutionary history. In particular, the uniform distribution of polymorphisms along the genomes challenges a previous hypothesis that hybridogenetic lineage pairs are determined by ancient non-recombining regions (Linksvayer, Busch and Smith, 2013).

I recommend this paper because the science behind it is both solid and well-explained. The provided resource is of high quality, and is accompanied by a critical exploration of the perspectives opened by the results. These genomes are excellent resources for further exploring the genome-level events that accompanied the remarkable thermal adaptation of Cataglyphis ants, and for gaining insights into the genetics of hybridogenetic lineages.

Beyond the scientific value of the resources and insights provided by this work, I also recommend this article because it is an excellent example of Open Science (Allen and Mehler, 2019; Sarabipour et al., 2019), with all data, methods and tools being fully and easily accessible to whoever wants or needs them.

References

Allen C, Mehler DMA (2019) Open science challenges, benefits and tips in early career and beyond. PLOS Biology, 17, e3000246. https://doi.org/10.1371/journal.pbio.3000246

Byrne A, Cole C, Volden R, Vollmers C (2019) Realizing the potential of full-length transcriptome sequencing. Philosophical Transactions of the Royal Society B: Biological Sciences, 374, 20190097. https://doi.org/10.1098/rstb.2019.0097

Darras H, de Souza Araujo N, Baudry L, Guiglielmoni N, Lorite P, Marbouty M, Rodriguez F, Arkhipova I, Koszul R, Flot J-F, Aron S (2022) Chromosome-level genome assembly and annotation of two lineages of the ant Cataglyphis hispanica: stepping stones towards genomic studies of hybridogenesis and thermal adaptation in desert ants. bioRxiv, 2022.01.07.475286, ver. 3 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.01.07.475286

Duan Z, Andronescu M, Schutz K, Lee C, Shendure J, Fields S, Noble WS, Anthony Blau C (2012) A genome-wide 3C-method for characterizing the three-dimensional architectures of genomes. Methods, 58, 277–288. https://doi.org/10.1016/j.ymeth.2012.06.018

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS (2010) A three-dimensional model of the yeast genome. Nature, 465, 363–367. https://doi.org/10.1038/nature08973

Lavanchy G, Schwander T (2019) Hybridogenesis. Current Biology, 29, R9–R11. https://doi.org/10.1016/j.cub.2018.11.046

Linksvayer TA, Busch JW, Smith CR (2013) Social supergenes of superorganisms: Do supergenes play important roles in social evolution? BioEssays, 35, 683–689. https://doi.org/10.1002/bies.201300038

Salzberg SL (2019) Next-generation genome annotation: we still struggle to get it right. Genome Biology, 20, 92. https://doi.org/10.1186/s13059-019-1715-2

Sarabipour S, Debat HJ, Emmott E, Burgess SJ, Schwessinger B, Hensel Z (2019) On the value of preprints: An early career researcher perspective. PLOS Biology, 17, e3000151. https://doi.org/10.1371/journal.pbio.3000151

Title: Chromosome-level genome assembly and annotation of two lineages of the ant Cataglyphis hispanica: steppingstones towards genomic studies of hybridogenesis and thermal adaptation in desert ants
Authors: Hugo Darras, Natalia de Souza Araujo, Lyam Baudry, Nadège Guiglielmoni, Pedro Lorite, Martial Marbouty, Fernando Rodriguez, Irina Arkhipova, Romain Koszul, Jean-François Flot, Serge Aron
Abstract: Cataglyphis are thermophilic ants that forage during the day when temperatures are highest and sometimes close to their critical thermal limit. Several Cataglyphis species have evolved unusual reproductive ...
Thematic fields: Evolutionary genomics
Recommender: Nadia Ponts
Reviewers: Nicolas Nègre, Isabel Almudi
Submission date: 2022-01-13 16:47:30

30 Apr 2025

Natural variation in chalcone isomerase defines a major locus controlling radial stem growth variation among Populus nigra populations

Advancing our understanding of poplar growth using a multi-omics approach

Recommended by Wirulda Pootakham based on reviews by Gancho Slavov and 1 anonymous reviewer

Poplar is a promising resource, valued not only for wood production and the development of lignocellulosic biomass, but also for its potential role in carbon sequestration. Recognizing the importance of stem growth for wood production and biomass development, Duruflé et al. (2025) present a comprehensive study on the genetic basis of radial stem growth variation in natural populations of black poplar (Populus nigra). They employed a systems biology approach to identify the quantitative trait loci (QTLs) underlying this trait, integrating genomic, transcriptomic, and phenotypic data from a large collection of poplar genotypes. Their genome-wide association study (GWAS) identified single nucleotide polymorphisms linked to two gene models predicted to encode chalcone isomerase, an enzyme involved in the flavonoid pathway. The authors then used RNA-seq data to test whether the expression of the candidate genes correlated with the phenotypes, and indeed the expression level of both genes correlated with stem circumference. To support their findings, the authors compared the location of the QTLs detected in this study with previously published QTLs. Interestingly, they found a previously reported QTL co-localizing with the newly identified one. The authors have addressed the concerns raised by reviewers about the GWAS and discussed the complications of this QTL study in the manuscript.
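The expression-phenotype check mentioned above can be illustrated with a minimal sketch. It assumes one expression value and one stem circumference per genotype and uses Pearson's r; the choice of statistic, the function name and the numbers are assumptions for illustration, since the text above does not state which correlation measure the authors used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only; these are not data from the study.
expression = [1.0, 2.0, 3.0, 4.0]          # candidate gene expression per genotype
circumference = [10.0, 12.0, 14.0, 16.0]   # stem circumference per genotype
r = pearson_r(expression, circumference)
```

A value of `r` near +1 or -1 indicates a strong linear association across genotypes. In a real analysis, population structure and relatedness among genotypes would also need to be accounted for, which this sketch ignores.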

In essence, the authors have combined the power of GWAS and transcriptomics to locate candidate genes and applied population genetics to explore the evolutionary context of the identified gene. This comprehensive approach provides strong evidence for the role of chalcone isomerase in controlling radial stem growth variation in black poplar. The study opens up avenues for further research into the precise mechanisms by which chalcone isomerase and flavonoid metabolism influence stem growth and provides useful information for future poplar breeding programs.

References

Duruflé H, Déjardin A, Jorge V, Pégard M, Pilate G, Rogier O, Sanchez L, Segura V (2025) Natural variation in chalcone isomerase defines a major locus controlling radial stem growth variation among Populus nigra populations. bioRxiv, ver. 3 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2024.10.21.618920

Title: Natural variation in chalcone isomerase defines a major locus controlling radial stem growth variation among Populus nigra populations
Authors: Harold Durufle, Annabelle Dejardin, Veronique Jorge, Marie Pegard, Gilles Pilate, Odile Rogier, Leopoldo Sanchez, Vincent Segura
Abstract: Poplar is a promising resource for wood production and the development of lignocellulosic biomass, but currently available varieties have not been optimized for these purposes. Therefore, it is critical to investigate the genetic variability an...
Thematic fields: Plants, Population genomics
Recommender: Wirulda Pootakham
Reviewers: Fernando Guerra, Gancho Slavov
Submission date: 2024-10-25 09:37:10