
Latest recommendations

14 Jan 2025

Chromosome-level reference genome assembly for the mountain hare (Lepus timidus)

The genomic foundations of adaptation: evaluating the mountain hare

Recommended by Jitendra Narayan based on reviews by Theodore Squires and 1 anonymous reviewer

Fekete et al. (2024) generated a chromosome-level reference genome assembly for the mountain hare (Lepus timidus). This represents a significant advancement in genomic research for non-model organisms, achieving high quality through advanced sequencing and curation techniques. This achievement serves as a foundational blueprint for future efforts in other species, particularly those with ecological or evolutionary importance. The assembly has high continuity and completeness, with an N50 scaffold length of 125.8 Mb and a contig N50 of 4.9 Mb, meeting the Earth BioGenome Project's stringent criteria for reference-grade genomes (Mc Cartney et al., 2024). The combination of PacBio HiFi sequencing and Hi-C scaffolding techniques enabled robust assembly and chromosomal scaffolding of all 23 autosomes and the X and Y sex chromosomes. Additionally, manual curation enhanced the assembly quality, accurately representing genomic sequences. Although the genome provides valuable structural insights, the limited functional annotations highlight a need for further investigation into the genetic underpinnings of the ecological and adaptive traits of the mountain hare.
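For readers less familiar with the N50 statistic quoted for this assembly, the following is a minimal Python sketch of how it is commonly computed. The function name `n50` and the toy scaffold lengths are illustrative assumptions and are not taken from Fekete et al. (2024).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly span.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the span is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70
```

The same function applies to contigs or scaffolds; only the input list of lengths differs.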

The ecological and evolutionary implications of resolving this genome are considerable, particularly given the mountain hare’s adaptations to cold, snowy environments and its role in boreal ecosystems. The assembly facilitates the study of adaptations, such as camouflage and snowshoe-like feet, which are critical for survival in its rapidly changing habitat. Comparative genomic analyses reveal the evolutionary relationship between Lepus timidus and closely related species, such as the brown hare (L. europaeus) and Irish hare (L. t. hibernicus), providing insights into gene flow, hybridization, and speciation. These findings have practical implications for conservation genetics, particularly for subspecies threatened by habitat loss and climate change. However, the study does not identify specific adaptive loci or functional variants, limiting its immediate applicability to understanding the molecular basis of traits crucial for survival in extreme environments. Expanding the functional annotation of this genome would significantly enhance its utility in conservation and ecological genomics. Moreover, the high repetitive element content (42.35%) underscores the need for detailed annotation to facilitate downstream studies. These issues suggest that additional refinement and validation are warranted. Despite these limitations, the assembly is invaluable for studying genetic adaptations, hybridization, and hare conservation. Future research should focus on functional annotation, population-level comparisons, and targeted studies of ecological traits to fully realize the potential of this high-quality reference genome.

             

References

Fekete Z, Absolon DE, Michell C, Wood JMD, Goffart S, Pohjoismäki JLO (2024) Chromosome-level reference genome assembly for the mountain hare (Lepus timidus). bioRxiv, ver. 2 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2024.06.10.598177

Mc Cartney AM, Formenti G, Mouton A, De Panis D, Marins LS, Leitão HG, Diedericks G, Kirangwa J, Morselli M, Salces-Ortiz J, Escudero N, Iannucci A, Natali C, Svardal H, Fernández R, De Pooter T, Joris G, Strazisar M, Wood JMD, Herron KE, …, Mazzoni CJ (2024) The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics. npj Biodiversity, 3, 28. https://doi.org/10.1038/s44185-024-00054-6

 

Title: Chromosome-level reference genome assembly for the mountain hare (Lepus timidus)
Authors: Zsofia Fekete, Dominic E. Absolon, Craig Michell, Jonathan M. D. Wood, Steffi Goffart, Jaakko L. O. Pohjoismaki
Abstract: We present here a high-quality genome assembly of a male mountain hare (Lepus timidus Linnaeus), from Ilomantsi, Eastern Finland, utilizing an isolated fibroblast cell line as the source for high quality DNA and RNA. Following th...
Thematic fields: Bioinformatics, ERGA Pilot, Evolutionary genomics, Vertebrates
Recommender: Jitendra Narayan
Submission date: 2024-06-11 08:52:32
22 May 2025
POSTPRINT

The genome sequence of the Violet Carpenter Bee, Xylocopa violacea (Linnaeus, 1785): a hymenopteran species undergoing range expansion

A high-quality genome assembly for carpenter bees

Recommended by Christian Lopezguerra and Gavin M. Douglas

Christian Lopezguerra (1) and Gavin M. Douglas (2, 3)

(1) Department of Plant and Microbial Biology; (2) Department of Biological Sciences; (3) Bioinformatics Research Center, North Carolina State University, USA

Climate change and anthropogenic stressors are driving rapid biodiversity loss and dynamic shifts in species ranges (Outhwaite et al. 2022). Partially in response to the decline in biodiversity, the European Reference Genome Atlas (ERGA) has been generating high-quality accessible genome resources, allowing for a more collaborative network and assisting conservation efforts.

One recent target was the violet carpenter bee (Xylocopa violacea; Figure 1), one of the many insects that have shown a recent expansion in their range within Europe. This species is a key pollinator and is therefore of great interest for ecological and agricultural purposes. In addition, anticancer research with melittin variants present in the venom of the violet carpenter bee shows potential (Erkoc et al. 2022; von Reumont et al. 2022). However, genetic analyses have been limited by the prior contig-level assembly of the genome (Koludarov et al. 2023). Developing a high-quality, annotated reference genome for the carpenter bee was the goal of Nash and colleagues’ (2024) research, as part of the European Reference Genome Atlas initiative.


Figure 1: Violet carpenter bee (in Margarida, Spain). Copyright Susanne Vogel, a photographer who made this available on iNaturalist.com. Distributed under a CC-BY 4.0 license.

The authors coupled long- and short-read sequencing techniques to create an improved assembly. In particular, they used both short-read RNA-seq and long-read Iso-Seq for gene annotation. They also used Hi-C sequencing to capture chromosome conformation information to aid scaffolding. Their final assembly contains 1,300 scaffolds and has a BUSCO completeness level of 99.75% (Manni et al. 2021), aligning with the standards of the European Reference Genome Atlas. The authors generated a 1.02 gigabase assembly, which was much larger than the expected size of 672 megabases based on k-mer content. The authors partially explain the difference by the high repeat content in the genome, particularly specific 109-mer and 217-mer repeats. Due to this high repeat content, the authors could not assemble full chromosomes but instead produced 17 pseudo-chromosomes comprising 481.4 megabases (in addition to all other unlocalized scaffolds).

This high-quality reference genome will be valuable for future studies on population and functional genomics of carpenter bees (Xylocopa). Indeed, this is the first high-quality annotated pseudo-chromosomal genome assembly of the genus Xylocopa, which includes hundreds of other species. It will enable improved investigation into genomic signatures associated with shifting populations. More generally, this reference genome will be useful for comparative analyses with other Hymenoptera species.

             

References

Erkoc P, von Reumont BM, Lüddecke T, Henke M, Ulshöfer T, Vilcinskas A, Fürst R, Schiffmann S (2022) The pharmacological potential of novel melittin variants from the honeybee and solitary bees against inflammation and cancer. Toxins, 14, 818. https://doi.org/10.3390/toxins14120818

Koludarov I, Velasque M, Senoner T, Timm T, Greve C, Hamadou AB, Gupta DK, Lochnit G, Heinzinger M, Vilcinskas A, Gloag R, Harpur BA, Podsiadlowski L, Rost B, Jackson TNW, Dutertre S, Stolle E, von Reumont BM (2023) Prevalent bee venom genes evolved before the aculeate stinger and eusociality. BMC Biology, 21, 229. https://doi.org/10.1186/s12915-023-01656-5

Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM (2021) BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Molecular Biology and Evolution, 38, 4647–4654. https://doi.org/10.1093/molbev/msab199

Nash WJ, Man A, McTaggart S, Baker K, Barker T, Catchpole L, Durrant A, Gharbi K, Irish N, Kaithakottil G, Ku D, Providence A, Shaw F, Swarbreck D, Watkins C, McCartney AM, Formenti G, Mouton A, Vella N, von Reumont BM, Vella A, Haerty W (2024) The genome sequence of the Violet Carpenter Bee, Xylocopa violacea (Linnaeus, 1785): a hymenopteran species undergoing range expansion. Heredity, 133, 381–387. https://doi.org/10.1038/s41437-024-00720-2

Outhwaite CL, McCann P, Newbold T (2022) Agriculture and climate change are reshaping insect biodiversity worldwide. Nature, 605, 97–102. https://doi.org/10.1038/s41586-022-04644-x

von Reumont BM, Dutertre S, Koludarov I (2022) Venom profile of the European carpenter bee Xylocopa violacea: Evolutionary and applied considerations on its toxin components. Toxicon: X, 14, 100117. https://doi.org/10.1016/j.toxcx.2022.100117

 

Title: The genome sequence of the Violet Carpenter Bee, Xylocopa violacea (Linnaeus, 1785): a hymenopteran species undergoing range expansion
Authors: Will J. Nash, Angela Man, Seanna McTaggart, Kendall Baker, Tom Barker, Leah Catchpole, Alex Durrant, Karim Gharbi, Naomi Irish, Gemy Kaithakottil, Debby Ku, Aaliyah Providence, Felix Shaw, David Swarbreck, Chris Watkins, Ann M. McCartney, Giulio F...
Abstract: We present a reference genome assembly from an individual male Violet Carpenter Bee (Xylocopa violacea, Linnaeus 1758). The assembly is 1.02 gigabases in span. 48% of the assembly is scaffolded into 17 pseu...
Thematic fields: Arthropods, ERGA, ERGA Pilot
Recommender: Christian Lopezguerra
Submission date: 2025-05-16 23:21:44
14 May 2025

Genomic changes are varied across congeneric species pairs of animals

Exploring the correlation between speciation and genome rearrangements

Recommended by Javier del Campo based on reviews by Jean-Baptiste Ledoux and 3 anonymous reviewers

Francis et al. (2025) investigate the relationship between genomic rearrangement, specifically macro- and micro-synteny, and speciation across a broad range of animal phyla. Using chromosome-level genome assemblies, they generated 1:1 ortholog pairs and analyzed synteny conservation using custom bioinformatics pipelines to quantify microsynteny. The study is well written, methodologically sound, and offers valuable insights beyond comparative genomics.

The authors show that while most congeneric species pairs exhibit disruptions in micro-synteny, they retain high levels of protein sequence identity. They also find that macro- and micro-synteny decay with speciation but are often decoupled, indicating no universal genomic trajectory during divergence. Their conclusion, that synteny patterns alone are insufficient to define species boundaries (Steenwyk and King 2024), is well supported by their data.

The discussion effectively situates the work within the broader context of speciation research. It thoughtfully addresses study limitations, such as challenges in synteny block quantification, chromosomal rearrangement rates, and the scarcity of high-quality genome assemblies. The manuscript also outlines clear directions for future research, including the need for more accurate divergence time estimates and expanded taxonomic sampling (Formenti et al. 2022).

              
References

Formenti G, Theissinger K, Fernandes C, Bista I, Bombarely A, Bleidorn C, et al. (2022) The era of reference genomes in conservation genomics. Trends in Ecology & Evolution, 37, 197–202. https://doi.org/10.1016/j.tree.2021.11.008

Francis WR, Vargas S, Wörheide G (2025) Genomic changes are varied across congeneric species pairs of animals. bioRxiv, ver. 4 peer-reviewed and recommended by PCI Genomics https://doi.org/10.1101/2024.09.05.611358

Steenwyk JL, King N (2024) The promise and pitfalls of synteny in phylogenomics. PLOS Biology, 22, e3002632. https://doi.org/10.1371/journal.pbio.3002632

 

Title: Genomic changes are varied across congeneric species pairs of animals
Authors: Warren R. Francis, Sergio Vargas, Gert Wörheide
Abstract: Synteny, the shared arrangement of genes on chromosomes between related species, is a marker of shared ancestry, and synteny-breaking events can result in genomic incompatibilities between populations and ultimately lead to speciation events. D...
Thematic fields: Evolutionary genomics
Recommender: Javier del Campo
Reviewers: Anonymous, Nicolas Shogo Locatelli, Jean-Baptiste Ledoux, Anonymous
Submission date: 2024-09-06 17:57:07
13 Mar 2025

Estimating allele frequencies, ancestry proportions and genotype likelihoods in the presence of mapping bias

A novel genotype likelihood-based method to reduce mapping bias in low-coverage and ancient DNA studies

Recommended by Sebastian Ernesto Ramos-Onsins based on reviews by Maxime Lefebvre, Michael Westbury and Adrien Oliva

The study of genomic variability within and between populations, as well as among species, relies on comparative analyses of homologous positions—sites that share a common evolutionary origin. Homology is inferred through sequence similarity (Reeck et al. 1987). However, the ability to detect homologous regions can be compromised when sequence mismatches accumulate due to mutations, especially when analyzing short DNA fragments, as in short-read sequencing (Li et al. 2008). In the genomic era, accurately mapping homologous DNA fragments to a reference genome is essential for obtaining precise estimates of genetic variability and evolutionary inferences (e.g., Li et al. 2008; Ellegren 2014). However, short-read, high-throughput sequencing often introduces mapping bias, disproportionately favoring the reference allele. This bias distorts allele frequency estimates, ancestry proportions, and genotype likelihoods, impacting downstream analyses (e.g., Günther & Nettelblad 2019; Martiniano et al. 2020).

Mapping bias is particularly problematic in ancient DNA studies, where post-mortem damage exacerbates sequencing errors. DNA fragmentation limits read length, while cytosine deamination, which converts C to U and appears in sequencing reads as C-to-T and G-to-A substitutions, increases mismatches and further complicates homology identification (Dabney et al. 2013). These degradation processes contribute to the misidentification of true variants, confounding evolutionary inferences. Various strategies have been developed to mitigate mapping bias, including the commonly used pseudo-haploid approach, which randomly picks a single read at each analyzed position for each individual, thereby retaining a single allele at each polymorphic site (Günther & Nettelblad 2019; Barlow et al. 2020).
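The pseudo-haploid strategy described here can be illustrated with a minimal Python sketch. The function name `pseudo_haploid_call`, the input layout (a dictionary from site identifier to the alleles seen in covering reads), and the example sites are hypothetical and chosen only for illustration; real pipelines typically also apply filters that are not shown.

```python
import random


def pseudo_haploid_call(reads_by_site, rng=None):
    """Pick one observed allele at random for each site.

    reads_by_site maps a site identifier to the list of alleles
    observed in the reads covering that site (hypothetical layout).
    Sites without any covering read are reported as None (missing).
    """
    rng = rng or random.Random()
    calls = {}
    for site, alleles in reads_by_site.items():
        # One randomly chosen read represents the individual at this site.
        calls[site] = rng.choice(alleles) if alleles else None
    return calls


# Hypothetical low-coverage individual: three sites, few reads each.
example = {"chr1:100": ["A", "A", "G"], "chr1:250": ["C"], "chr1:900": []}
print(pseudo_haploid_call(example, rng=random.Random(1)))
```

Because only one read is kept per site, the call does not depend on genotype calling at low coverage, but the remaining reads are discarded.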

Günther et al. (2025) introduce a novel method to correct mapping bias using a genotype likelihood-based approach, incorporating a mapping bias ratio to adjust for reference allele overrepresentation. The method specifically targets known single nucleotide polymorphisms (SNPs) because in population genomic analysis of ancient DNA data, low coverage and post-mortem damage often hinder the ability to identify novel SNPs in most individuals. The analysis focuses on DNA fragmentation, assuming that deamination effects are minimal when considering ascertained SNPs. The proposed method was compared against existing approaches, including pseudo-haploid data and standard genotype likelihood-based probabilistic methods. The evaluation was performed using both empirical and simulated data. For empirical data, low-coverage sequencing data from the 1000 Genomes Project (Finnish in Finland, Japanese in Tokyo, Yoruba in Ibadan, Nigeria populations) was analyzed, while for simulated data, ancient DNA-like datasets were generated using msprime (Kelleher et al. 2016), modeling different sequencing depths, divergence times, and reference genome choices. The study assesses the impact of mapping bias on the ratio of reference versus non-reference allele mapping; the accuracy of SNP allele frequency estimates relative to true frequencies; the deviation and variance between estimated and true allele frequencies; population differentiation; and the estimation of admixture proportions using supervised and unsupervised methods, considering both genotype likelihoods and genotype calls.
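To make the idea of a bias-adjusted genotype likelihood more concrete, the sketch below shows one simple way a reference-read probability could enter a binomial genotype likelihood. This is an illustrative assumption only and is not the formula of Günther et al. (2025): the function name, the `error` rate, and the `ref_read_prob` parameter are all hypothetical.

```python
from math import comb


def genotype_likelihoods(n_ref, n_alt, error=0.01, ref_read_prob=0.5):
    """Binomial likelihoods of the read counts for genotypes RR, RA, AA.

    ref_read_prob is the assumed probability that a read from a
    heterozygote carries the reference allele; 0.5 means no mapping
    bias, values above 0.5 model reference allele overrepresentation.
    This parameterisation is a hypothetical illustration.
    """
    depth = n_ref + n_alt
    # Probability of observing a reference read under each genotype.
    p_ref = {
        "RR": 1.0 - error,
        "RA": ref_read_prob * (1.0 - error) + (1.0 - ref_read_prob) * error,
        "AA": error,
    }
    return {
        genotype: comb(depth, n_ref) * p**n_ref * (1.0 - p) ** n_alt
        for genotype, p in p_ref.items()
    }


# Hypothetical site with 6 reference and 2 alternate reads.
unbiased = genotype_likelihoods(6, 2)
adjusted = genotype_likelihoods(6, 2, ref_read_prob=0.6)
print(unbiased["RA"], adjusted["RA"])
```

In this toy setting, allowing for an excess of reference reads makes the heterozygous genotype more compatible with skewed read counts, which is the intuition behind correcting for reference allele overrepresentation.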

Günther et al. (2025) show that all methods analyzed exhibit minor but systematic reference allele bias. The new corrected genotype likelihood method outperforms the standard genotype likelihood approach in correlating with true allele frequencies, although the pseudo-haploid method still provides the most accurate estimates. Mapping bias also affects ancestry estimation, leading to admixture proportion errors of up to 4%, though this effect is smaller than the 10% discrepancy observed across different inference methods.

The work performed by Günther et al. (2025) provides a rigorous and innovative evaluation of mapping bias in the context of ascertained SNPs, introducing a probabilistic approach that improves bias correction. Unlike non-probabilistic methods such as pseudo-haploid data, the genotype likelihood framework leverages all sequencing reads for each analyzed SNP, and can incorporate additional bias corrections, enhancing its applicability across different sequencing conditions. While probabilistic approaches offer clear advantages in bias correction, they can be less intuitive to interpret compared to traditional genotype calling methods. This study highlights that mapping bias is pervasive across all methods, influencing evolutionary inferences such as selection signals and population differentiation. Although the improvements in allele frequency recovery may seem modest, the genome-wide impact of mapping bias is significant, especially in ancient DNA studies, making bias correction essential for robust evolutionary analyses.

                      

References
 
Barlow A, Hartmann S, Gonzalez J, Hofreiter M, Paijmans JLA. (2020) Consensify: A method for generating pseudohaploid genome sequences from palaeogenomic datasets with reduced error rates. Genes. 11(1):50. https://doi.org/10.3390/genes11010050
 
Dabney J, Meyer M, Pääbo S. (2013) Ancient DNA damage. Cold Spring Harb Perspect Biol. 5(7):a012567. https://doi.org/10.1101/cshperspect.a012567 

Ellegren H. (2014) Genome sequencing and population genomics in non-model organisms. Trends Ecol Evol. 29(1):51-63. https://doi.org/10.1016/j.tree.2013.09.008 

Günther T, Nettelblad C. (2019) The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet. 15(7):e1008302. https://doi.org/10.1371/journal.pgen.1008302

Günther T, Goldberg A, Schraiber JG. (2025) Estimating allele frequencies, ancestry proportions and genotype likelihoods in the presence of mapping bias. bioRxiv, ver. 5 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2024.07.01.601500

Kelleher J, Etheridge AM, McVean G. (2016) Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput Biol. 12(5):e1004842. https://doi.org/10.1371/journal.pcbi.1004842

Li H, Ruan J, Durbin R. (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18(11):1851-8. https://doi.org/10.1101/gr.078212.108 

Reeck GR, de Haën C, Teller DC, Doolittle RF, Fitch WM, Dickerson RE, et al. (1987) "Homology" in proteins and nucleic acids: a terminology muddle and a way out of it. Cell. 50 (5): 667. https://doi.org/10.1016/0092-8674(87)90322-9 

Title: Estimating allele frequencies, ancestry proportions and genotype likelihoods in the presence of mapping bias
Authors: Torsten Günther, Amy Goldberg, Joshua G. Schraiber
Abstract: Population genomic analyses rely on an accurate and unbiased characterization of the genetic composition of the studied population. For short-read, high-throughput sequencing data, mapping sequencing reads to a linear reference genome can bias ...
Thematic fields: Bioinformatics, Evolutionary genomics, Population genomics
Recommender: Sebastian Ernesto Ramos-Onsins
Submission date: 2024-07-02 10:46:19
11 Mar 2021

Gut microbial ecology of Xenopus tadpoles across life stages

A comprehensive look at Xenopus gut microbiota: effects of feed, developmental stages and parental transmission

Recommended by Wirulda Pootakham based on reviews by Vanessa Marcelino and 1 anonymous reviewer

It is well established that the gut microbiota play an important role in the overall health of their hosts (Jandhyala et al. 2015). To date, there are still a limited number of studies on the complex microbial communities inhabiting vertebrate digestive systems, especially ones that also explore the functional diversity of the microbial community (Bletz et al. 2016).

This preprint by Scalvenzi et al. (2021) reports a comprehensive study on the phylogenetic and metabolic profiles of the Xenopus gut microbiota. The authors describe significant changes in the gut microbiome communities at different developmental stages and demonstrate that microbial community composition differs across organs. In addition, the study investigates the impact of diet on the Xenopus tadpole gut microbiome communities as well as how the bacterial communities are transmitted from parents to the next generation.

This is one of the first studies that addresses the interactions between gut bacteria and tadpoles during development. The authors observe the dynamics of gut microbiome communities during tadpole growth and metamorphosis. They also explore host-gut microbial community metabolic interactions and demonstrate the capacity of the microbiome to complement the metabolic pathways of the Xenopus genome. Although this study is limited by the use of Xenopus tadpoles in a laboratory, which are probably different from those in nature, I believe it still provides important and valuable information for the research community working on vertebrate microbiota and their interaction with the host.

References

Bletz et al. (2016). Amphibian gut microbiota shifts differentially in community structure but converges on habitat-specific predicted functions. Nature Communications, 7(1), 1-12. doi: https://doi.org/10.1038/ncomms13699

Jandhyala, S. M., Talukdar, R., Subramanyam, C., Vuyyuru, H., Sasikala, M., & Reddy, D. N. (2015). Role of the normal gut microbiota. World journal of gastroenterology: WJG, 21(29), 8787. doi: https://dx.doi.org/10.3748%2Fwjg.v21.i29.8787

Scalvenzi, T., Clavereau, I., Bourge, M. & Pollet, N. (2021) Gut microbial ecology of Xenopus tadpoles across life stages. bioRxiv, 2020.05.25.110734, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2020.05.25.110734

Title: Gut microbial ecology of Xenopus tadpoles across life stages
Authors: Thibault Scalvenzi, Isabelle Clavereau, Mickael Bourge, Nicolas Pollet
Abstract: Background: The microorganism world living in amphibians is still largely under-represented and under-studied in the literature. Among anuran amphibians, African clawed frogs of the Xenopus genus stand as well-characterized mode...
Thematic fields: Evolutionary genomics, Metagenomics, Vertebrates
Recommender: Wirulda Pootakham
Submission date: 2020-05-25 14:01:19
03 Sep 2024

A chromosome-level, haplotype-resolved genome assembly and annotation for the Eurasian minnow (Leuciscidae: Phoxinus phoxinus) provide evidence of haplotype diversity

Exploring evolutionary adaptations through Phoxinus phoxinus genomics

Recommended by Jitendra Narayan based on reviews by Alice Dennis and 2 anonymous reviewers

Oriowo et al. (2024) offer a thorough and meticulously conducted study that makes a substantial contribution to our understanding of the Eurasian minnow (Phoxinus phoxinus), particularly in terms of its genetic diversity, structural variations, and evolutionary adaptations. The authors have achieved an impressive feat by generating an annotated haplotype-phased, chromosome-level genome assembly (2n = 50). This was accomplished through the integration of high-fidelity long reads with chromosome conformation capture data (Hi-C), resulting in a highly complete and accurate genome assembly. The assembly is characterized by a haploid size of 940 Megabase pairs (Mbp) for haplome one and 929 Mbp for haplome two, with scaffold N50 values of 36.4 Mb and 36.6 Mb, respectively. These metrics, alongside BUSCO scores of 96.9% and 97.2%, highlight the high quality of the genome, making it a robust foundation for further genetic exploration and analyses.

The study’s findings are both novel and significant, providing deep insights into the genetic architecture of P. phoxinus. The authors report a heterozygosity rate of 1.43% and a high repeat content of approximately 54%, primarily consisting of DNA transposons. These transposons play a crucial role in genome rearrangements and variations, contributing to the species' adaptability and evolution (Bourque et al. 2018). The research also identifies substantial structural variations within the genome, including insertions, deletions, inversions, and translocations (Oriowo et al. 2024). Beyond these findings, the genome annotation is exceptionally comprehensive, containing 30,980 mRNAs and 23,497 protein-coding genes. The study’s gene family evolution analysis, which compares the P. phoxinus proteome to that of ten other teleost species, reveals immune system gene families that favor histone-based disease prevention mechanisms over NLR-based immune responses. This provides new insight into the evolutionary strategies that have emerged in P. phoxinus, enabling its survival in its environment. Moreover, the demographic analysis conducted in the study reveals historical fluctuations in the effective population size of P. phoxinus, likely correlated with past climatic changes, offering insights into the species' evolutionary history.

This annotated and phased reference genome not only serves as a crucial resource for resolving taxonomic complexities within the genus Phoxinus but also highlights the importance of haplotype-phased assemblies in understanding genetic diversity, particularly in species characterized by high heterozygosity. The authors have delivered a study that is methodologically sound, richly detailed, and highly relevant to the field. The study represents a valuable and impactful contribution to the scientific community, offering resources and knowledge that will likely inform future research in the field.

              

References

Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvák Z, Levin HL, Macfarlan TS, Mager DL, Feschotte C (2018) Ten things you should know about transposable elements. Genome Biology, 19, 199. https://doi.org/10.1186/s13059-018-1577-z

Oriowo TO, Chrysostomakis I, Martin S, Kukowka S, Brown T, Winkler S, Myers EW, Böhne A, Stange M (2024) A chromosome-level, haplotype-resolved genome assembly and annotation for the Eurasian minnow (Leuciscidae: Phoxinus phoxinus) provide evidence of haplotype diversity. bioRxiv, ver. 6 peer-reviewed and recommended by PCI Genomics https://doi.org/10.1101/2023.11.30.569369

Title: A chromosome-level, haplotype-resolved genome assembly and annotation for the Eurasian minnow (Leuciscidae: Phoxinus phoxinus) provide evidence of haplotype diversity
Authors: Temitope O. Oriowo, Ioannis Chrysostomakis, Sebastian Martin, Sandra Kukowka, Thomas Brown, Sylke Winkler, Eugene W. Myers, Astrid Boehne, Madlen Stange
Abstract: In this study we present an in-depth analysis of the Eurasian minnow (Phoxinus phoxinus) genome, highlighting its genetic diversity, structural variations, and evolutionary adaptations. We generated an annotated haplotype-phased, chrom...
Thematic fields: Evolutionary genomics, Structural genomics, Vertebrates
Recommender: Jitendra Narayan
Reviewers: Henrik Lanz, Rui Borges, Fergal Martin, Vinod Scaria, Mihai Pop, Alice Dennis, Jin-Wu Nam, Monya Baker, Giuseppe Narzisi
Submission date: 2023-12-04 14:49:17
08 Nov 2022

Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks

How to best call the somatic mosaic tree?

Recommended by Nicolas Bierne based on reviews by 2 anonymous reviewers

Any multicellular organism is a molecular mosaic with some somatic mutations accumulated between cell lineages. Big long-lived trees have nourished the image of a somatic mosaic tree, from the observation of spectacular phenotypic mosaics and also because somatic mutations are expected to potentially be passed on to gametes in plants (reviewed in Schoen and Schultz 2019). The lower cost of genome sequencing now offers the opportunity to tackle the issue and identify somatic mutations in trees.

However, when it comes to characterizing this somatic mosaic from genome sequences, things become much more difficult than one might first think. What separates cell lineages ontogenetically, in cell division number, or in time? How to sample clonal cell populations? How do somatic mutations distribute in a population of cells in an organ or an organ sample? Should they be fixed heterozygotes in the sample of cells sequenced or be polymorphic? Do we indeed expect somatic mutations to be fixed? How should we identify and count somatic mutations?

To date, the detection of somatic mutations has mostly been done with a single variant caller in a given study, and we have little perspective on how different callers provide similar or different results. Some studies have used standard SNP callers that assume a somatic mutation is fixed at the heterozygous state in the sample of cells, with an expected allele coverage ratio of 0.5, and fewer have used cancer callers, designed to detect mutations in a fraction of the cells in the sample. However, standard SNP callers detect mutations that deviate from a balanced allelic coverage, and different cancer callers can have different characteristics that should affect their outcomes.
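The contrast between a fixed heterozygous mutation (expected allelic fraction of 0.5) and a mutation present in only a fraction of cells can be made concrete with a short Python sketch. The function names and read counts below are hypothetical, and the binomial model is a simplifying assumption that ignores sequencing error and mapping artefacts; it is not the procedure used by any particular caller.

```python
from math import comb


def allelic_fraction(ref_reads, alt_reads):
    """Fraction of reads at a site that carry the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("site has no coverage")
    return alt_reads / depth


def prob_at_most(alt_reads, depth, expected_fraction=0.5):
    """Binomial probability of seeing alt_reads or fewer alternate reads
    among depth reads if the true allelic fraction is expected_fraction."""
    q = 1.0 - expected_fraction
    return sum(
        comb(depth, k) * expected_fraction**k * q ** (depth - k)
        for k in range(alt_reads + 1)
    )


# Hypothetical site: 90 reference reads and 10 alternate reads.
print(allelic_fraction(90, 10))
print(prob_at_most(10, 100))
```

A very small probability under the 0.5 expectation suggests that the mutation is unlikely to be a fixed heterozygote in the sampled cells, which is the situation callers designed for low allelic fractions aim to handle.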

To tackle these issues, Schmitt et al. (2022) conducted an extensive simulation analysis to compare different variant callers. Then, they reanalyzed two large published datasets on pedunculate oak, Quercus robur. The analysis of in silico somatic mutations allowed the authors to evaluate the performance of different variant callers as a function of the allelic fraction of somatic mutations and the sequencing depth. They found one of the seven callers to provide better and more robust calls for a broad set of allelic fractions and sequencing depths. The reanalysis of published datasets in oaks with the most effective cancer caller of the in silico analysis allowed them to identify numerous low-frequency mutations that were missed in the original studies.

I recommend the study of Schmitt et al. (2022) first because it shows the benefit of using cancer callers in the study of somatic mutations, whatever the allelic fraction you are ultimately interested in. You can select fixed heterozygotes if this is your ultimate target, but cancer callers additionally give you a valuable overview of the allelic fractions of somatic mutations in your sample, and most do as well as SNP callers for fixed heterozygous mutations. In addition, Schmitt et al. (2022) provide the pipelines that allow investigating in silico data that should correspond to a given study design, encouraging researchers to compare different variant callers rather than arbitrarily going with only one. We can anticipate that the study of somatic mutations in non-model species will increasingly attract attention now that multiple tissues of the same individual can be sequenced at low cost, and the study of Schmitt et al. (2022) paves the way for questioning and choosing the best variant caller for the question one wants to address.

References

Schoen DJ, Schultz ST (2019) Somatic Mutation and Evolution in Plants. Annual Review of Ecology, Evolution, and Systematics, 50, 49–73. https://doi.org/10.1146/annurev-ecolsys-110218-024955

Schmitt S, Leroy T, Heuertz M, Tysklind N (2022) Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks. bioRxiv, 2021.10.11.462798. ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2021.10.11.462798

Title: Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks
Authors: Sylvain Schmitt, Thibault Leroy, Myriam Heuertz, Niklas Tysklind
Abstract: 1. Mutation, the source of genetic diversity, is the raw material of evolution; however, the mutation process remains understudied, especially in plants. Using both a simulation and reanalysis framework, we set out ...
Thematic fields: Bioinformatics, Plants
Recommender: Nicolas Bierne
Reviewers: Anonymous, Anonymous
Submission date: 2022-04-28 13:24:19

06 Feb 2024

The need of decoding life for taking care of biodiversity and the sustainable use of nature in the Anthropocene - a Faroese perspective

Why sequence everything? A raison d’être for the Genome Atlas of Faroese Ecology

Recommended by Stephen Richards based on reviews by Tereza Manousaki and 1 anonymous reviewer

When discussing the Earth BioGenome Project with scientists and potential funding agencies, one common question is: why sequence everything? Whether sequencing a subset would be more efficient is not an unreasonable question given what we know about the mathematics of importance and Pareto’s 80:20 principle, that 80% of the benefits can come from 20% of the effort. However, one must remember that this principle is an observation made in hindsight and selecting the most effective 20% of experiments is difficult. As an example, few saw great applied value in comparative genomic analysis of the archaeon Haloferax mediterranei, but this enabled the discovery of CRISPR/Cas9 technology (1). When discussing whether or not to sequence all life on our planet, smaller countries such as the Faroe Islands are seldom mentioned.
 
Mikalsen and co-authors (2) provide strong arguments to appreciate, investigate and steward genetic diversity, from a Faroese viewpoint, a fishery viewpoint, and a global viewpoint. As readers, we learn to cherish the Faroe Islands, the Faroese, and perhaps by extension all of nature and the people of the world. The manuscript describes the proposed Faroese participation in the European Reference Genome Atlas (ERGA) consortium through Gen@FarE – the Genome Atlas of Faroese Ecology. Gen@FarE aims to: i) generate high-quality reference genomes for all eukaryotes on the islands and in its waters; ii) establish population genetics of all species of commercial or ecological interest; and iii) establish a “databank” for all Faroese species with citizen science tools for participation.


In the background section of the manuscript, the authors argue that as caretakers of the earth (and responsible for the current rapid decrease in biodiversity), humanity must be aware of the biodiversity and existing genetic diversity, to protect these for future generations. Thus, it is necessary to have reference genomes for as many species as possible, enabling estimation of population sizes and gene flow between ecosystem locations. Without this, the authors note that “…it is impossible to make relevant management plans for a species, an ecosystem or a geographical area…”. Gen@FarE is important. The Faroe nation has a sizable economic zone in the North Atlantic and large fisheries. In terms of biodiversity and conservation, the authors list some species endemic to the Faroe Islands, especially sea birds. The article discusses ongoing marine environmental-DNA-based monitoring programs that started in 2018, and how new reference genome databases will help these efforts to track and preserve marine biodiversity. They point to the lack of use of population genomics information for Red List decisions on which species are endangered, and the need for these techniques to inform sustainable harvesting of fisheries, given collapses in critical food species such as Northwest Atlantic cod and herring. In one example, they highlight how the herring chromosome 12 inversion contains a “supergene” collection of tightly linked genes associated with ecological adaptation. Genetic tools may also help enable the identification and nurturing of feeding grounds for young individuals. Critically, the Faroe Islands have a significant role to play in protecting the millions of tons of seafood caught annually upon which humanity relies. As the authors note, population genomics based on high-quality reference sequences is “likely the best tool” to monitor and protect commercial fisheries.
There is an important section discussing the role of interactions between visible and “invisible” species in the marine ecosystem on which we all depend. Examples of “invisible” species include a wide range of morphologically similar planktonic algae, and invasive species transported by ballast water or ship hulls. As biologists, I believe we forget that our population studies of life on the earth have so far been mostly in the dark. Gen@FarE is but one light that can be switched on.


The authors conclude by discussing Gen@FarE plans for citizen science and education, perhaps the most important part of this project if humanity is to learn to cherish and care for the earth. Where initiatives such as the Human Genome Project did not need the collaborative efforts of the world for sample access, the Earth BioGenome Project most certainly does. In the same way, at a smaller scale, Gen@FarE requires the support and determination of the Faroese. 
 


References    

1 Mojica, F. J., Díez-Villaseñor, C. S., García-Martínez, J. & Soria, E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 60, 174-182 (2005).

2 Mikalsen, S-O., Hjøllum, J. í., Salter, I., Djurhuus, A. & Kongsstovu, S. í. The need of decoding life for taking care of biodiversity and the sustainable use of nature in the Anthropocene – a Faroese perspective. EcoEvoRxiv (2024), ver. 3 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.32942/X21S4C

Title: The need of decoding life for taking care of biodiversity and the sustainable use of nature in the Anthropocene - a Faroese perspective
Authors: Svein-Ole Mikalsen, Jari í Hjøllum, Ian Salter, Anni Djurhuus, Sunnvør í Kongsstovu
Abstract: Biodiversity is under pressure, mainly due to human activities and climate change. At the international policy level, it is now recognised that genetic diversity is an important part of biodiversity. The availability of high-quality reference g...
Thematic fields: ERGA, ERGA Pilot, Population genomics, Vertebrates
Recommender: Stephen Richards
Submission date: 2023-07-31 16:59:33

23 Sep 2022

MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studies

MATEdb: a new phylogenomic-driven database for Metazoa

Recommended by Samuel Abalde based on reviews by 2 anonymous reviewers

The development (and standardization) of high-throughput sequencing techniques has revolutionized evolutionary biology, to the point that we almost see as normal fine-detail studies of genome architecture evolution (Robert et al., 2022), adaptation to new habitats (Rahi et al., 2019), or the development of key evolutionary novelties (Hilgers et al., 2018), to name three examples. One of the fields that has benefited the most is phylogenomics, i.e. the use of genome-wide data for inferring the evolutionary relationships among organisms. Dealing with such amounts of data, however, has come with important analytical and computational challenges. Likewise, although the steady generation of genomic data from virtually any organism opens exciting opportunities for comparative analyses, it also creates a sort of “information fog”, where it is hard to find the most appropriate and/or the highest-quality data. I have personally experienced this not so long ago, when I had to spend several weeks selecting the most complete transcriptomes from several phyla, moving back and forth between the NCBI SRA repository and the relevant literature.

In an attempt to deal with this issue, some research labs have committed their time and resources to the generation of taxa- and topic-specific databases (Lathe et al., 2008), such as MolluscDB (Liu et al., 2021), focused on mollusk genomics, or EukProt (Richter et al., 2022), a protein repository representing the diversity of eukaryotes. A new database that promises to become an important resource in the near future is MATEdb (Fernández et al., 2022), a repository of high-quality genomic data from Metazoa. MATEdb has been developed from publicly available and newly generated transcriptomes and genomes, prioritizing quality over quantity. Upon download, the user has access to both raw data and the related datasets: assemblies, several quality metrics, the set of inferred protein-coding genes, and their annotation. Although it is clear to me that this repository has been created with phylogenomic analyses in mind, I see how it could be generalized to other related problems such as analyses of gene content or evolution of specific gene families. In my opinion, the main strengths of MATEdb are threefold:

  1. Rosa Fernández and her team have carefully scrutinized the genomic data available in several repositories to retrieve only the most complete transcriptomes and genomes, saving the user a lot of time in data mining.
  2. These data have been analyzed to provide both the assembly and the set of protein-coding genes, easing the computational burden that usually accompanies these pipelines. Interestingly, all the data have been analyzed with the same software and parameters, facilitating comparisons among taxa.
  3. Genomic analysis can be intimidating, even more so for inexperienced users. That is particularly important when it comes to transcriptome and genome assembly because it affects all downstream analyses. I believe that having access to already analyzed data softens this transition. The users can move forward on their research while they learn how to generate and analyze their data at their own pace.

On a negative note, I see two main drawbacks. First, as of today (September 16th, 2022) this database is in an early stage and it still needs to incorporate a lot of animal groups. This has been discussed during the revision process and the authors are already working on it, so it is only a matter of time until all major taxa are represented. Second, there is a scalability issue. In its current format it is not possible to select the taxa of interest and the full database has to be downloaded, which will become more and more difficult as it grows. Nonetheless, with the appropriate resources it would be easy to find a better solution. There are plenty of examples that could serve as inspiration, so I hope this does not become a big problem in the future.

Altogether, I and the researchers that participated in the revision process believe that MATEdb has the potential to become an important and valuable addition to the metazoan phylogenomics community. Personally, I wish it had been available just a few months ago; it would have saved me so much time.

References

Fernández R, Tonzo V, Guerrero CS, Lozano-Fernandez J, Martínez-Redondo GI, Balart-García P, Aristide L, Eleftheriadi K, Vargas-Chávez C (2022) MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studies. bioRxiv, 2022.07.18.500182, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.07.18.500182

Hilgers L, Hartmann S, Hofreiter M, von Rintelen T (2018) Novel Genes, Ancient Genes, and Gene Co-Option Contributed to the Genetic Basis of the Radula, a Molluscan Innovation. Molecular Biology and Evolution, 35, 1638–1652. https://doi.org/10.1093/molbev/msy052

Lathe W, Williams J, Mangan M, Karolchik D (2008) Genomic data resources: challenges and promises. Nature Education, 1(3), 2.

Liu F, Li Y, Yu H, Zhang L, Hu J, Bao Z, Wang S (2021) MolluscDB: an integrated functional and evolutionary genomics database for the hyper-diverse animal phylum Mollusca. Nucleic Acids Research, 49, D988–D997. https://doi.org/10.1093/nar/gkaa918

Rahi ML, Mather PB, Ezaz T, Hurwood DA (2019) The Molecular Basis of Freshwater Adaptation in Prawns: Insights from Comparative Transcriptomics of Three Macrobrachium Species. Genome Biology and Evolution, 11, 1002–1018. https://doi.org/10.1093/gbe/evz045

Richter DJ, Berney C, Strassert JFH, Poh Y-P, Herman EK, Muñoz-Gómez SA, Wideman JG, Burki F, Vargas C de (2022) EukProt: A database of genome-scale predicted proteins across the diversity of eukaryotes. bioRxiv, 2020.06.30.180687, ver. 5 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2020.06.30.180687

Robert NSM, Sarigol F, Zimmermann B, Meyer A, Voolstra CR, Simakov O (2022) Emergence of distinct syntenic density regimes is associated with early metazoan genomic transitions. BMC Genomics, 23, 143. https://doi.org/10.1186/s12864-022-08304-2

Title: MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studies
Authors: Rosa Fernandez, Vanina Tonzo, Carolina Simon Guerrero, Jesus Lozano-Fernandez, Gemma I Martinez-Redondo, Pau Balart-Garcia, Leandro Aristide, Klara Eleftheriadi, Carlos Vargas-Chavez
Abstract: With the advent of high throughput sequencing, the amount of genomic data available for animals (Metazoa) species has bloomed over the last decade, especially from transcriptomes due to lower sequencing costs and ea...
Thematic fields: Bioinformatics, Evolutionary genomics, Functional genomics
Recommender: Samuel Abalde
Submission date: 2022-07-20 07:30:39

06 Aug 2024

Identification and quantification of transposable element transcripts using Long-Read RNA-seq in Drosophila germline tissues

Unveiling transposon dynamics: Advancing TE expression analysis in Drosophila with long-read sequencing

Recommended by Nicolas Pollet based on reviews by Silke Jensen, Christophe Antoniewski and 1 anonymous reviewer

Transposable elements (TEs) are mobile genetic elements with an intrinsic mutagenic potential that influences the physiology of any cell type, whether somatic or germinal. Measuring TE expression is a fundamental prerequisite for analysing the processes leading to the activity of TE-derived sequences. This applies to both old and recent TEs, as even if they are deficient in mobilisation, transcription of TE sequences alone can impact neighbouring gene expression and other cellular activities.

In terms of TE physiology, transcription is crucial for mobilisation activity. The transcription of some TEs can be tissue-specific and associated with splicing events, as exemplified by the P-element isoforms in the fruit fly (Laski et al. 1986). Regarding host cell physiology, TE transcripts can include nearby exons, with or without splicing, and such chimeric transcripts can significantly alter gene activity. Thus, quantitative and qualitative analyses must be conducted to assess TE function and how they can modify genomic activities. Yet, due to the polymorphic, interspersed, and repetitive nature of TE sequences, the quantitative and qualitative analysis of TE transcript levels using short-read sequencing remains challenging (Lanciano and Cristofari 2020).

In this context, Rebollo et al. (2024) employed nanopore long-read sequencing to analyse cDNAs derived from Drosophila melanogaster germline RNAs. The authors constructed two long-read cDNA libraries from pooled ovaries and testes using a protocol to obtain full-length cDNAs and sequenced them separately. They carefully compared their results with their short-read datasets. Overall, their observations corroborate known patterns of germline-specific expression of certain TEs and provide initial evidence of novel spliced TE transcript isoforms in Drosophila.

Rebollo and colleagues have provided a well-documented and detailed analysis of their results, which will undoubtedly benefit the scientific community. They presented the challenges and limitations of their approach, such as the length of the transcripts, and provided a reproducible analysis workflow that will enable better characterisation of TE expression using long-read technology.

Despite the small number of samples and limited sequencing depth, this pioneering study strikingly demonstrates the potential of long-read sequencing for the quantitative and qualitative analysis of TE transcription, a technology that will facilitate a better understanding of the transposon landscape.

              
References

Lanciano S, Cristofari G (2020) Measuring and interpreting transposable element expression. Nature Reviews Genetics, 21, 721–736. https://doi.org/10.1038/s41576-020-0251-y

Laski FA, Rio DC, Rubin GM (1986) Tissue specificity of Drosophila P element transposition is regulated at the level of mRNA splicing. Cell, 44, 7–19. https://doi.org/10.1016/0092-8674(86)90480-0

Rebollo R, Gerenton P, Cumunel E, Mary A, Sabot F, Burlet N, Gillet B, Hughes S, Oliveira DS, Goubert C, Fablet M, Vieira C, Lacroix V (2024) Identification and quantification of transposable element transcripts using Long-Read RNA-seq in Drosophila germline tissues. bioRxiv, ver. 4 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/2023.05.27.542554

Title: Identification and quantification of transposable element transcripts using Long-Read RNA-seq in Drosophila germline tissues
Authors: Rita Rebollo, Pierre Gerenton, Eric Cumunel, Arnaud Mary, François Sabot, Nelly Burlet, Benjamin Gillet, Sandrine Hughes, Daniel Siqueira Oliveira, Clément Goubert, Marie Fablet, Cristina Vieira, Vincent Lacroix
Abstract: Transposable elements (TEs) are repeated DNA sequences potentially able to move throughout the genome. In addition to their inherent mutagenic effects, TEs can disrupt nearby genes by donating their intrinsic regulatory sequences, for instance,...
Thematic fields: Arthropods, Bioinformatics, Viruses and transposable elements
Recommender: Nicolas Pollet
Submission date: 2023-06-13 14:46:20