Latest recommendations

23 Sep 2022

MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studies

MATEdb: a new phylogenomic-driven database for Metazoa

Recommended by Samuel Abalde based on reviews by 2 anonymous reviewers

The development (and standardization) of high-throughput sequencing techniques has revolutionized evolutionary biology, to the point that we now see as almost normal fine-detail studies of genome architecture evolution (Robert et al., 2022), adaptation to new habitats (Rahi et al., 2019), or the development of key evolutionary novelties (Hilgers et al., 2018), to name three examples. One of the fields that has benefited the most is phylogenomics, i.e. the use of genome-wide data for inferring the evolutionary relationships among organisms. Dealing with such amounts of data, however, has come with important analytical and computational challenges. Likewise, although the steady generation of genomic data from virtually any organism opens exciting opportunities for comparative analyses, it also creates a sort of “information fog”, where it is hard to find the most appropriate and/or the highest-quality data. I have personally experienced this not so long ago, when I had to spend several weeks selecting the most complete transcriptomes from several phyla, moving back and forth between the NCBI SRA repository and the relevant literature.

In an attempt to deal with this issue, some research labs have committed their time and resources to the generation of taxa- and topic-specific databases (Lathe et al., 2008), such as MolluscDB (Liu et al., 2021), focused on mollusk genomics, or EukProt (Richter et al., 2022), a protein repository representing the diversity of eukaryotes. A new database that promises to become an important resource in the near future is MATEdb (Fernández et al., 2022), a repository of high-quality genomic data from Metazoa. MATEdb has been developed from publicly available and newly generated transcriptomes and genomes, prioritizing quality over quantity. Upon download, the user has access to both raw data and the related datasets: assemblies, several quality metrics, the set of inferred protein-coding genes, and their annotation. Although it is clear to me that this repository has been created with phylogenomic analyses in mind, I see how it could be generalized to other related problems such as analyses of gene content or evolution of specific gene families. In my opinion, the main strengths of MATEdb are threefold:

  1. Rosa Fernández and her team have carefully scrutinized the genomic data available in several repositories to retrieve only the most complete transcriptomes and genomes, saving the user a lot of data-mining time.
  2. These data have been analyzed to provide both the assembly and the set of protein-coding genes, easing the computational burden that usually accompanies these pipelines. Interestingly, all the data have been analyzed with the same software and parameters, facilitating comparisons among taxa.
  3. Genomic analysis can be intimidating, even more so for inexperienced users. That is particularly important when it comes to transcriptome and genome assembly because it affects all downstream analyses. I believe that having access to already analyzed data softens this transition. The users can move forward with their research while they learn how to generate and analyze their data at their own pace.

On a negative note, I see two main drawbacks. First, as of today (September 16th, 2022) this database is in an early stage and it still needs to incorporate a lot of animal groups. This has been discussed during the revision process and the authors are already working on it, so it is only a matter of time until all major taxa are represented. Second, there is a scalability issue. In its current format it is not possible to select the taxa of interest and the full database has to be downloaded, which will become more and more difficult as it grows. Nonetheless, with the appropriate resources it would be easy to find a better solution. There are plenty of examples that could serve as inspiration, so I hope this does not become a big problem in the future.

Altogether, the researchers who participated in the revision process and I believe that MATEdb has the potential to become an important and valuable addition to the metazoan phylogenomics community. Personally, I wish it had been available just a few months ago; it would have saved me so much time.

References

Fernández R, Tonzo V, Guerrero CS, Lozano-Fernandez J, Martínez-Redondo GI, Balart-García P, Aristide L, Eleftheriadi K, Vargas-Chávez C (2022) MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studies. bioRxiv, 2022.07.18.500182, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.07.18.500182

Hilgers L, Hartmann S, Hofreiter M, von Rintelen T (2018) Novel Genes, Ancient Genes, and Gene Co-Option Contributed to the Genetic Basis of the Radula, a Molluscan Innovation. Molecular Biology and Evolution, 35, 1638–1652. https://doi.org/10.1093/molbev/msy052

Lathe W, Williams J, Mangan M, Karolchik D (2008) Genomic data resources: challenges and promises. Nature Education, 1(3), 2.

Liu F, Li Y, Yu H, Zhang L, Hu J, Bao Z, Wang S (2021) MolluscDB: an integrated functional and evolutionary genomics database for the hyper-diverse animal phylum Mollusca. Nucleic Acids Research, 49, D988–D997. https://doi.org/10.1093/nar/gkaa918

Rahi ML, Mather PB, Ezaz T, Hurwood DA (2019) The Molecular Basis of Freshwater Adaptation in Prawns: Insights from Comparative Transcriptomics of Three Macrobrachium Species. Genome Biology and Evolution, 11, 1002–1018. https://doi.org/10.1093/gbe/evz045

Richter DJ, Berney C, Strassert JFH, Poh Y-P, Herman EK, Muñoz-Gómez SA, Wideman JG, Burki F, Vargas C de (2022) EukProt: A database of genome-scale predicted proteins across the diversity of eukaryotes. bioRxiv, 2020.06.30.180687, ver. 5 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2020.06.30.180687

Robert NSM, Sarigol F, Zimmermann B, Meyer A, Voolstra CR, Simakov O (2022) Emergence of distinct syntenic density regimes is associated with early metazoan genomic transitions. BMC Genomics, 23, 143. https://doi.org/10.1186/s12864-022-08304-2

Title: MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studies
Authors: Rosa Fernandez, Vanina Tonzo, Carolina Simon Guerrero, Jesus Lozano-Fernandez, Gemma I Martinez-Redondo, Pau Balart-Garcia, Leandro Aristide, Klara Eleftheriadi, Carlos Vargas-Chavez
Abstract: With the advent of high throughput sequencing, the amount of genomic data available for animals (Metazoa) species has bloomed over the last decade, especially from transcriptomes due to lower sequencing costs and ea...
Thematic fields: Bioinformatics, Evolutionary genomics, Functional genomics
Recommender: Samuel Abalde
Submission date: 2022-07-20 07:30:39
06 Feb 2024

The need of decoding life for taking care of biodiversity and the sustainable use of nature in the Anthropocene - a Faroese perspective

Why sequence everything? A raison d’être for the Genome Atlas of Faroese Ecology

Recommended by Stephen Richards based on reviews by Tereza Manousaki and 1 anonymous reviewer

When discussing the Earth BioGenome Project with scientists and potential funding agencies, one common question is: why sequence everything? Whether sequencing a subset would be more optimal is not an unreasonable question given what we know about the mathematics of importance and Pareto’s 80:20 principle, that 80% of the benefits can come from 20% of the effort. However, one must remember that this principle is an observation made in hindsight, and selecting the most effective 20% of experiments is difficult. As an example, few saw great applied value in comparative genomic analysis of the archaeon Haloferax mediterranei, but this enabled the discovery of CRISPR/Cas9 technology (1). When discussing whether or not to sequence all life on our planet, smaller countries such as the Faroe Islands are seldom mentioned.
 
Mikalsen and co-authors (2) provide strong arguments to appreciate, investigate and steward genetic diversity, from a Faroese viewpoint, a fishery viewpoint, and a global viewpoint. As readers, we learn to cherish the Faroe Islands, the Faroese, and perhaps by extension all of nature and the people of the world. The manuscript describes the proposed Faroese participation in the European Reference Genome Atlas (ERGA) consortium through Gen@FarE – the Genome Atlas of Faroese Ecology. Gen@FarE aims to: i) generate high-quality reference genomes for all eukaryotes on the islands and in its waters; ii) establish population genetics of all species of commercial or ecological interest; and iii) establish a “databank” for all Faroese species with citizen science tools for participation.


In the background section of the manuscript, the authors argue that as caretakers of the earth (and responsible for the current rapid decrease in biodiversity), humanity must be aware of the biodiversity and existing genetic diversity, to protect these for future generations. Thus, it is necessary to have reference genomes for as many species as possible, enabling estimation of population sizes and gene flow between ecosystem locations. Without this the authors note that “…it is impossible to make relevant management plans for a species, an ecosystem or a geographical area…”. Gen@FarE is important. The Faroe nation has a sizable economic zone in the North Atlantic and large fisheries. In terms of biodiversity and conservation, the authors list some species endemic to other Faroe islands, especially sea birds. The article discusses ongoing marine environmental-DNA-based monitoring programs that started in 2018, and how new reference genome databases will help these efforts to track and preserve marine biodiversity. They point to the lack of use of population genomics information for Red List decisions on which species are endangered, and the need for these techniques to inform sustainable harvesting of fisheries, given collapses in critical food species such as Northwest Atlantic cod and herring. In one example, they highlight how the herring chromosome 12 inversion contains a “supergene” collection of tightly linked genes associated with ecological adaptation. Genetic tools may also help enable the identification and nurturing of feeding grounds for young individuals. Critically, the Faroe Islands have a significant role to play in protecting the millions of tons of seafood caught annually upon which humanity relies. As the authors note, population genomics based on high-quality reference sequences is “likely the best tool” to monitor and protect commercial fisheries. 
There is an important section discussing the role of interactions between visible and “invisible” species in the marine ecosystem on which we all depend. Examples of “invisible” species include a wide range of morphologically similar planktonic algae, and invasive species transported by ballast water or ship hulls. As biologists, I believe we forget that our population studies of life on the earth have so far been mostly in the dark. Gen@FarE is but one light that can be switched on.


The authors conclude by discussing Gen@FarE plans for citizen science and education, perhaps the most important part of this project if humanity is to learn to cherish and care for the earth. Where initiatives such as the Human Genome Project did not need the collaborative efforts of the world for sample access, the Earth BioGenome Project most certainly does. In the same way, at a smaller scale, Gen@FarE requires the support and determination of the Faroese. 
 


References    

1. Mojica, F. J., Díez-Villaseñor, C. S., García-Martínez, J. & Soria, E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 60, 174-182 (2005).

2. Mikalsen, S-O., Hjøllum, J. í., Salter, I., Djurhuus, A. & Kongsstovu, S. í. The need of decoding life for taking care of biodiversity and the sustainable use of nature in the Anthropocene – a Faroese perspective. EcoEvoRxiv (2024), ver. 3 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.32942/X21S4C

Title: The need of decoding life for taking care of biodiversity and the sustainable use of nature in the Anthropocene - a Faroese perspective
Authors: Svein-Ole Mikalsen, Jari í Hjøllum, Ian Salter, Anni Djurhuus, Sunnvør í Kongsstovu
Abstract: Biodiversity is under pressure, mainly due to human activities and climate change. At the international policy level, it is now recognised that genetic diversity is an important part of biodiversity. The availability of high-quality reference g...
Thematic fields: ERGA, ERGA Pilot, Population genomics, Vertebrates
Recommender: Stephen Richards
Submission date: 2023-07-31 16:59:33
08 Nov 2022

Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks

How to best call the somatic mosaic tree?

Recommended by Nicolas Bierne based on reviews by 2 anonymous reviewers

Any multicellular organism is a molecular mosaic, with somatic mutations accumulating between cell lineages. Large, long-lived trees have fed this image of a somatic mosaic tree, both because of spectacular phenotypic mosaics observed in them and because somatic mutations are expected to potentially be passed on to gametes in plants (review in Schoen and Schultz 2019). The lower cost of genome sequencing now offers the opportunity to tackle the issue and identify somatic mutations in trees.

However, when it comes to characterizing this somatic mosaic from genome sequences, things become much more difficult than one would first think. What separates cell lineages ontogenetically, in cell division number, or in time? How to sample clonal cell populations? How do somatic mutations distribute in a population of cells in an organ or an organ sample? Should they be fixed heterozygotes in the sample of cells sequenced or be polymorphic? Do we indeed expect somatic mutations to be fixed? How should we identify and count somatic mutations?

To date, the detection of somatic mutations has mostly been done with a single variant caller in a given study, and we have little perspective on how different callers provide similar or different results. Some studies have used standard SNP callers that assume a somatic mutation is fixed at the heterozygous state in the sample of cells, with an expected allele coverage ratio of 0.5, and fewer have used cancer callers, designed to detect mutations in a fraction of the cells in the sample. However, standard SNP callers detect mutations that deviate from a balanced allelic coverage, and different cancer callers can have different characteristics that should affect their outcomes.
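To make the notion of an expected allele coverage ratio of 0.5 concrete, here is a minimal Python sketch. It is an illustration only and is not the method of any caller discussed here or of Schmitt et al. (2022): the function names and the read counts are hypothetical. It computes the allelic fraction at a site from read counts and uses an exact two-sided binomial test to ask whether that fraction is compatible with a fixed heterozygous mutation.

```python
from math import comb


def allelic_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def binomial_two_sided_p(alt_reads: int, total_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value under an expected allelic fraction p.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count of alternate reads.
    """
    probs = [
        comb(total_reads, k) * p**k * (1 - p) ** (total_reads - k)
        for k in range(total_reads + 1)
    ]
    observed = probs[alt_reads]
    # Small relative tolerance so that ties are counted despite float rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Hypothetical sites: (alternate reads, total reads) at a depth of 100x.
    for alt, depth in [(50, 100), (20, 100), (5, 100)]:
        af = allelic_fraction(alt, depth)
        pval = binomial_two_sided_p(alt, depth)
        print(f"alt={alt} depth={depth} fraction={af:.2f} p(0.5)={pval:.3g}")
```

A site with a low allelic fraction strongly rejects the 0.5 expectation, which is why a low-frequency mutation present in only part of the sampled cells does not fit the fixed-heterozygote assumption.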

To tackle these issues, Schmitt et al. (2022) conducted an extensive simulation analysis to compare different variant callers. Then, they reanalyzed two large published datasets on pedunculate oak, Quercus robur. The analysis of in silico somatic mutations allowed the authors to evaluate the performance of different variant callers as a function of the allelic fraction of somatic mutations and the sequencing depth. They found one of the seven callers to provide better and more robust calls for a broad set of allelic fractions and sequencing depths. The reanalysis of published datasets in oaks with the most effective cancer caller of the in silico analysis allowed them to identify numerous low-frequency mutations that were missed in the original studies.

I recommend the study of Schmitt et al. (2022) first because it shows the benefit of using cancer callers in the study of somatic mutations, whatever the allelic fraction you are ultimately interested in. You can select fixed heterozygotes if this is your ultimate target, but cancer callers additionally give you a valuable overview of the allelic fractions of somatic mutations in your sample, and most do as well as SNP callers for fixed heterozygous mutations. In addition, Schmitt et al. (2022) provide the pipelines that allow investigating in silico data that should correspond to a given study design, encouraging researchers to compare different variant callers rather than arbitrarily going with only one. We can anticipate that the study of somatic mutations in non-model species will increasingly attract attention now that multiple tissues of the same individual can be sequenced at low cost, and the study of Schmitt et al. (2022) paves the way for questioning and choosing the best variant caller for the question one wants to address.

References

Schoen DJ, Schultz ST (2019) Somatic Mutation and Evolution in Plants. Annual Review of Ecology, Evolution, and Systematics, 50, 49–73. https://doi.org/10.1146/annurev-ecolsys-110218-024955

Schmitt S, Leroy T, Heuertz M, Tysklind N (2022) Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks. bioRxiv, 2021.10.11.462798, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2021.10.11.462798

Title: Somatic mutation detection: a critical evaluation through simulations and reanalyses in oaks
Authors: Sylvain Schmitt, Thibault Leroy, Myriam Heuertz, Niklas Tysklind
Abstract: 1. Mutation, the source of genetic diversity, is the raw material of evolution; however, the mutation process remains understudied, especially in plants. Using both a simulation and reanalysis framework, we set out ...
Thematic fields: Bioinformatics, Plants
Recommender: Nicolas Bierne
Reviewers: Anonymous, Anonymous
Submission date: 2022-04-28 13:24:19
11 Mar 2021

Gut microbial ecology of Xenopus tadpoles across life stages

A comprehensive look at Xenopus gut microbiota: effects of feed, developmental stages and parental transmission

Recommended by Wirulda Pootakham based on reviews by Vanessa Marcelino and 1 anonymous reviewer

It is well established that the gut microbiota play an important role in the overall health of their hosts (Jandhyala et al. 2015). To date, there are still only a limited number of studies on the complex microbial communities inhabiting vertebrate digestive systems, especially ones that also explore the functional diversity of the microbial community (Bletz et al. 2016).

This preprint by Scalvenzi et al. (2021) reports a comprehensive study on the phylogenetic and metabolic profiles of the Xenopus gut microbiota. The authors describe significant changes in the gut microbiome communities at different developmental stages and demonstrate that microbial community composition differs across organs. In addition, the study also investigates the impact of diet on the Xenopus tadpole gut microbiome communities as well as how the bacterial communities are transmitted from parents to the next generation.

This is one of the first studies to address the interactions between gut bacteria and tadpoles during development. The authors observe the dynamics of gut microbiome communities during tadpole growth and metamorphosis. They also explore host-gut microbial community metabolic interactions and demonstrate the capacity of the microbiome to complement the metabolic pathways of the Xenopus genome. Although this study is limited by the use of Xenopus tadpoles in a laboratory, which are probably different from those in nature, I believe it still provides important and valuable information for the research community working on vertebrate microbiota and their interaction with the host.

References

Bletz et al. (2016). Amphibian gut microbiota shifts differentially in community structure but converges on habitat-specific predicted functions. Nature Communications, 7(1), 1-12. doi: https://doi.org/10.1038/ncomms13699

Jandhyala, S. M., Talukdar, R., Subramanyam, C., Vuyyuru, H., Sasikala, M., & Reddy, D. N. (2015). Role of the normal gut microbiota. World journal of gastroenterology: WJG, 21(29), 8787. doi: https://dx.doi.org/10.3748%2Fwjg.v21.i29.8787

Scalvenzi, T., Clavereau, I., Bourge, M. & Pollet, N. (2021) Gut microbial ecology of Xenopus tadpoles across life stages. bioRxiv, 2020.05.25.110734, ver. 4 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2020.05.25.110734

Title: Gut microbial ecology of Xenopus tadpoles across life stages
Authors: Thibault Scalvenzi, Isabelle Clavereau, Mickael Bourge, Nicolas Pollet
Abstract: Background: The microorganism world living in amphibians is still largely under-represented and under-studied in the literature. Among anuran amphibians, African clawed frogs of the Xenopus genus stand as well-characterized mode...
Thematic fields: Evolutionary genomics, Metagenomics, Vertebrates
Recommender: Wirulda Pootakham
Submission date: 2020-05-25 14:01:19