PCI Genomics

371

Title *

MATEdb2, a collection of high-quality metazoan proteomes across the Animal Tree of Life to speed up phylogenomic studies

Authors *

Gemma I. Martínez-Redondo, Carlos Vargas-Chávez, Klara Eleftheriadi, Lisandra Benítez-Álvarez, Marçal Vázquez-Valls, Rosa Fernández

Year *

2024

Picture *

Abstract *

<p>Recent advances in high-throughput sequencing have exponentially increased the amount of genomic data available for animals (Metazoa) over the last decades, with high-quality chromosome-level genomes being published almost daily. Nevertheless, generating a new genome remains challenging because of the high cost of genome sequencing, the complexity of assembly, and the lack of standardized protocols for genome annotation. The lack of consensus in how genome files are annotated and published not only hinders research by forcing researchers to spend time reformatting files for their purposes, but can also reduce the quality of the gene repertoire used in evolutionary studies. Thus, transcriptomes processed with the same pipeline remain a valuable proxy for the genetic content of species: they are easier and cheaper to obtain, and more comparable across species, than genomes. In a previous study, we presented the Metazoan Assemblies from Transcriptomic Ensembles database (MATEdb), a repository of high-quality transcriptomic and genomic data for the two most diverse animal phyla, Arthropoda and Mollusca. Here, we present the newest version of MATEdb (MATEdb2), which overcomes some of the limitations of the previous version: (1) we include data from all animal phyla for which public data are available, and (2) we provide gene annotations for genomes obtained using the same pipeline. In total, we provide proteomes inferred from high-quality transcriptomic or genomic data for almost 1000 animal species, including the longest isoforms, all isoforms, functional annotations based on sequence homology and protein language models, and the embedding representations of the sequences. We believe this new version of MATEdb will accelerate research on animal phylogenomics while saving thousands of hours of computational work, in a plea for open, greener, and collaborative science.</p>

Indicate the full web address (DOI or URL) giving public access to these data (if you have any problems with the deposit of your data, please contact contact@genomics.peercommunityin.org). In case all raw data are included in the preprint, indicate the DOI or URL of the preprint. *

https://github.com/MetazoaPhylogenomicsLab/MATEdb2

Indicate the full web address (DOI or URL) giving public access to these scripts (if you have any problems with the deposit of your scripts, please contact contact@genomics.peercommunityin.org). In case all raw scripts are included in the preprint, indicate the DOI or URL of the preprint. *

https://github.com/MetazoaPhylogenomicsLab/MATEdb2

Indicate the full web address (DOI, SWHID or URL) giving public access to these codes (if you have any problems with the deposit of your codes, please contact contact@genomics.peercommunityin.org). In case all raw codes are included in the preprint, indicate the DOI or URL of the preprint. *

https://github.com/MetazoaPhylogenomicsLab/MATEdb2

Keywords (optional)

genome assembly; Animal Tree of Life; transcriptomics; invertebrates

Methods that require specific expertise (optional)

None

Thematic fields *

Arthropods, Bioinformatics, Evolutionary genomics, Marine invertebrates, Terrestrial invertebrates

Suggested reviewers - Suggest up to 10 reviewers (provide names and Email addresses). (Optional)

Samuel Abalde, saabalde@gmail.com; Juan Opazo, jopazo@gmail.com; Sergio Vargas (LMU München; suggested by Gert Wörheide), sergio.vargas@lmu.de

Opposed reviewers - Suggest up to 5 people not to invite as reviewers. (Optional)

Submission date

2024-03-04 11:37:21

Recommender

Philipp Schiffer

Reviewers
