Submit a preprint

200

MATEdb, a data repository of high-quality metazoan transcriptome assemblies to accelerate phylogenomic studiesuse asterix (*) to get italics
Rosa Fernandez, Vanina Tonzo, Carolina Simon Guerrero, Jesus Lozano-Fernandez, Gemma I Martinez-Redondo, Pau Balart-Garcia, Leandro Aristide, Klara Eleftheriadi, Carlos Vargas-ChavezPlease use the format "First name initials family name" as in "Marie S. Curie, Niels H. D. Bohr, Albert Einstein, John R. R. Tolkien, Donna T. Strickland"
2022
<p style="text-align: justify;">With the advent of high throughput sequencing, the amount of genomic data available for animals (Metazoa) species has bloomed over the last decade, especially from transcriptomes due to lower sequencing costs and easier assembling process compared to genomes. Transcriptomic data sets have proven useful for phylogenomic studies, such as inference of phylogenetic interrelationships (e.g., species tree reconstruction) and comparative genomics analyses (e.g., gene repertoire evolutionary dynamics). However, these data sets are often analyzed following different analytical pipelines, particularly including different software versions, leading to potential methodological biases when analyzed jointly in a comparative framework. Moreover, these analyses are computationally expensive and not affordable for a large part of the scientific community. More importantly, assembled transcriptomes are usually not deposited in public databases. Furthermore, the quality of these data sets is hardly ever taken into consideration, potentially impacting subsequent analyses such as orthology and phylogenetic or gene repertoire evolution inference. To alleviate these issues, we present Metazoan Assemblies from Transcriptomic Ensembles (MATEdb), a curated database of 335 high-quality transcriptome assemblies from different animal phyla analyzed following the same pipeline. The repository is composed, for each species, of (1) a de novo transcriptome assembly, (2) its candidate coding regions within transcripts (both at the level of nucleotide and amino acid sequences), (3) the coding regions filtered using their contamination profile (i.e., only metazoan content), (4) the longest isoform of the amino acid candidate coding regions, (5) the gene content completeness score as assessed against the BUSCO database, and (6) an orthology-based gene annotation. We complement the repository with gene annotations from high-quality genomes, which are often not straightforward to obtain from individual sequencing projects, totalling 423 high-quality genomic and transcriptomic data sets. We invite the community to provide suggestions for new data sets and new annotation features to be included in subsequent versions, that will be analyzed following the same pipeline and be permanently stored in public repositories. We believe that MATEdb will accelerate research on animal phylogenomics while saving thousands of hours of computational work in a plea for open and collaborative science.&nbsp;</p>
You should fill this box only if you chose 'All or part of the results presented in this preprint are based on data'. URL must start with http:// or https://
https://github.com/MetazoaPhylogenomicsLab/MATEdbYou should fill this box only if you chose 'Scripts were used to obtain or analyze the results'. URL must start with http:// or https://
https://github.com/MetazoaPhylogenomicsLab/MATEdbYou should fill this box only if you chose 'Codes have been used in this study'. URL must start with http:// or https://
Transctiptome; Assembly; Invertebrates; non-model organisms; CDS; Annotation
NonePlease indicate the methods that may require specialised expertise during the peer review process (use a comma to separate various required expertises).
Bioinformatics, Evolutionary genomics, Functional genomics
No need for them to be recommenders of PCI Genomics. Please do not suggest reviewers for whom there might be a conflict of interest. Reviewers are not allowed to review preprints written by close colleagues (with whom they have published in the last four years, with whom they have received joint funding in the last four years, or with whom they are currently writing a manuscript, or submitting a grant proposal), or by family members, friends, or anyone for whom bias might affect the nature of the review - see the code of conduct
e.g. John Doe [john@doe.com]
2022-07-20 07:30:39
Samuel Abalde