Recommendation

“Phylogenetics in the Genomic Era” brings together experts in the field to present a comprehensive synthesis

Robert Waterhouse and Karen Meusemann

A recommendation of:

Phylogenetics in the Genomic Era

Céline Scornavacca, Frédéric Delsuc, Nicolas Galtier (2021), HAL, PGE https://hal.inria.fr/PGE/

Abstract

Phylogenetics in the Genomic Era

Molecular phylogenetics was born in the middle of the 20th century, when the advent of protein and DNA sequencing offered a novel way to study the evolutionary relationships between living organisms. The first 50 years of the discipline can be seen as a long quest for resolving power. The goal – reconstructing the tree of life – seemed to be unreachable, the methods were heavily debated, and the data limiting. Maybe for these reasons, even the relevance of the whole approach was repeatedly questioned, as part of the so-called molecules versus morphology debate. Controversies often crystalized around long-standing conundrums, such as the origin of land plants, the diversification of placental mammals, or the prokaryote/eukaryote divide. Some of these questions were resolved as gene and species samples increased in size. Over the years, molecular phylogenetics has gradually evolved from a brilliant, revolutionary idea to a mature research field centred on the problem of reliably building trees.

This logical progression was abruptly interrupted in the late 2000s. High-throughput sequencing arose and the field suddenly moved into something entirely different. Access to genome-scale data profoundly reshaped the methodological challenges, while opening an amazing range of new application perspectives. Phylogenetics left the realm of systematics to occupy a central place in one of the most exciting research fields of this century – genomics. This is what this book is about: how we do trees, and what we do with trees, in the current phylogenomic era.

Recommendation: posted 08 April 2022, validated 08 April 2022

Cite this recommendation as:
Waterhouse, R. and Meusemann, K. (2022) “Phylogenetics in the Genomic Era” brings together experts in the field to present a comprehensive synthesis. Peer Community in Genomics, 100015. https://doi.org/10.24072/pci.genomics.100015

Recommendation

E-book: Phylogenetics in the Genomic Era (Scornavacca et al. 2021)

This book was not peer-reviewed by PCI Genomics. It has undergone an internal review by the editors.

Accurate reconstructions of the relationships amongst species and the genes encoded in their genomes are an essential foundation for almost all evolutionary inferences emerging from downstream analyses. Molecular phylogenetics has developed as a field over many decades to build suites of models and methods to reconstruct reliable trees that explain, support, or refute such inferences. The genomic era has brought new challenges and opportunities to the field, opening up new areas of research and algorithm development to take advantage of the accumulating large-scale data. Such ‘big-data’ phylogenetics has come to be known as phylogenomics, which broadly aims to connect molecular and evolutionary biology research to address questions centred on relationships amongst taxa, mechanisms of molecular evolution, and the biological functions of genes and other genomic elements. This book brings together experts in the field to present a comprehensive synthesis of Phylogenetics in the Genomic Era, covering key conceptual and methodological aspects of how to build accurate phylogenies and how to apply them in molecular and evolutionary research. The paragraphs below briefly summarise the five constituent parts of the book, highlighting the key concepts, methods, and applications that each part addresses. Being organised in an accessible style, while presenting details to provide depth where necessary, and including guides describing real-world examples of major phylogenomic tools, this collection represents an invaluable resource, particularly for students and newcomers to the field of phylogenomics.

Part 1: Phylogenetic analyses in the genomic era

Modelling how sequences evolve is a cornerstone of phylogenetic reconstruction. This part of the book introduces the reader to phylogenetic inference methods and algorithmic optimisations in the contexts of Markov, Maximum Likelihood, and Bayesian models of sequence evolution. The main concepts and theoretical considerations are mapped out for probabilistic Markov models, efficient tree building with Maximum Likelihood methods, and the flexibility and robustness of Bayesian approaches. These are supported with practical examples of phylogenomic applications using the popular tools RAxML and PhyloBayes. By considering theoretical, algorithmic, and practical aspects, these chapters provide readers with a holistic overview of the challenges and recent advances in developing scalable phylogenetic analyses in the genomic era.
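To make the link between Markov models and Maximum Likelihood concrete, the sketch below uses the Jukes-Cantor (JC69) model, the simplest Markov model of nucleotide substitution. It is an illustration chosen for this recommendation and is not taken from the book or from RAxML or PhyloBayes; the function names are hypothetical. It computes the JC69 transition probabilities, the log-likelihood of a pair of aligned sequences, and the closed-form maximum likelihood estimate of the distance between them.

```python
import math


def jc69_transition_prob(t, same):
    """JC69 probability of ending in the same (or one specific different)
    nucleotide after t expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def jc69_log_likelihood(t, n_sites, n_diff):
    """Log-likelihood of two aligned sequences separated by distance t > 0,
    given n_diff mismatches among n_sites ungapped sites."""
    n_same = n_sites - n_diff
    # Each site starts from the stationary distribution (1/4 per nucleotide).
    log_lik = n_sites * math.log(0.25)
    if n_same:
        log_lik += n_same * math.log(jc69_transition_prob(t, True))
    if n_diff:
        log_lik += n_diff * math.log(jc69_transition_prob(t, False))
    return log_lik


def jc69_ml_distance(n_sites, n_diff):
    """Closed-form maximum likelihood estimate of the JC69 distance."""
    p = n_diff / n_sites
    if p >= 0.75:
        raise ValueError("sequences are saturated; distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Hypothetical data: 10 mismatches in a 100-site alignment.
    d = jc69_ml_distance(100, 10)
    print(f"ML distance: {d:.4f} substitutions per site")
    print(f"log-likelihood at the estimate: {jc69_log_likelihood(d, 100, 10):.3f}")
```

Real phylogenomic analyses use richer models and optimise over whole trees, but the principle is the same: a Markov model assigns a probability to the data, and the parameters that maximise it are retained.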

Part 2: Data quality, model adequacy

This part focuses on the importance of considering the appropriateness of the evolutionary models used and the accuracy of the underlying molecular and genomic data. Both these aspects can profoundly affect the results when applying current phylogenomic methods to make inferences about complex biological and evolutionary processes. A clear example is the choice of methods for building multiple sequence alignments, and of subsequent filtering approaches, both of which can greatly affect phylogeny inference. The importance of error detection in (meta)barcode sequencing data is also highlighted, with solutions offered by the MACSE_BARCODE pipeline for accurate taxonomic assignments. Orthology datasets provide essential markers for phylogenomic inferences, but the overview of concepts and methods presented shows that they too face challenges with respect to model selection and data quality. Finally, an innovative approach using ancestral gene order reconstructions provides new perspectives on how to assess gene tree accuracy for phylogenomic analyses. By emphasising through examples the importance of using appropriate evolutionary models and assessing input data quality, these chapters alert readers to key limitations that the field as a whole strives to address.

Part 3: Resolving phylogenomic conflicts

Conflicting phylogenetic signals are commonplace and may derive from statistical or systematic bias. This part of the book addresses possible causes of conflict, discordance between gene trees and species trees, and how the processes that lead to such conflicts can be described by phylogenetic models. It also provides an overview of various models and methods, with examples from phylogenomics and a discussion of their pros and cons. The multispecies coalescent (MSC) model and its applications in phylogenomics are outlined in detail. An interesting aspect is that conflicting phylogenetic signals are in fact a key source of information rather than a problem: they can, and should, be used to point to events like introgression or hybridisation, which highlights possible future trends in this research area. Last but not least, this part of the book compares inferring species trees from concatenated multiple sequence alignments of single genes (gene alignments) with inferring them from ensembles of single-gene trees, pointing out the advantages and disadvantages of both approaches. An important take-home message from these chapters is to remain flexible and identify the most appropriate approach for each dataset, since this may differ tremendously depending on the dataset, setting, taxa, and phylogenetic level addressed by the researcher.
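As a small illustration of why gene trees can disagree with the species tree under the MSC, the sketch below considers the textbook case of three species with species tree ((A,B),C). It is not drawn from the book; the function names and the branch length used are hypothetical. With an internal branch of length T in coalescent units, a gene tree matches the species tree with probability 1 - (2/3)e^(-T), and each of the two discordant topologies has probability (1/3)e^(-T). A simple simulation checks the formula.

```python
import math
import random

TOPOLOGIES = ("AB|C", "AC|B", "BC|A")


def msc_triplet_probs(T):
    """Gene-tree topology probabilities for species tree ((A,B),C) whose
    internal branch has length T in coalescent units."""
    discordant = math.exp(-T) / 3.0
    return {"AB|C": 1.0 - 2.0 * discordant, "AC|B": discordant, "BC|A": discordant}


def simulate_triplet(T, n_genes, seed=0):
    """Monte Carlo estimate of the same probabilities (illustrative only)."""
    rng = random.Random(seed)
    counts = dict.fromkeys(TOPOLOGIES, 0)
    for _ in range(n_genes):
        # Lineages from A and B coalesce at rate 1 along the internal branch.
        if rng.expovariate(1.0) < T:
            counts["AB|C"] += 1
        else:
            # Otherwise all three lineages reach the root population, where
            # every pair is equally likely to coalesce first.
            counts[rng.choice(TOPOLOGIES)] += 1
    return {topology: count / n_genes for topology, count in counts.items()}


if __name__ == "__main__":
    T = 0.5  # hypothetical short internal branch
    print("analytic :", msc_triplet_probs(T))
    print("simulated:", simulate_triplet(T, 20000))
```

Under the MSC alone, the two discordant topologies are expected to be equally frequent. A marked asymmetry between them in real data is one of the patterns that can point to processes such as introgression, which is the sense in which conflict becomes a source of information.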

Part 4: Functional evolutionary genomics

In this part of the book, the focus shifts to functional considerations of phylogenomic approaches, both in terms of molecular evolution and adaptation and with respect to gene expression. The utility of multi-species analysis is clearly presented in the context of annotating functional genomic elements through quantifying evolutionary constraint and protein-coding potential. An historical perspective on characterising rates of change highlights how phylogenomic datasets help to understand the modes of molecular evolution across the genome, over time, and between lineages. These are contextualised with respect to the specific aim of detecting signatures of adaptation from protein-coding DNA alignments using the example of the MutSelDP-ω∗ model. This is extended with the presentation of the generally rare case of adaptive sequence convergence, where consideration of appropriate models and knowledge of gene functions and phenotypic effects are needed. Whether constrained or relaxed, selection pressures on sequence or copy number affect genomic elements in different ways, making the very concept of function difficult to pin down even though it is fundamental for relating the genome to the phenotype and organismal fitness. Here gene expression provides a measurable intermediate, and the Expression Comparison tool from the Bgee suite allows exploration of expression patterns across multiple animal species, taking anatomical homology into account. Overall, phylogenomic applications in functional evolutionary genomics build on a rich theoretical history of molecular analyses, where integration with knowledge of gene functions is challenging but critical.

Part 5: Phylogenomic applications

Rather than attempting to review the full extent of applications linked to phylogenomics, this part of the book focuses on providing detailed insights into selected examples and methods concerning i) estimating divergence times, and ii) species delimitation in the era of ‘omics’ data. With respect to estimating divergence times, an overview is provided of how fossil data recovered from geological records can be used, either as calibration points for a phylogeny inferred from extant species, or through a fossilised birth-death process as a mechanistic model that accounts for lineage diversification. Included is a tutorial for a joint approach to inferring phylogenies and estimating divergence times using the RevBayes software, which implements various models for different applications and for datasets incorporating molecular and morphological data. An interesting excursion focuses on timescale estimates for viral evolution and introduces BEAGLE, a high-performance likelihood-calculation platform that can be used on multi-core systems. The second major subject is species delimitation, since the increasing amount of available genomic data now enables extensive inferences, for instance about the degree of genetic isolation among species and about ancient and recent introgression events. The chapters describe the history of molecular species delimitation up to the current genomic era and present widely used computational methods incorporating single- and multi-locus genomic data, addressing their pros and cons. Finally, a proposal for a new method for delimiting species based on empirical criteria is outlined. The closing chapter of this part of the book introduces BPP, a Bayesian Markov chain Monte Carlo program for analysing multi-locus sequence data under the multispecies coalescent (MSC) model with and without introgression, and includes a tutorial.
These examples together provide accessible details on key conceptual and methodological aspects related to the application of phylogenetics in the genomic era.

References

Scornavacca C, Delsuc F, Galtier N (2021) Phylogenetics in the Genomic Era. https://hal.inria.fr/PGE/
