
Recommendation

How to interpret the inference of recombination landscapes on methods based on linkage disequilibrium?

Sebastian Ernesto Ramos-Onsins based on reviews by 2 anonymous reviewers

A recommendation of:

Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study

Marie Raynaud, Pierre-Alexandre Gagnaire, Nicolas Galtier (2023), bioRxiv, ver.2, peer-reviewed and recommended by PCI Genomics https://doi.org/10.1101/2022.03.30.486352

Now published in Peer Community Journal

Abstract

Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study

Knowledge of recombination rate variation along the genome provides important insights into genome and phenotypic evolution. Population genomic approaches offer an attractive way to infer the population-scaled recombination rate ρ = 4Ne r using the linkage disequilibrium information contained in DNA sequence polymorphism data. Such methods have been used in a broad range of plant and animal species to build genome-wide recombination maps. However, the reliability of these inferences has only been assessed under a restrictive set of conditions. Here, we evaluate the ability of one of the most widely used coalescent-based programs, LDhelmet, to infer a genomic landscape of recombination with the biological characteristics of a human-like landscape, including hotspots. Using simulations, we specifically assessed the impact of methodological parameters (sample size, phasing errors, block penalty) and evolutionary parameters (effective population size (Ne), demographic history, mutation to recombination rate ratio) on inferred map quality. We report reasonably good correlations between simulated and inferred landscapes, but point to limitations when it comes to detecting recombination hotspots. False positive and false negative hotspots considerably confound fine-scale patterns of inferred recombination under a wide range of conditions, particularly when Ne is small and the mutation/recombination rate ratio is low, to the extent that maps inferred from populations sharing the same recombination landscape appear uncorrelated. We therefore send a message of caution to users of these approaches, at least for genomes with complex recombination landscapes such as that of humans.

Population-scaled recombination rate, LDhelmet, simulations, linkage disequilibrium, recombination landscapes, recombination hotspots

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content
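The population-scaled rate in the abstract combines effective population size and the per-generation recombination rate. The short Python sketch below shows the arithmetic for ρ = 4Ne r and for the analogous mutation parameter θ = 4Ne μ, whose ratio θ/ρ reduces to μ/r (the mutation/recombination rate ratio varied in the study). The function name and the numeric values are illustrative assumptions, not parameters taken from the paper.

```python
def population_scaled_rates(ne: float, r_per_bp: float, mu_per_bp: float):
    """Return (rho, theta, theta/rho) per base pair.

    ne        -- effective population size (diploid)
    r_per_bp  -- recombination rate per bp per generation
    mu_per_bp -- mutation rate per bp per generation
    """
    rho = 4 * ne * r_per_bp      # population-scaled recombination rate
    theta = 4 * ne * mu_per_bp   # population-scaled mutation rate
    return rho, theta, theta / rho  # the ratio equals mu / r: Ne cancels out


# Hypothetical values with roughly human-like orders of magnitude.
rho, theta, ratio = population_scaled_rates(ne=10_000, r_per_bp=1e-8, mu_per_bp=1.25e-8)
print(f"rho = {rho:.1e}/bp, theta = {theta:.2e}/bp, theta/rho = {ratio:.2f}")
```

Halving Ne halves both ρ and θ (less recombination signal and fewer polymorphic sites per bp) while leaving their ratio unchanged, so Ne and the mutation/recombination ratio can be varied as separate parameters.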

Submission: posted 05 April 2022
Recommendation: posted 23 February 2023, validated 24 February 2023

Cite this recommendation as:
Ramos-Onsins, S. (2023) How to interpret the inference of recombination landscapes on methods based on linkage disequilibrium? Peer Community in Genomics, 100161. https://doi.org/10.24072/pci.genomics.100161

Recommendation

Data interpretation relies on previously established and validated tools designed for specific types of data. These tools, however, are usually based on simple models whose validity depends on a set of theoretical conditions and data types. Tool developers therefore provide users with guidelines for interpreting data within the tools' limitations. Nevertheless, once a methodology is accepted by the community, it is often applied in a wide variety of empirical studies that fall outside the method's original scope or depart from the standard models used in its design, which can lead to incorrect interpretation of the results.

Numerous empirical studies have inferred recombination rates across genomes, detected recombination hotspots, and compared related species (e.g., Shanfelter et al. 2019, Spence and Song 2019). These studies used indirect methods based on the signals that recombination leaves in the genome, such as linkage disequilibrium and patterns of haplotype segregation (e.g., Chan et al. 2012). The conclusions from these analyses have been used, for example, to interpret the evolution of chromosomal structure or the evolution of recombination among closely related species.

Indirect methods capture the signal of a large number of historical recombination events and therefore have better resolution than direct methods, which detect only the relatively few recombination events occurring in the generation(s) studied. On the other hand, indirect methods are affected by many other evolutionary processes, such as demographic changes and selection. Moreover, the accuracy of genome-wide recombination inference has not been thoroughly assessed under non-standard conditions. Linkage disequilibrium is shaped by several factors that can bias recombination inference, such as demographic history, selection, population size, and mutation rate; it also depends on the size of the sample studied and on technical parameters specific to each method.
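To make the raw LD signal concrete, the sketch below computes the standard r² measure of linkage disequilibrium between two biallelic sites from phased haplotypes coded 0/1. The function name and haplotype data are made up for illustration. Methods such as LDhat and LDhelmet work with composite likelihoods built from pairs of sites rather than with r² itself, so this only illustrates the kind of association they exploit.

```python
from typing import Sequence


def r_squared(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """r^2 between two biallelic sites; alleles coded 0/1, one entry per haplotype."""
    n = len(site_a)
    if n != len(site_b) or n == 0:
        raise ValueError("sites must have the same, non-zero number of haplotypes")
    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b   # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


# Hypothetical sample of 8 haplotypes.
site1 = [0, 0, 0, 0, 1, 1, 1, 1]
site2 = [0, 0, 0, 0, 1, 1, 1, 1]  # perfectly associated with site1
site3 = [0, 1, 0, 1, 0, 1, 0, 1]  # no association with site1 in this sample
print(r_squared(site1, site2), r_squared(site1, site3))
```

With only a handful of haplotypes, r² between unlinked sites is often far from zero purely by chance; this sampling noise is one reason sample size matters for LD-based inference.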

Raynaud et al. (2023) analyzed the reliability of recombination rate inference when several standard assumptions (evolutionary and methodological) are violated, focusing on one of the most popular families of methods, based on LDhat (McVean et al. 2004), and specifically on its improved version, LDhelmet (Chan et al. 2012). These methods account for around 70% of the studies that infer recombination rates. The authors simulated human-like recombination landscapes, based on maps obtained from empirical studies in humans and including hotspots, to perform a detailed assessment of how well this methodology infers the pattern of recombination and the location of hotspots. They measured the correlation between simulated and inferred rates, as well as hotspot detection metrics such as the true positive rate and the false discovery rate.
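Comparing a simulated map with an inferred one typically means summarizing both over genomic windows and correlating them. The sketch below computes a Spearman rank correlation in plain Python. The choice of Spearman, the toy window values and the function names are assumptions for illustration; they are not necessarily the statistic or window sizes used by Raynaud et al. (2023).

```python
from math import sqrt
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Rank values (1 = smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-window rates for a simulated and an inferred map.
simulated = [0.5, 0.7, 12.0, 0.6, 0.4, 8.0, 0.9, 0.5]
inferred = [0.6, 0.8, 5.0, 0.5, 0.5, 6.5, 1.1, 0.4]
print(round(spearman(simulated, inferred), 3))
```

A rank correlation is insensitive to the heavy right tail that hotspots create, which is one plausible reason to prefer it for recombination maps; a Pearson correlation on log-transformed rates is a common alternative.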

The authors send a message of caution to researchers who use this methodology to infer recombination landscapes and locate hotspots. Inferred landscapes and hotspots can differ considerably from the simulated ones even under standard model conditions. In addition, demographic processes such as bottlenecks or admixture, as well as population size and mutation rate, can substantially affect the accuracy of the estimated recombination rates and hotspot locations. Indeed, hotspot locations inferred from data simulated under the same landscape can be very imprecise when standard assumptions are violated or not accounted for. These effects may lead to incorrect interpretations, for example regarding the conservation of recombination maps between closely related species. Finally, Raynaud et al. (2023) provide a useful guide with advice on how to obtain accurate recombination estimates with methods based on linkage disequilibrium, while also emphasizing the limitations of such approaches.
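The point about apparent (non-)conservation can be illustrated with a toy hotspot comparison. The sketch below calls a window a hotspot when its rate exceeds a fixed multiple of the map's median rate, then asks what fraction of hotspots in one map have a hotspot within a few windows in the other. The fold threshold, the tolerance, the function names and the data are hypothetical choices, not the hotspot definition used by Raynaud et al. (2023).

```python
from statistics import median
from typing import List, Sequence


def call_hotspots(rates: Sequence[float], fold: float = 5.0) -> List[int]:
    """Indices of windows whose rate exceeds `fold` x the median rate (hypothetical rule)."""
    background = median(rates)
    return [i for i, r in enumerate(rates) if r > fold * background]


def shared_fraction(hot_a: Sequence[int], hot_b: Sequence[int], tolerance: int = 1) -> float:
    """Fraction of hotspots in map A with a hotspot in map B within `tolerance` windows."""
    if not hot_a:
        return 0.0
    shared = sum(1 for i in hot_a if any(abs(i - j) <= tolerance for j in hot_b))
    return shared / len(hot_a)


# Two hypothetical maps inferred from populations sharing the same true landscape.
map_pop1 = [1, 1, 9, 1, 1, 1, 1, 8, 1, 1, 1, 7]
map_pop2 = [1, 1, 1, 1, 1, 8, 1, 9, 1, 1, 1, 1]
hot1, hot2 = call_hotspots(map_pop1), call_hotspots(map_pop2)
print(hot1, hot2, shared_fraction(hot1, hot2))
```

The measure is asymmetric (A against B differs from B against A), and a real comparison would also need a null expectation, for example by shifting one map's hotspots at random, because some overlap is expected by chance.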

REFERENCES

Chan AH, Jenkins PA, Song YS (2012) Genome-Wide Fine-Scale Recombination Rate Variation in Drosophila melanogaster. PLOS Genetics, 8, e1003090. https://doi.org/10.1371/journal.pgen.1003090

McVean GAT, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P (2004) The Fine-Scale Structure of Recombination Rate Variation in the Human Genome. Science, 304, 581–584. https://doi.org/10.1126/science.1092500

Raynaud M, Gagnaire P-A, Galtier N (2023) Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study. bioRxiv, 2022.03.30.486352, ver. 2 peer-reviewed and recommended by Peer Community in Genomics. https://doi.org/10.1101/2022.03.30.486352

Spence JP, Song YS (2019) Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Science Advances, 5, eaaw9206. https://doi.org/10.1126/sciadv.aaw9206


Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
This project was funded by the ANR HotRec ANR-19-CE12-0019

Reviews

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.1101/2022.03.30.486352

Version of the preprint: 1

Author's Reply, 20 Jan 2023

https://doi.org/10.24072/pci.genomics.100161.ar1

Decision by Sebastian Ernesto Ramos-Onsins, posted 23 Jun 2022

Dear Nicolas Galtier,

The manuscript titled "Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study" has been reviewed.

The two reviewers find this manuscript interesting and well written. The results obtained from the simulation study provide valuable information for the interpretation and understanding of the empirical data. The reviewers have provided various comments and suggestions that will help improve the manuscript. Specifically, they are both interested in understanding the impact of additional evolutionary parameters, such as the effect of demography and positive and negative selection.

Some minor corrections should also be considered. Some figures are color-coded, although the colors are not visible on the graph (Figure 3, Figure S3); others show unclear comparisons (in Figure 5, the actual rate, in blue, is hardly visible); and others could benefit from additional labels to make the multiple axes quicker to understand. Please improve the presentation of the figures in the revised version.
I also found some typos and unclear sentences (line 274: should "sensitivity" be TPR?; line 275: FDR is a way to measure type I error, which is based on alternative hypotheses, although type I error is usually defined as the FPR).
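The distinction raised here can be stated precisely with the usual confusion-matrix counts (TP, FP, TN, FN): sensitivity and TPR are the same quantity, TP/(TP+FN); the FPR, FP/(FP+TN), is the per-test type I error rate; and the FDR, FP/(TP+FP), is the proportion of called hotspots that are false. The Python sketch below uses made-up counts (and a hypothetical `Counts` class) to show how the FPR and FDR can diverge when true hotspots are rare.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # true hotspots that were detected
    fp: int  # non-hotspot windows called as hotspots
    tn: int  # non-hotspot windows correctly not called
    fn: int  # true hotspots that were missed

    def tpr(self) -> float:
        """Sensitivity, i.e. true positive rate."""
        return self.tp / (self.tp + self.fn)

    def fpr(self) -> float:
        """False positive rate (per-test type I error rate)."""
        return self.fp / (self.fp + self.tn)

    def fdr(self) -> float:
        """False discovery rate: share of calls that are false."""
        return self.fp / (self.tp + self.fp)


# Made-up counts: 1,000 windows, 20 of which are true hotspots.
c = Counts(tp=15, fp=15, tn=965, fn=5)
print(c.tpr(), round(c.fpr(), 4), c.fdr())
```

Here an FPR of about 1.5% coexists with an FDR of 50%: half of the detected hotspots are false even though almost all non-hotspot windows are correctly left uncalled.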

Consider and respond to all comments and suggestions from reviewers before submitting the new version.

Sincerely,

Sebastian E. Ramos-Onsins

https://doi.org/10.24072/pci.genomics.100161.d1

Reviewed by anonymous reviewer 1, 25 May 2022

In this article, Raynaud et al. evaluate the ability of LDhelmet, a population genomic approach based on linkage disequilibrium, to infer biologically realistic, human-like recombination landscapes. In particular, using extensive simulations, they evaluate the accuracy of recombination hotspot detection and discuss the implications for previous studies that have used such approaches to address questions such as whether recombination hotspots are evolutionarily conserved between closely related species.

They conclude that, while LD-based approaches can provide high-quality recombination maps in species with simple recombination landscapes, their use for biological interpretations regarding the evolutionary conservation of recombination maps in species with complex recombination landscapes warrants more caution. The biases and uncertainty in inferred recombination rates, as well as the potential for false positives and false negatives in hotspot detection, need to be taken into account before making such interpretations. Furthermore, they note that the simulation scenarios in this paper are optimistic in terms of the reliability of hotspot detection, and that empirical data contain further sources of noise, including sequencing, mapping and phasing errors, as well as population demography.

Comments:

1. Based on empirical data and error rates, would it be possible to include some plausible scenarios of sequencing error? To account for phasing uncertainty, could genotype data be simulated and subsequently phased before conducting the analysis?

2. There could be additional confounding factors when inferring fine-scale recombination patterns from LD, in particular the confounding between crossovers and gene conversions. Have any analyses been done in this regard?

3. Human populations can have complex demography with migration between populations. Can this further impact the accuracy of such inference methods or do we expect relative rates to be robust?

4. Presence of natural selection in a region can bias recombination inference. How may this affect hotspot inference? In general, other than simple recombination landscapes, what other assumptions need to be met for hotspot inference to be accurate?

5. Structural variation such as inversions can also impact recombination inference in certain species such as Drosophila. This may further contribute to uncertainty.
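Regarding comment 1, one simple way to mimic imperfect phasing, short of simulating genotypes and running a phasing program, is to inject switch errors into the true haplotypes: at a heterozygous site, with some probability, an individual's two haplotypes exchange their remaining segments. The sketch below is a hypothetical illustration of this idea (the function name and error model are assumptions), not the procedure used in the manuscript.

```python
import random
from typing import List, Optional, Tuple


def add_switch_errors(
    hap1: List[int], hap2: List[int], rate: float, rng: Optional[random.Random] = None
) -> Tuple[List[int], List[int]]:
    """Return copies of one individual's two haplotypes with random switch errors.

    At each heterozygous site, with probability `rate`, the phase flips: from that
    site onward, each output haplotype takes its alleles from the other input haplotype.
    """
    rng = rng or random.Random()
    out1: List[int] = []
    out2: List[int] = []
    swapped = False
    for a, b in zip(hap1, hap2):
        if a != b and rng.random() < rate:
            swapped = not swapped  # switch error at this heterozygous site
        out1.append(b if swapped else a)
        out2.append(a if swapped else b)
    return out1, out2


h1 = [0, 1, 1, 0, 1, 0]
h2 = [0, 0, 1, 1, 0, 0]
print(add_switch_errors(h1, h2, rate=0.2, rng=random.Random(42)))
```

The per-site genotypes (allele sums across the two haplotypes) are unchanged, so only the haplotype-level LD signal is perturbed.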

https://doi.org/10.24072/pci.genomics.100161.rev11

Reviewed by anonymous reviewer 2, 08 Jun 2022

Raynaud et al present a manuscript using simulations to test the performance and limitations of a commonly used method to infer recombination landscapes, LDhelmet. They find that maps produced with the method have good correlations with the true simulated maps, but there are limitations when the method is applied to detect hotspots. In particular, LDhelmet tends to overestimate the local recombination rate. Additionally, they note that the method can find shared hotspots when there are no real shared hotspots. This result has implications, for example, for interpreting data from Shanfelter et al 2019, who used LDhelmet and found little overlap between marine and freshwater populations of three-spine stickleback. In general I found the manuscript interesting and well-written, but I have some suggestions which I hope will improve the manuscript.

Major comments:
· I found the scope of the study to be more limited than I expected. The authors focus on a single method published in 2012. While this method is widely used, several additional methods have been published in the following years (which the authors cite in the introduction, lines 62-63). I would be interested in understanding how other recently developed methods compare.
· Additional evolutionary parameters: in a study like this, one has to make choices about which parameters to study. I agree with the authors that studying the impacts of effective population size, mutation rates, and recombination rates is important. However, I suggest two additional factors that I think would substantially benefit the manuscript:
a. Selection: the impacts of both positive and negative selection on patterns of LD are well known. However, I wonder how these forces affect hotspot inference. The authors could implement simple simulations in SLiM (Haller et al 2019 MBE), which will output a VCF and should fit fairly smoothly into the pipeline the authors have already set up.
b. Demographic changes with large effects on LD: it is well known that bottlenecks and exponential growth will affect LD patterns. Given the results presented in the paper, I would expect that these would also affect inferences of recombination hotspots, but I would be interested in quantifying how much.
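As a rough illustration of how simulated VCF output could be turned into the haplotype matrix that LD-based methods work from, the sketch below parses phased diploid GT fields (e.g. `0|1`) from VCF text lines using only the Python standard library. It assumes GT is the first FORMAT subfield, biallelic sites and fully phased calls; the function name is hypothetical, it is not part of any tool mentioned here, and real files (including simulator output) may need more handling.

```python
from typing import List, Tuple


def vcf_to_haplotypes(lines: List[str]) -> Tuple[List[int], List[List[int]]]:
    """Return (positions, haplotypes), where haplotypes[h][s] is the allele of
    haplotype h at site s. Assumes biallelic, phased, GT-first records."""
    positions: List[int] = []
    haplotypes: List[List[int]] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        positions.append(int(fields[1]))  # POS column
        gts = [sample.split(":")[0] for sample in fields[9:]]  # GT subfield per sample
        if not haplotypes:
            haplotypes = [[] for _ in range(2 * len(gts))]
        for i, gt in enumerate(gts):
            left, right = gt.split("|")  # raises ValueError if unphased ("0/1")
            haplotypes[2 * i].append(int(left))
            haplotypes[2 * i + 1].append(int(right))
    return positions, haplotypes


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
    "1\t250\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t1|0",
]
print(vcf_to_haplotypes(example))
```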

Minor comments:
· Could the authors give some intuition about what the block penalty does?
· Line 338-339: 10-9 should be 10^-9
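On the block-penalty question: in LDhat/LDhelmet-style inference the recombination map is piecewise constant, and the block penalty, as we understand it, penalizes changes in rate between adjacent blocks, so larger penalties favor smoother maps with fewer blocks. The sketch below illustrates the same trade-off with a much simpler stand-in, penalized least-squares segmentation solved exactly by dynamic programming (optimal partitioning). It does not reproduce LDhelmet's likelihood or its MCMC; the function name and input values are made up.

```python
from typing import List, Sequence, Tuple


def penalized_segmentation(values: Sequence[float], penalty: float) -> Tuple[List[float], int]:
    """Fit a piecewise-constant profile minimizing squared error + penalty * (number of blocks).

    Exact optimal-partitioning dynamic programme, O(n^2). Returns (fitted values, n_blocks).
    """
    n = len(values)
    s1 = [0.0] * (n + 1)  # prefix sums of values
    s2 = [0.0] * (n + 1)  # prefix sums of squared values
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a: int, b: int) -> float:
        """Sum of squared deviations from the mean of values[a:b]."""
        total = s1[b] - s1[a]
        return max(0.0, (s2[b] - s2[a]) - total * total / (b - a))

    best = [0.0] * (n + 1)  # best[t]: optimal objective for values[:t]
    last = [0] * (n + 1)    # last[t]: start index of the final block in that optimum
    for t in range(1, n + 1):
        best[t], last[t] = min((best[s] + cost(s, t) + penalty, s) for s in range(t))

    blocks = []
    t = n
    while t > 0:  # backtrack the block boundaries
        blocks.append((last[t], t))
        t = last[t]
    blocks.reverse()

    fitted: List[float] = []
    for a, b in blocks:
        fitted.extend([(s1[b] - s1[a]) / (b - a)] * (b - a))
    return fitted, len(blocks)


# Made-up per-interval rates with one elevated region.
rates = [1.0, 1.2, 0.9, 10.0, 11.0, 9.5, 1.1, 0.8, 1.0]
for pen in (1.0, 1000.0):
    fitted, n_blocks = penalized_segmentation(rates, pen)
    print(pen, n_blocks)
```

With the small penalty, the elevated region is recovered as its own block (three blocks in total); with the large penalty, it is smoothed into a single flat block, which is the over-smoothing risk a high block penalty poses for hotspot detection.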

https://doi.org/10.24072/pci.genomics.100161.rev12