Recommendation

Assessing a novel sequencing-based approach for population genomics in non-model species

Thomas Derrien and Sebastian Ernesto Ramos-Onsins based on reviews by Valentin Wucher and 1 anonymous reviewer

A recommendation of:

An evaluation of pool-sequencing transcriptome-based exon capture for population genomics in non-model species

Emeline Deleury, Thomas Guillemaud, Aurélie Blin & Eric Lombaert (2020), bioRxiv, 583534, ver. 7 peer-reviewed and recommended by Peer Community in Genomics https://doi.org/10.1101/583534

Abstract

Exon capture coupled to high-throughput sequencing constitutes a cost-effective technical solution for addressing specific questions in evolutionary biology by focusing on expressed regions of the genome preferentially targeted by selection. Transcriptome-based capture, a process that can be used to capture the exons of non-model species, is used in phylogenomics. However, its use in population genomics remains rare due to the high costs of sequencing large numbers of indexed individuals across multiple populations. We evaluated the feasibility of combining transcriptome-based capture and the pooling of tissues from numerous individuals for DNA extraction as a cost-effective, generic and robust approach to estimating the variant allele frequencies of any species at the population level. We designed capture probes for ~5 Mb of chosen de novo transcripts from the Asian ladybird Harmonia axyridis (5,717 transcripts). We called ~300,000 bi-allelic SNPs for a pool of 36 non-indexed individuals. Capture efficiency was high, and pool-seq was as effective and accurate as individual-seq for detecting variants and estimating allele frequencies. Finally, we also evaluated an approach for simplifying bioinformatic analyses by mapping genomic reads directly to targeted transcript sequences to obtain coding variants. This approach is effective and does not affect the estimation of SNP allele frequencies, except for a small bias close to some exon ends. We demonstrate that this approach can also be used to predict the intron-exon boundaries of targeted de novo transcripts, making it possible to abolish genotyping biases near exon ends.

Target enrichment; Non-model organism; Population genomics; Pool-sequencing; Harmonia axyridis; Intron-exon boundary prediction

Submission: posted 26 February 2020
Recommendation: posted 08 October 2020, validated 09 October 2020

Cite this recommendation as:
Derrien, T. and Ramos-Onsins, S. (2020) Assessing a novel sequencing-based approach for population genomics in non-model species. Peer Community in Genomics, 100002. https://doi.org/10.24072/pci.genomics.100002

Recommendation

Developing new sequencing and bioinformatic strategies for non-model species is of great interest in many applications, such as phylogenetic studies of diverse related species, but also for studies in population genomics, where a relatively large number of individuals is necessary. Different approaches have been developed and used over the last two decades, such as RAD-Seq (e.g., Miller et al. 2007), exome sequencing (e.g., Teer and Mullikin 2010) and other reduced-representation methods that avoid the need for a good, well-annotated reference genome (reviewed in Davey et al. 2011). However, population genomics studies require the analysis of numerous individuals, which still makes them expensive. Pooling samples was proposed as an inexpensive strategy to obtain estimates of variability and of other statistics related to the frequency spectrum, thus allowing the study of variability at the population level (e.g., Van Tassell et al. 2008), although its major drawback is the loss of information on the linkage between variants. In addition, population analyses using all these sequencing strategies require statistical and empirical validations that are not always fully performed. A number of studies have aimed to obtain unbiased estimates of variability using reduced representation libraries and/or pooled data (e.g., Futschik and Schlötterer 2010, Gautier et al. 2013, Ferretti et al. 2013, Lynch et al. 2014), and others have validated new sequencing methods for population genetic analyses (e.g., Gautier et al. 2013, Nevado et al. 2014). Nevertheless, empirical validation using both pooled and individual experimental approaches combined with different bioinformatic methods has not always been performed.
Here, Deleury et al. (2020) proposed an efficient and elegant way of quantifying the single-nucleotide polymorphisms (SNPs) of exon-derived sequences in a non-model species (i.e., one for which no reference genome sequence is available) at the population scale. They also designed a new procedure to capture exon-derived sequences based on a reference transcriptome. In addition, they were able to predict intron-exon boundaries for de novo transcripts based on the decay of read depth at the ends of the coding regions.
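The idea of locating intron-exon boundaries from a decay in read depth can be illustrated with a minimal Python sketch. This is not the procedure of Deleury et al. (2020): the function name `candidate_boundaries`, the `window` and `max_fraction` parameters and the toy depth values are all hypothetical, and the rule used here (a local depth minimum that falls well below the depth on both flanks) is only one simple way to express "depth drops where genomic reads stop aligning to the transcript".

```python
def candidate_boundaries(depth, window=3, max_fraction=0.5):
    """Return transcript positions where read depth dips sharply.

    A position is reported when its depth is strictly lower than every
    position in the left window, no higher than any position in the right
    window, and at most `max_fraction` of the lower of the two flank peaks.
    All thresholds are illustrative assumptions, not published values.
    """
    candidates = []
    for i in range(window, len(depth) - window):
        left = depth[i - window:i]
        right = depth[i + 1:i + 1 + window]
        is_local_min = depth[i] < min(left) and depth[i] <= min(right)
        flank_peak = min(max(left), max(right))
        if is_local_min and depth[i] <= max_fraction * flank_peak:
            candidates.append(i)
    return candidates


# Toy per-base depth along a transcript (hypothetical values):
# coverage decays towards position 5 and recovers afterwards.
toy_depth = [30, 30, 28, 20, 10, 4, 9, 19, 29, 30, 30]
print(candidate_boundaries(toy_depth))
```

In practice the depth profile would come from genomic reads mapped to the transcript sequences, and any threshold would need to be tuned and validated against known gene structures.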
Based on theoretical predictions (Gautier et al. 2013), Deleury et al. (2020) designed a procedure to test the accuracy of variant allele frequencies (AFs) estimated from pooled samples in a reduced-representation library built from transcriptome regions, and additionally tested the effects of new bioinformatic methods in contrast to standard methods. They applied their strategy to the non-model Asian ladybird (Harmonia axyridis), for which a draft genome is available, thereby allowing them to benchmark their method against a traditional mapping-based approach. Based on species-specific de novo transcriptomes, they designed capture probes, called SNPs from the captured sequences, and then compared the resulting SNP AFs at the individual (multiplexed) versus population (pooled) levels. Interestingly, they showed that SNP AFs from the pool-sequencing strategy correlate well with the individual-based ones, while being obtained more cost-effectively. Population genomics studies of non-model species usually have limited budgets. The number of individuals required for population genomics analysis multiplies the costs of the project, making sample pooling an attractive option. Furthermore, pool sequencing is not always a choice, as many organisms are too small and/or individuals are too closely attached to one another to be sequenced individually (e.g., Choquet et al. 2019, Kurland et al. 2019). In addition, studying a reduced section of the genome is cheaper and often sufficient for a number of population genetic questions, such as understanding general demographic events or estimating the effects of positive and/or negative selection on functional coding regions. Studies on the population genomics of non-model species have many applications in related fields, such as conservation genetics and the control of invasive species. The work of Deleury et al. (2020) is an elegant contribution to the assessment and validation of new methodologies for analyzing genome variation at the intra-population level, highlighting straightforward bioinformatic methods and reliable sequencing methods for population genomics studies.
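To make the pool-versus-individual comparison concrete, here is a minimal Python sketch of how the two allele frequency estimates could be computed and correlated. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the genotype coding (0, 1 or 2 copies of the variant allele in a diploid individual) and the toy counts are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(var_x * var_y)


def pool_af(alt_reads, total_reads):
    """Pool estimate: fraction of reads carrying the variant allele."""
    return alt_reads / total_reads


def individual_af(genotypes):
    """Individual-based estimate from diploid genotypes (0, 1 or 2 variant copies)."""
    return sum(genotypes) / (2 * len(genotypes))


# Toy data (hypothetical): three SNPs, four diploid individuals.
pool_counts = [(12, 100), (48, 96), (81, 90)]  # (variant reads, total reads)
genotypes_per_snp = [[0, 0, 1, 0], [1, 1, 1, 1], [2, 2, 2, 1]]

pool_afs = [pool_af(alt, total) for alt, total in pool_counts]
ind_afs = [individual_af(g) for g in genotypes_per_snp]
print(pearson_r(pool_afs, ind_afs))
```

A real comparison would also need to account for unequal contributions of individuals to the pool and for sequencing error, which this sketch ignores.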

References

[1] Choquet et al. (2019). Towards population genomics in non-model species with large genomes: a case study of the marine zooplankton Calanus finmarchicus. Royal Society open science, 6(2), 180608. doi: https://doi.org/10.1098/rsos.180608
[2] Davey, J. W., Hohenlohe, P. A., Etter, P. D., Boone, J. Q., Catchen, J. M. and Blaxter, M. L. (2011). Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics, 12(7), 499-510. doi: https://doi.org/10.1038/nrg3012
[3] Deleury, E., Guillemaud, T., Blin, A. and Lombaert, E. (2020) An evaluation of pool-sequencing transcriptome-based exon capture for population genomics in non-model species. bioRxiv, 10.1101/583534, ver. 7 peer-reviewed and recommended by PCI Genomics. https://doi.org/10.1101/583534
[4] Ferretti, L., Ramos‐Onsins, S. E. and Pérez‐Enciso, M. (2013). Population genomics from pool sequencing. Molecular ecology, 22(22), 5561-5576. doi: https://doi.org/10.1111/mec.12522
[5] Futschik, A. and Schlötterer, C. (2010). Massively parallel sequencing of pooled DNA samples—the next generation of molecular markers. Genetics, 186 (1), 207-218. doi: https://doi.org/10.1534/genetics.110.114397
[6] Gautier et al. (2013). Estimation of population allele frequencies from next‐generation sequencing data: pool‐versus individual‐based genotyping. Molecular Ecology, 22(14), 3766-3779. doi: https://doi.org/10.1111/mec.12360
[7] Kurland et al. (2019). Exploring a Pool‐seq‐only approach for gaining population genomic insights in nonmodel species. Ecology and evolution, 9(19), 11448-11463. doi: https://doi.org/10.1002/ece3.5646
[8] Lynch, M., Bost, D., Wilson, S., Maruki, T. and Harrison, S. (2014). Population-genetic inference from pooled-sequencing data. Genome biology and evolution, 6(5), 1210-1218. doi: https://doi.org/10.1093/gbe/evu085
[9] Miller, M. R., Dunham, J. P., Amores, A., Cresko, W. A. and Johnson, E. A. (2007). Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome research, 17(2), 240-248. doi: https://doi.org/10.1101/gr.5681207
[10] Nevado, B., Ramos‐Onsins, S. E. and Perez‐Enciso, M. (2014). Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics. Molecular ecology, 23(7), 1764-1779. doi: https://doi.org/10.1111/mec.12693
[11] Teer, J. K. and Mullikin, J. C. (2010). Exome sequencing: the sweet spot before whole genomes. Human molecular genetics, 19(R2), R145-R151. doi: https://doi.org/10.1093/hmg/ddq333
[12] Van Tassell et al. (2008). SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries. Nature methods, 5(3), 247-252. doi: https://doi.org/10.1038/nmeth.1185

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Reviews

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.1101/583534

Author's Reply, 15 Sep 2020

Download author's reply https://doi.org/10.24072/pci.genomics.100010.ar1

Decision by Thomas Derrien, posted 07 Apr 2020

Dear Authors,

The two reviewers have now responded positively to your manuscript. Although they were impressed by the quantity of work that is described here, they have also made constructive comments and suggestions to clarify the manuscript.

The main points raised are the following:

1/ Illustration of the analysis workflow: Given the rather extensive analyses reported throughout the study (i.e., pool versus individual; SNP calling within exons versus at intron-exon boundaries (IEB); CDS mapping versus genome mapping…), we recommend illustrating the workflow with a diagram to guide the reader (e.g., something like Transcriptome > CDS > probes > sequencing (pool vs individual) > SNP calling/mapping (CDS vs genome)). This would certainly add considerable value and directness to the described strategies and may also emphasize the contribution of the pooling strategy to the correct estimation of variant allele frequencies (VAF) compared with indexed individuals. In addition, it could be useful to define and use acronyms for the different methods, for better readability.

2/ Filtering: There are various filters used throughout both the Methods and Results sections: CDS selection, SNP calling, read coverage, CDS genome mapping. One could ask whether (and how) they influence the effectiveness of the strategy. Along the same lines, are the "~5 Mb of randomly chosen" transcripts really random, given that they were filtered based on their N content, size and GC content?

Minor points:
- Although this is not the main point of the study, would it be possible to give more details about the de novo transcript annotation (initial numbers, method for reconstruction, sequenced tissues/stages…)?
- Line 443: "the allele frequency estimates obtained with the two mapping methods were highly correlated both for the pool (r=0.998; Fig. 2C) and for the individuals (r=0.998)." It seems that the correlation of AFs between the two mapping strategies (CDS vs genome) differs slightly for lower AF values (<0.2), with mapping onto CDS slightly overestimating AF compared with mapping onto the genome (Fig. 2C). Would it be interesting to compute the correlations by bins/intervals of AF?
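The suggestion to examine the agreement by AF interval could be sketched as follows in Python. This is only an illustration of the reviewers' idea, not an analysis from the manuscript: the function names, the bin edges and the toy frequencies are hypothetical. SNPs are binned on the genome-mapping AF, and each bin reports its SNP count, the mean signed difference (CDS minus genome) and the Pearson correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r, or None when undefined (fewer than 2 points or no variance)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def summarize_by_bin(af_genome, af_cds, edges):
    """Bin SNPs on the genome-mapping AF; return (lo, hi, n, mean bias, r) per bin."""
    summary = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        is_last_bin = hi == edges[-1]  # the last bin also includes its upper edge
        pairs = [(g, c) for g, c in zip(af_genome, af_cds)
                 if lo <= g < hi or (is_last_bin and g == hi)]
        if not pairs:
            summary.append((lo, hi, 0, None, None))
            continue
        gs, cs = zip(*pairs)
        bias = sum(c - g for g, c in pairs) / len(pairs)
        summary.append((lo, hi, len(pairs), bias, pearson_r(gs, cs)))
    return summary


# Toy values (hypothetical), with a small upward shift at low AF.
af_genome = [0.05, 0.10, 0.15, 0.40, 0.60, 0.90]
af_cds = [0.07, 0.12, 0.17, 0.40, 0.61, 0.90]
rows = summarize_by_bin(af_genome, af_cds, [0.0, 0.2, 1.0])
for lo, hi, n, bias, r in rows:
    print(lo, hi, n, bias, r)
```

Reporting the mean signed difference alongside the correlation is a deliberate choice: a constant overestimate within a bin leaves the correlation near 1, so correlation alone could hide the bias the reviewers describe.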

One section of the discussion seems to have been duplicated, and the references are presented twice.

Overall, the manuscript is well written and reports a very interesting and cost-effective strategy to estimate allele frequencies in non-model organisms at the population level. We therefore look forward to seeing a revised version.

Additional requirements of the managing board:

As indicated in the 'How does it work?' section and in the code of conduct, please make sure that:
-Data are available to readers, either in the text or through an open data repository such as Zenodo (free), Dryad or some other institutional repository. Data must be reusable, thus metadata or accompanying text must carefully describe the data.
-Details on quantitative analyses (e.g., data treatment and statistical scripts in R, bioinformatic pipeline scripts, etc.) and details concerning simulations (scripts, codes) are available to readers in the text, as appendices, or through an open data repository, such as Zenodo, Dryad or some other institutional repository. The scripts or codes must be carefully described so that they can be reused.
-Details on experimental procedures are available to readers in the text or as appendices.
-Authors have no financial conflict of interest relating to the article. The article must contain a "Conflict of interest disclosure" paragraph before the reference section containing this sentence: "The authors of this preprint declare that they have no financial conflict of interest with the content of this article." If appropriate, this disclosure may be completed by a sentence indicating that some of the authors are PCI recommenders: "XXX is one of the PCI XXX recommenders."

All the best,

Thomas DERRIEN

https://doi.org/10.24072/pci.genomics.100010.d1

Reviewed by Valentin Wucher, 02 Apr 2020

Download the review https://doi.org/10.24072/pci.genomics.100010.rev11

Reviewed by anonymous reviewer 1, 31 Mar 2020

Download the review https://doi.org/10.24072/pci.genomics.100010.rev12
