Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data.

  • Additional Information
    • Source:
      Publisher: BioMed Central
      Country of Publication: England
      NLM ID: 100965258
      Publication Model: Electronic
      Cited Medium: Internet
      ISSN: 1471-2164 (Electronic)
      Linking ISSN: 14712164
      NLM ISO Abbreviation: BMC Genomics
      Subsets: MEDLINE
    • Publication Information:
      Original Publication: London : BioMed Central, [2000-
    • Subject Terms:
    • Abstract:
      Background: Copy number variation (CNV) is thought to actively contribute to adaptive evolution of plant species. While many computational algorithms are available to detect copy number variation from whole genome sequencing datasets, the typical complexity of plant data likely introduces false positive calls.
      Results: To enable reliable and comprehensive detection of CNV in plant genomes, we developed Hecaton, a novel computational workflow tailored to plants that integrates calls from multiple state-of-the-art algorithms through a machine-learning approach. In this paper, we demonstrate that Hecaton outperforms current methods when applied to short read sequencing data of Arabidopsis thaliana, rice, maize, and tomato. Moreover, it correctly detects dispersed duplications, a type of CNV commonly found in plant species, in contrast to several state-of-the-art tools that erroneously represent this type of CNV as overlapping deletions and tandem duplications. Finally, Hecaton scales well in terms of memory usage and running time when applied to short read datasets of domesticated and wild tomato accessions.
      Conclusions: Hecaton provides a robust method to detect CNV in plants. We expect it to be of immediate interest to both applied and fundamental research on the relationship between genotype and phenotype in plants.
    • Grant Information:
      ALWGR.2015.9 Nederlandse Organisatie voor Wetenschappelijk Onderzoek
    • Contributed Indexing:
      Keywords: Copy number variation; Machine learning; Plant adaptation; Structural variation
    • Publication Date:
      Date Created: 20191109
      Date Completed: 20200317
      Latest Revision: 20200317
    • Publication Date:
      20240105
    • Accession Number:
      PMCID: PMC6836508
      DOI: 10.1186/s12864-019-6153-8
      PMID: 31699036