An overlooked phenomenon: complex interactions of potential error sources on the quality of bacterial de novo genome assemblies.

  • Additional Information
    • Source:
      Publisher: BioMed Central Country of Publication: England NLM ID: 100965258 Publication Model: Electronic Cited Medium: Internet ISSN: 1471-2164 (Electronic) Linking ISSN: 14712164 NLM ISO Abbreviation: BMC Genomics Subsets: MEDLINE
    • Publication Information:
      Original Publication: London : BioMed Central, [2000-
    • Subject Terms:
    • Abstract:
      Background: Parameters that adversely affect the contiguity and accuracy of assemblies from Illumina next-generation sequencing (NGS) data are well described. However, past studies generally focused on their additive effects and overlooked potential interactions, in which the parameters may exacerbate one another's effects multiplicatively. To investigate whether they act interactively on de novo genome assembly quality, we simulated sequencing data for 13 bacterial reference genomes with varying levels of error rate, sequencing depth, and PCR and optical duplicate ratios.
      Results: We assessed the quality of assemblies from the simulated sequencing data with a number of contiguity and accuracy metrics, which we used to quantify both the additive and the multiplicative effects of the four parameters. We found that the tested parameters engage in complex interactions, exerting multiplicative, rather than additive, effects on assembly quality. In addition, the ratio of non-repeated regions and the GC% of the original genomes can shape how the four parameters affect assembly quality.
      Conclusions: We provide a framework for consideration in future studies using de novo assembly of bacterial genomes, e.g. in choosing the optimal sequencing depth by balancing its positive effect on contiguity against its negative effect on accuracy, which arises from its interaction with error rate. Furthermore, the properties of the genomes to be sequenced should also be taken into account, as they might influence the effects of the error sources themselves.
      (© 2024. The Author(s).)
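
The distinction the abstract draws between additive and multiplicative effects can be illustrated with a difference-of-differences contrast on a 2x2 design. The following is a minimal sketch under stated assumptions, not the statistical framework used in the study (which the abstract does not describe): the function names, the two chosen factors (error rate and sequencing depth), and all numbers are hypothetical toy values.

```python
def interaction_contrast(y00, y10, y01, y11):
    """Difference-of-differences for a 2x2 factorial design.

    y_ab is the response with factor A at level a and factor B at level b.
    The result is 0 when the two factors combine purely additively.
    """
    effect_of_a_at_low_b = y10 - y00
    effect_of_a_at_high_b = y11 - y01
    return effect_of_a_at_high_b - effect_of_a_at_low_b


def contrast_from_table(table):
    """Compute the contrast from a dict keyed by (level of A, level of B)."""
    return interaction_contrast(
        table[(0, 0)], table[(1, 0)], table[(0, 1)], table[(1, 1)]
    )


# Toy numbers (hypothetical, not from the study): an assembly error count
# under low/high error rate (factor A) and low/high sequencing depth (factor B).
baseline = 10.0

# Additive scenario: each factor adds a fixed amount regardless of the other.
additive = {
    (0, 0): baseline,
    (1, 0): baseline + 5.0,
    (0, 1): baseline + 3.0,
    (1, 1): baseline + 5.0 + 3.0,
}

# Multiplicative scenario: each factor scales the response by a fixed ratio.
multiplicative = {
    (0, 0): baseline,
    (1, 0): baseline * 1.5,
    (0, 1): baseline * 1.3,
    (1, 1): baseline * 1.5 * 1.3,
}

additive_contrast = contrast_from_table(additive)
multiplicative_contrast = contrast_from_table(multiplicative)
print("additive scenario contrast:", additive_contrast)
print("multiplicative scenario contrast:", multiplicative_contrast)
```

In the additive table the effect of one factor is the same at both levels of the other, so the contrast vanishes. In the multiplicative table the effect of one factor grows with the level of the other, so the contrast is nonzero, which is the pattern an interaction term in a regression model would capture.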
    • Grant Information:
      GINOP-2.3.4-15-2020-00008 European Regional Development Fund
    • Contributed Indexing:
      Keywords: Bacterial genomes; Optical duplicates; PCR duplicates; Sequencing depth; Sequencing error
    • Publication Date:
      Date Created: 20240109 Date Completed: 20240111 Latest Revision: 20240112
    • Publication Date:
      20240112
    • Accession Number:
      PMC10777565 (PMCID); 10.1186/s12864-023-09910-4 (DOI); 38195441 (PMID)