binomialRF: interpretable combinatoric efficiency of random forests to identify biomarker interactions.

  • Additional Information
    • Source:
      Publisher: BioMed Central Country of Publication: England NLM ID: 100965194 Publication Model: Electronic Cited Medium: Internet ISSN: 1471-2105 (Electronic) Linking ISSN: 14712105 NLM ISO Abbreviation: BMC Bioinformatics Subsets: MEDLINE
    • Publication Information:
      Original Publication: [London] : BioMed Central, 2000-
    • Subject Terms:
    • Abstract:
      Background: In this era of data science-driven bioinformatics, machine learning research has focused on feature selection, as users want more interpretability and post-hoc analyses for biomarker detection. However, when a study has more features (e.g., transcripts) than samples (e.g., mice or human samples), biomarker detection becomes statistically challenging because traditional statistical techniques are underpowered in high dimensions. Second- and third-order interactions among these features pose a further, substantial combinatoric challenge. In computational biology, random forest (RF) classifiers are widely used because of their flexibility, strong performance, ability to rank features, and robustness to the high-dimensional "P >> N" setting that limits many matrix regression algorithms. We propose binomialRF, a feature selection technique for RFs that provides an alternative interpretation of features using a correlated binomial distribution and scales efficiently to the analysis of multiway interactions.
      Results: In both simulations and validation studies using datasets from the TCGA and UCI repositories, binomialRF showed computational gains (from 5 to 300 times faster) while maintaining competitive variable precision and recall in identifying biomarkers' main effects and interactions. In two clinical studies, the binomialRF algorithm prioritized previously published, relevant pathological molecular mechanisms (features) with high classification precision and recall, both when using the features alone and when using their statistical interactions alone.
      Conclusion: binomialRF extends previous methods for identifying interpretable features in RFs and unifies them under a correlated binomial distribution to create an efficient hypothesis-testing algorithm that identifies biomarkers' main effects and interactions. Preliminary results in simulations demonstrate computational gains while retaining competitive model selection and classification accuracy. Future work will extend this framework to incorporate ontologies that provide pathway-level feature selection from gene expression input data.
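      The hypothesis-testing idea in the abstract can be illustrated as a one-sided binomial test on how often each feature is chosen as the root split across the trees of a forest. This is a minimal sketch under stated assumptions, not the binomialRF package's implementation: it assumes every feature is equally likely to win a split under the null, and it substitutes an ordinary (independent) binomial for the correlated binomial the authors describe, so its p-values ignore the dependence between trees grown on overlapping bootstrap samples. The function names and the root-split counts below are hypothetical; in practice the counts would be read off a fitted forest.

```python
from math import exp, lgamma, log, log1p, prod


def null_root_probability(num_features: int, mtry: int) -> float:
    """Chance that one given feature is a tree's root split under the null.

    The feature must be drawn into the random candidate subset of size
    `mtry` (probability mtry / P) and then be picked among the candidates
    (probability 1 / mtry when all are equally uninformative).
    """
    # Probability the feature is left out of mtry draws without replacement;
    # the product telescopes to (P - mtry) / P.
    p_excluded = prod((num_features - 1 - i) / (num_features - i) for i in range(mtry))
    return (1.0 - p_excluded) / mtry  # simplifies to 1 / P


def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p), with 0 < p < 1.

    Terms are computed in log space so large tree counts do not overflow.
    """
    total = 0.0
    for k in range(successes, trials + 1):
        log_pmf = (
            lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
            + k * log(p) + (trials - k) * log1p(-p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


def root_split_pvalues(root_counts, num_trees, num_features, mtry):
    """Bonferroni-adjusted one-sided p-values for root-split counts.

    root_counts maps a feature name to the number of trees whose root
    split uses that feature. The correction multiplies by num_features,
    because every feature is implicitly tested.
    """
    p0 = null_root_probability(num_features, mtry)
    return {
        name: min(1.0, num_features * binomial_upper_tail(count, num_trees, p0))
        for name, count in root_counts.items()
    }


# Hypothetical counts from a 100-tree forest over 10 features (mtry = 3).
counts = {"X1": 30, "X2": 12, "X3": 8}
print(root_split_pvalues(counts, num_trees=100, num_features=10, mtry=3))
```

      With P = 10 and mtry = 3 the null probability reduces to 1/P = 0.1, so each feature is expected at the root of about 10 of the 100 trees. A feature at the root of 30 trees is flagged, while counts of 8 or 12 are not significant after the Bonferroni adjustment. Accounting for correlation between trees, as binomialRF does, would widen the null distribution relative to this sketch.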
    • Comments:
      Erratum in: BMC Bioinformatics. 2020 Nov 2;21(1):495. (PMID: 33138767)
    • Grant Information:
      U01 AI122275 United States AI NIAID NIH HHS; P30CA023074 United States CA NCI NIH HHS; 1UG3OD023171 United States NH NIH HHS
    • Accession Number:
      0 (Biomarkers)
      0 (Biomarkers, Tumor)
    • Publication Date:
      Date Created: 20200830 Date Completed: 20200929 Latest Revision: 20231003
    • Publication Date:
      20240105
    • Accession Number:
      PMC7456085
    • Accession Number:
      10.1186/s12859-020-03718-9
    • Accession Number:
      32859146