Dual Triggered Correspondence Topic (DTCT) model for MeSH annotation.

  • Author(s): Kim S; Yoon J
  • Source:
    IEEE/ACM transactions on computational biology and bioinformatics [IEEE/ACM Trans Comput Biol Bioinform] 2022 Mar-Apr; Vol. 19 (2), pp. 899-911. Date of Electronic Publication: 2022 Apr 01.
  • Publication Type:
    Journal Article; Research Support, Non-U.S. Gov't
  • Language:
    English
  • Additional Information
    • Source:
      Publisher: IEEE Computer Society Country of Publication: United States NLM ID: 101196755 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1557-9964 (Electronic) Linking ISSN: 15455963 NLM ISO Abbreviation: IEEE/ACM Trans Comput Biol Bioinform Subsets: MEDLINE
    • Publication Information:
      Original Publication: New York, NY : IEEE Computer Society, 2004-
    • Abstract:
      Accurate Medical Subject Headings (MeSH) annotation is important to researchers for effective information retrieval and knowledge discovery in the biomedical literature. We have developed a powerful dual triggered correspondence topic (DTCT) model for MeSH-annotated articles. In our model, two types of data are assumed to be generated by the same latent topic factors, and the words in abstracts and titles serve as descriptions of the other type, MeSH terms. Our model allows the generation of MeSH terms for an abstract to be triggered, in a probabilistic manner, either by general document topics or by document-specific "special" word distributions, which permits a trade-off between the benefits of topic-based abstraction and specific word matching. To reduce the topical influence of non-topical or domain-frequent words in the text description, we integrated the discriminative feature of Okapi BM25 into the word sampling probability. This allows the model to choose keywords that stand out from the others when generating MeSH terms. We further incorporate prior knowledge about the relations between words and MeSH terms into DTCT, using the phi coefficient, to improve topic coherence. We demonstrated the model's usefulness in automatic MeSH annotation. Our model obtained an F-score of 0.62 on a 150,00 MEDLINE test set and was particularly strong in recall. In particular, it yielded competitive performance in an integrated probabilistic environment without additional post-processing to filter MeSH terms.
    • Publication Date:
      Date Created: 20200814 Date Completed: 20220405 Latest Revision: 20220608
    • Publication Date:
      20240105
    • Accession Number:
      10.1109/TCBB.2020.3016355
    • Accession Number:
      32790634
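
The abstract names two standard statistics, Okapi BM25 and the phi coefficient, without giving formulas. The Python sketch below shows the common textbook form of each as a rough illustration only; it is not the authors' implementation, and the record does not say how either quantity enters the DTCT sampling equations. The function names, the tokenized-list input format, and the defaults `k1=1.2` and `b=0.75` are assumptions made for this example.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    docs is a list of token lists. k1 and b are commonly used defaults,
    assumed here; the abstract does not state the values the authors used.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        w = {}
        for term, f in tf.items():
            # Smoothed IDF variant that stays positive for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = f + k1 * (1 - b + b * len(d) / avg_len)
            w[term] = idf * f * (k1 + 1) / norm
        weights.append(w)
    return weights


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 co-occurrence table.

    n11: documents with both the word and the MeSH term
    n10: word only, n01: MeSH term only, n00: neither
    """
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # degenerate table: association is undefined, report none
    return (n11 * n00 - n10 * n01) / denom
```

In this form, a word that occurs in every document (such as a domain-frequent term) receives a much smaller BM25 weight than a rare word with the same in-document count, which matches the abstract's stated aim of favoring keywords that stand out. The phi coefficient ranges from -1 to 1, so a word and a MeSH term that always co-occur score 1.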