KNNHI: Resilient KNN algorithm for heterogeneous incomplete data classification and K identification using rough set theory.

  • Additional Information
    • Abstract:
      The original K-nearest neighbour (KNN) algorithm was meant to classify homogeneous complete data, that is, data with only numerical features whose values are all present. Thus, it faces problems when used with heterogeneous incomplete (HI) data, which also has categorical features and is plagued with missing values. Many solutions have been proposed over the years, but most have pitfalls. For example, some solve heterogeneity by converting categorical features into numerical ones, inflicting structural damage. Others solve incompleteness by imputation or elimination, causing semantic disturbance. Almost all use the same K for all query objects, leading to misclassification. In the present work, we introduce KNNHI, a KNN-based algorithm for HI data classification that avoids all these pitfalls. Leveraging rough set theory, KNNHI preserves both categorical and numerical features, leaves missing values untouched and uses a different K for each query. The end result is an accurate classifier, as demonstrated by extensive experimentation on nine datasets, mostly from the University of California Irvine repository, using a 10-fold cross-validation technique. We show that KNNHI outperforms six recently published KNN-based algorithms in terms of precision, recall, accuracy and F-Score. In addition to its function as a mighty classifier, KNNHI can also serve as a K calculator, helping KNN-based algorithms that use a single K value for all queries to find the best such value. Sure enough, we show how four such algorithms improve their performance using the K obtained by KNNHI. Finally, KNNHI exhibits impressive resilience to the degree of incompleteness, the degree of heterogeneity and the metric used to measure distance. [ABSTRACT FROM AUTHOR]
    • Abstract:
      Copyright of Journal of Information Science is the property of Sage Publications, Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
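
The abstract describes classifying heterogeneous incomplete data without converting categorical features and without imputing missing values. As a rough illustration of that problem setting only, the sketch below runs a plain KNN vote over mixed data using a Gower-style distance that skips any feature missing in either object. This is not the authors' rough-set method, which the abstract does not spell out. The names `hetero_distance`, `knn_predict` and `ranges`, the toy data, and the use of `None` to mark a missing value are all assumptions made for this example, and K is simply passed in by the caller rather than derived per query.

```python
from collections import Counter
from typing import Optional, Sequence, Union

# Assumption for this sketch: None marks a missing value,
# str marks a categorical value, float marks a numerical value.
Value = Optional[Union[float, str]]


def hetero_distance(a: Sequence[Value], b: Sequence[Value],
                    ranges: Sequence[float]) -> float:
    """Gower-style distance over the features observed in both objects."""
    total, used = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue  # missing value: skip the feature instead of imputing it
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0  # categorical: simple mismatch
        else:
            total += abs(x - y) / r if r > 0 else 0.0  # numerical: range-normalised
        used += 1
    # With no feature observed in both objects, fall back to the maximal distance.
    return total / used if used else 1.0


def knn_predict(query: Sequence[Value], data: Sequence[Sequence[Value]],
                labels: Sequence[str], ranges: Sequence[float], k: int) -> str:
    """Majority vote among the k nearest training objects."""
    order = sorted(range(len(data)),
                   key=lambda i: hetero_distance(query, data[i], ranges))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): numerical, categorical, numerical.
data = [
    (1.0, "red", 10.0),
    (1.2, "red", None),
    (5.0, "blue", 50.0),
    (None, "blue", 48.0),
]
labels = ["a", "a", "b", "b"]
ranges = (4.0, 0.0, 40.0)  # value range per numerical feature; ignored for categorical

query = (1.1, None, 12.0)
prediction = knn_predict(query, data, labels, ranges, k=3)
```

Because `k` is an ordinary argument, a caller could supply a different value for each query, which is the degree of freedom the abstract says KNNHI exploits. How KNNHI actually selects that value is not covered by this sketch.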