Abstract:
The K-nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining because of its integrity and performance. Although KNN is highly effective in many cases, it has some essential deficiencies that affect its classification accuracy. First, its effectiveness is reduced by redundant and irrelevant features. Second, it does not consider the differences between samples, which leads it to make inaccurate predictions. In this paper, we propose a novel scheme for improving the accuracy of the KNN classification algorithm based on a new weighting technique and stepwise feature selection. First, we use a stepwise feature selection method to eliminate irrelevant features and select the features that are highly correlated with the class category. Then a new weighting method gives an authority value to each sample in the training dataset based on the categories of its neighbors and their Euclidean distances. This weighting approach gives higher preference to samples whose neighbors are close in Euclidean distance and belong to the same category, which can effectively increase the classification accuracy of the algorithm. We evaluated the accuracy of the proposed method and compared it with the traditional KNN algorithm and some similar works on five real-world UCI datasets. The experimental results showed that the proposed scheme (denoted WAD-KNN) performed better than the traditional KNN algorithm and the other approaches considered, with an accuracy improvement of approximately 10%.
Machine summary:
A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
Saeid Sheikhi, MSc, Department of Computer, Gorgan Branch, Islamic Azad University, Gorgan, Iran.
In this paper, we propose a novel scheme for improving the accuracy of the KNN classification algorithm based on a new weighting technique and stepwise feature selection.
Then a new weighting method gives an authority value to each sample in the training dataset based on the categories of its neighbors and their Euclidean distances.
We evaluated the accuracy of the proposed method and compared it with the traditional KNN algorithm and some similar works on five real-world UCI datasets.
In the present study, we introduce a novel scheme to improve the classification accuracy of the KNN method by creating a new weight calculation technique based on stepwise feature selection.
Related Works: The K-Nearest Neighbor (KNN) algorithm is an instance-based method that computes the similarity between the target sample and the instances in the training data, and considers the k top-ranking nearest samples to predict the category of the target sample.
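The traditional KNN prediction step described here can be sketched in a few lines. This is only a minimal illustration, assuming Euclidean distance as the similarity measure and an unweighted majority vote; the function names `euclidean` and `knn_predict` and the toy data are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(X_train, y_train, x, k=3):
    """Predict the category of x by majority vote among its k nearest samples."""
    # Rank training indices by distance to the target sample.
    # sorted() is stable, so equally distant samples keep their input order.
    ranked = sorted(range(len(X_train)), key=lambda i: euclidean(X_train[i], x))
    # Count the categories of the k top-ranking neighbors.
    votes = Counter(y_train[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (illustrative only): two well-separated groups.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(X_train, y_train, [0.2, 0.1], k=3)
```

Every training sample counts equally in this vote, which is the limitation the proposed weighting is meant to address.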
Thus, in this study, we proposed a novel scheme to improve the classification accuracy of the KNN algorithm, which can efficiently classify unknown instances in different datasets.
For each instance, the weight calculation combines the number of its N nearest neighbors that share its category with the sum of the distances to those N neighbors, in order to improve the accuracy of the KNN algorithm.
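One possible reading of this weight calculation is sketched below. The summary does not give the exact formula, so the ratio used here (same-category neighbor count divided by the summed neighbor distance) is an assumption chosen only to show how the two quantities could be combined; `sample_weights`, `n_neighbors` and `eps` are hypothetical names, and this should not be taken as the WAD-KNN formula itself.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sample_weights(X, y, n_neighbors=3, eps=1e-12):
    """Assign each training sample a weight from its N nearest neighbors.

    ASSUMPTION: weight = (neighbors with the same category) / (sum of distances).
    The source only says the two quantities are combined, not how.
    """
    weights = []
    for i, xi in enumerate(X):
        # Distances from sample i to every other training sample, nearest first.
        others = sorted((euclidean(xi, X[j]), j) for j in range(len(X)) if j != i)
        nearest = others[:n_neighbors]
        same = sum(1 for _, j in nearest if y[j] == y[i])
        total = sum(d for d, _ in nearest)
        # eps guards against division by zero when all neighbors are duplicates.
        weights.append(same / (total + eps))
    return weights


# Toy data (illustrative only): the last sample is labeled 1 but sits
# in the middle of the category-0 group.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [0.5, 0.5]]
y = [0, 0, 0, 1, 1, 1, 1]
w = sample_weights(X, y, n_neighbors=3)
```

Under this assumed formula, a sample surrounded by close neighbors of its own category gets a high weight, while a sample whose nearest neighbors all belong to another category gets a weight of zero.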