Q: KNN in R — strange behavior

Submitted by 拜拜、爱过 on 2019-12-09 03:58:42

Question


Does anyone know why the following KNN code in R gives different predictions for different seeds? This is strange because K <- 5 is odd, so the majority should be well defined. In addition, the floating-point numbers are not so small that this looks like a numerical precision problem. (Remark: I know the test point is weirdly different from the training data. This is only a synthetic example created to demonstrate the strange KNN behavior.)

library(class)

train <- rbind(
  c(0.0626015,  0.0530052,  0.0530052,  0.0496676,  0.0530052,  0.0626015),
  c(0.0565861,  0.0569546,  0.0569546,  0.0511377,  0.0569546,  0.0565861),
  c(0.0538332,  0.057786,   0.057786,   0.0506127,  0.057786,   0.0538332),
  c(0.059033,   0.0541484,  0.0541484,  0.0501926,  0.0541484,  0.059033),
  c(0.0587272,  0.0540445,  0.0540445,  0.0505076,  0.0540445,  0.0587272),
  c(0.0578095,  0.0564349,  0.0564349,  0.0505076,  0.0564349,  0.0578095)
)
trainLabels <- c(1,
                 1,
                 0,
                 0,
                 1,
                 0)
test  <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)

K <- 5

# Store the seed in a variable so that message() can report it
seed <- 494139
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 1, seed: 494139

seed <- 5371
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 5371

Answer 1:


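The knn function calls an underlying C function, VR_knn (line 122), which includes a step that applies "fuzz": a small tolerance (epsilon, EPS) used when comparing distances. It looks like your example values are running into that step. The documentation for knn says that ties for the kth nearest vector bring all candidates into the vote, and that tied votes are broken at random. So if the fuzz makes the 5th and 6th nearest training points count as tied, all six points would vote, the three 1 labels and three 0 labels would tie, and the winner would depend on the seed. Evidence for this is that rounding your training values to 4 digits yields consistent predictions: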

library(class)
train <- rbind(
  c(0.0626015,  0.0530052,  0.0530052,  0.0496676,  0.0530052,  0.0626015),
  c(0.0565861,  0.0569546,  0.0569546,  0.0511377,  0.0569546,  0.0565861),
  c(0.0538332,  0.057786,   0.057786,   0.0506127,  0.057786,   0.0538332),
  c(0.059033,   0.0541484,  0.0541484,  0.0501926,  0.0541484,  0.059033),
  c(0.0587272,  0.0540445,  0.0540445,  0.0505076,  0.0540445,  0.0587272),
  c(0.0578095,  0.0564349,  0.0564349,  0.0505076,  0.0564349,  0.0578095)
)
trainLabels <- c(1,1,0,0,1,0)
test  <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5

train <- round(train,4)

seed <- 494139
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 494139

seed <- 5371
set.seed(seed)
pred <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 5371
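
The tie hypothesis can also be checked without calling knn at all. The following is a minimal base-R sketch, not the code VR_knn actually runs: it computes the squared Euclidean distance from the test point to each unrounded training row and compares the relative gap between the 5th and 6th nearest rows with a tolerance. The value eps_assumed = 1e-4, and the idea that it is applied as a relative test on squared distances, are assumptions based on my reading of the C source; check them against the version of the class package you have installed.

```r
# Data copied from the question (unrounded).
train <- rbind(
  c(0.0626015,  0.0530052,  0.0530052,  0.0496676,  0.0530052,  0.0626015),
  c(0.0565861,  0.0569546,  0.0569546,  0.0511377,  0.0569546,  0.0565861),
  c(0.0538332,  0.057786,   0.057786,   0.0506127,  0.057786,   0.0538332),
  c(0.059033,   0.0541484,  0.0541484,  0.0501926,  0.0541484,  0.059033),
  c(0.0587272,  0.0540445,  0.0540445,  0.0505076,  0.0540445,  0.0587272),
  c(0.0578095,  0.0564349,  0.0564349,  0.0505076,  0.0564349,  0.0578095)
)
trainLabels <- c(1, 1, 0, 0, 1, 0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5

# Squared Euclidean distance from the test point to every training row.
# sweep() subtracts the test vector from each row of train.
sq_dist <- rowSums(sweep(train, 2, test)^2)

# Rank the training rows from nearest to farthest.
ord <- order(sq_dist)
ranked <- data.frame(row = ord, label = trainLabels[ord], sq_dist = sq_dist[ord])
print(ranked, digits = 10)

# Relative gap between the K-th and (K+1)-th nearest rows.
rel_gap <- (sq_dist[ord[K + 1]] - sq_dist[ord[K]]) / sq_dist[ord[K]]

# ASSUMPTION: illustrative tolerance, not read from the installed package.
eps_assumed <- 1e-4
within_fuzz <- rel_gap <= eps_assumed

message("relative gap between neighbours ", K, " and ", K + 1, ": ",
        signif(rel_gap, 3))
message("within assumed tolerance: ", within_fuzz)

# If the (K+1)-th row is treated as tied, these are the votes that get counted.
print(table(trainLabels[ord[seq_len(K + 1)]]))
```

If within_fuzz comes out TRUE, the sixth row joins the vote and the label counts are even, which would explain why the prediction follows the seed. Running the same sketch on round(train, 4) shows how the ranking and the gap change after rounding.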


Source: https://stackoverflow.com/questions/38900958/q-knn-in-r-strange-behavior
