Interpolate values from a grid efficiently in R

只谈情不闲聊 submitted on 2019-12-04 08:19:32

If you are willing to impute each value by finding its nearest neighbor and using that neighbor's value, I think the trick is to use an efficient nearest-neighbor implementation that finds the nearest of n reference points in O(log(n)) time per query (on average, for low-dimensional data such as these 2-D points). A k-d tree provides this sort of performance, and one is available in R through the FNN package. On randomly generated data with 69 million reference points and 5 million points to impute, the computation isn't instantaneous (it takes about 3 minutes), but it's much quicker than the 2 weeks mentioned in the question!
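For a sense of scale, here is a rough operation count with n = 6.9×10^7 reference points and m = 5×10^6 query points. It assumes the textbook average-case costs for a k-d tree (about n log n to build, about log n per query) and ignores constant factors, so treat it as an order-of-magnitude estimate only:

$$
\begin{aligned}
\text{brute force:}\quad & m\,n = (5\times10^{6})(6.9\times10^{7}) \approx 3.5\times10^{14} \text{ distance evaluations}\\
\text{k-d tree build:}\quad & n\log_2 n \approx (6.9\times10^{7})(26) \approx 1.8\times10^{9}\\
\text{k-d tree queries:}\quad & m\log_2 n \approx (5\times10^{6})(26) \approx 1.3\times10^{8}
\end{aligned}
$$

Even allowing generous constants, the tree does on the order of 10^5 times less work than comparing every query point against every reference point.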

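```r
library(FNN)

# Random stand-ins for the real data: 69 million reference points with
# known values, and 5 million query points whose values we want to impute
data   <- cbind(x = rnorm(6.9e7), y = rnorm(6.9e7))
labels <- rnorm(6.9e7)
query  <- cbind(x = rnorm(5e6), y = rnorm(5e6))

# For each query point, find its single nearest reference point (k = 1)
# and return that reference point's value
get.nn <- function(data, labels, query) {
  nns <- get.knnx(data, query, k = 1)
  labels[nns$nn.index]
}

system.time(get.nn(data, labels, query))
#    user  system elapsed
# 174.975   2.236 177.617
```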
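Before committing to the full run, it may be worth checking on a small sample that the k-d tree lookup returns the same values as an exhaustive search. The sketch below does that with a hand-written brute-force scan in base R; the sample sizes, the seed, and the names `kd_labels` and `brute_labels` are illustrative choices, not part of the original answer.

```r
library(FNN)

set.seed(42)
# Small random sample with the same shape as the full problem
n_ref   <- 2000
n_query <- 100
data   <- cbind(x = rnorm(n_ref), y = rnorm(n_ref))
labels <- rnorm(n_ref)
query  <- cbind(x = rnorm(n_query), y = rnorm(n_query))

# k-d tree lookup, the same call that get.nn makes
kd_labels <- labels[get.knnx(data, query, k = 1)$nn.index]

# Brute force: scan every reference point for every query point (O(m * n))
brute_labels <- apply(query, 1, function(q) {
  d2 <- (data[, "x"] - q[1])^2 + (data[, "y"] - q[2])^2  # squared distances
  labels[which.min(d2)]
})

all.equal(kd_labels, brute_labels)
```

With continuous random coordinates, ties between neighbors are essentially impossible, so the two approaches should agree exactly; on real grid data, equidistant neighbors could legitimately be broken differently.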

As a warning, the full-size run above peaked at around 10 GB of RAM, so you will need substantial memory to run this on a dataset of your size.
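A back-of-envelope estimate of the input objects (each R double takes 8 bytes) shows that they account for only part of that peak. The variable names here are illustrative, and the sizes cover the inputs only, not anything FNN allocates internally.

```r
# Memory taken by the input objects alone, at 8 bytes per double
n_ref   <- 6.9e7   # reference points
n_query <- 5e6     # query points
input_bytes <- c(
  data   = n_ref * 2 * 8,    # two-column double matrix
  labels = n_ref * 8,        # double vector
  query  = n_query * 2 * 8   # two-column double matrix
)
input_gb <- input_bytes / 1e9
round(input_gb, 3)       # data 1.104, labels 0.552, query 0.080
round(sum(input_gb), 3)  # about 1.736 GB in total
```

My guess is that most of the remaining ~8 GB comes from the search structure built during the call and from intermediate copies, but I have not profiled it, so treat that breakdown as an assumption.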
