Interpolate values from a grid efficiently in R

旧城冷巷雨未停 提交于 2019-12-06 03:59:56

问题


I have a grid of ocean depth data by location, and am trying to interpolate depth values for a selection of GPS points.

We've been using RSAGA::pick.from.points, which works fine for small data sets.

require(RSAGA)

depthdata <- cbind.data.frame(x=c(74.136, 74.135, 74.134, 74.133, 74.132, 74.131, 74.130, 74.129, 74.128, 74.127), 
y=rep(40, times=10), 
depth=c(-0.6, -0.6, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.6, -0.6))

mylocs <- rbind(c(-74.1325, 40), c(-74.1305, 40))
colnames(mylocs) <- c("x", "y")

results <- pick.from.points(data=mylocs, src=depthdata, pick=c("depth"), method="nearest.neighbour")
mydepths <- results$depth

But our depth data set contains 69 million data points, and we have 5 million GPS points that we'd like depth estimates for, and pick.from.points is just taking too long (> 2 weeks) for this data set. We think that we could accomplish this task more quickly in MATLAB or ArcMap, but we're trying to incorporate this task into a longer workflow in R that we're writing for other people to run repeatedly, so switching to proprietary software for part of that workflow is less than desirable.

We'd be willing to sacrifice some degree of accuracy for speed.

I've looked for a solution as best as I can, but I'm fairly new to grid data and interpolation, so might be using inappropriate language and therefore missing a simple solution.



回答1:


If you were willing to impute by finding the nearest neighbor and using its value, I think the trick would be to use an efficient nearest neighbors implementation that allows you to find the nearest neighbor among n alternatives in O(log(n)) time. The k-d tree provides this sort of performance, and is available through the FNN package in R. While the computation (on randomly generated data with 69 million data points for reference and 5 million data points to impute) isn't instantaneous (it takes about 3 minutes), it's much quicker than 2 weeks!

data <- cbind(x=rnorm(6.9e7), y=rnorm(6.9e7))
labels <- rnorm(6.9e7)
query <- cbind(x=rnorm(5e6), y=rnorm(5e6))

library(FNN)
get.nn <- function(data, labels, query) {
  nns <- get.knnx(data, query, k=1)
  labels[nns$nn.index]
}
system.time(get.nn(data, labels, query))
#    user  system elapsed
# 174.975   2.236 177.617

As a warning, the process peaked around 10GB of RAM, so you will need significant memory resources to run on a dataset of your size.



来源:https://stackoverflow.com/questions/30262434/interpolate-values-from-a-grid-efficiently-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!