R raster brick: sum values in the cells over day ranges determined by two different rasters; how to speed up calculations

Question

I'm working with climate data files with daily data, so for most years there are 365 rasters in a brick. I want to sum the values in the files over subsets of days, say from day x to day y. This can be done with stackApply. I've created some code below that generates some rasters, creates a brick and applies stackApply using specific values for x and y (1 and 3).

What I need, however, is for x and y to be taken from two raster layers. In the code below they are called raster.start and raster.end. Below the first set of code I have a second set that works but is slow.

library(raster)
r <- raster(nrows=100, ncols=100)
s <- stack(lapply(1:5, function(i) setValues(r, runif(ncell(r), min = -10*i, max = 10))))
raster.start <- setValues(r, sample(2, ncell(r), replace=TRUE))
raster.end <- raster.start + 3
rasterb <- brick(s)

# every layer gets group index 1, so stackApply sums all of them into one layer
indices <- c(1,1,1,1,1)

datasum.all <- stackApply(rasterb, indices, fun = sum)
# fixed range: days x = 1 to y = 3
datasum.sub1 <- stackApply(rasterb[[1:3]], indices[1:3], fun = sum)
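Without the raster classes, the difference between the two cases is easier to see. The following is a minimal base-R sketch. It assumes the brick's values are held in a plain matrix with one row per cell and one column per day. The names vals, start and end are illustrative, and the data are random. A fixed range x to y is a single rowSums call. Per-cell ranges need a different slice for each row, and that is what makes the problem harder.

```r
set.seed(1)
n_cells <- 6
n_days  <- 5
# one row per cell, one column per day (illustrative stand-in for a brick)
vals <- matrix(runif(n_cells * n_days, min = -10, max = 10),
               nrow = n_cells, ncol = n_days)

# fixed range, days 1 to 3 for every cell: one vectorized call
fixed_sum <- rowSums(vals[, 1:3])

# per-cell range: each row has its own start and end day
start <- sample(2, n_cells, replace = TRUE)
end   <- start + 3
per_cell_sum <- vapply(seq_len(n_cells),
                       function(i) sum(vals[i, start[i]:end[i]]),
                       numeric(1))
```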

The idea is to step through the rows and columns of the start and end rasters. For each cell, their values are used to subset the brick, and the subset is summed. Here's the code I developed to do this.

raster.out <- r
for (i in 1:nrow(r)){
  for (j in 1:ncol(r)){
    start <- raster.start[[1]][i,j] # get the starting day
    end <- raster.end[[1]][i,j] # get the ending day
    raster.out[i,j] <- sum(rasterb[[start:end]][i,j])
  }
}

However, even for this toy example the computation is slow: it took about 1.3 minutes to complete. I tried replacing some of the code with functions, as follows, but it had no effect on the time to completion. Any advice on how to speed up this process is greatly appreciated.

startEnd <- function(raster.start, raster.end, i,j) {
  start <- raster.start[i,j] # get the starting day
  end <- raster.end[i,j] # get the ending day
  return(c(start,end))
}

rasterOutValue <- function(rasterb, i, j, startEnd){
  # startEnd is c(start, end); expand it to the full range of layers
  return(sum(rasterb[[startEnd[1]:startEnd[2]]][i,j]))
}

for (i in 1:nrow(r)){
  for (j in 1:ncol(r)){
    raster.out[i,j] <- rasterOutValue(rasterb, i, j, startEnd(raster.start, raster.end, i,j))
  }
}

Answer 1:

Your example data

library(raster)
r <- raster(nrows=100, ncols=100)
set.seed(88)
b <- stack(lapply(1:5, function(i) setValues(r, runif(ncell(r), min = -10*i, max = 10))))
r.start <- setValues(r, sample(2, ncell(r), replace=TRUE))
r.end <- r.start + 3

First, an improved version of your loop, which works but is too slow. This version loops over cell numbers instead of rows and columns. That makes it considerably faster, but it is still rather slow:

raster.out <- r
for (i in 1:ncell(r)){
    start <- r.start[i] # get the starting day
    end <- r.end[i] # get the ending day
    raster.out[i] <- sum(b[i][start:end]) # b[i] is a 1-row matrix with the cell's layer values
}

That brings the time down from 74 to 5 seconds for me. But you should never loop over cells, because that is always going to be too slow. Instead, you can do this (in 0.04 seconds for me):

s <- stack(r.start, r.end, b)
x <- calc(s, fun=function(x) sum(x[(x[1]:x[2])+2]))
#class      : RasterLayer 
#dimensions : 100, 100, 10000  (nrow, ncol, ncell)
#resolution : 3.6, 1.8  (x, y)
#extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#crs        : +proj=longlat +datum=WGS84 +no_defs 
#source     : memory
#names      : layer 
#values     : -129.5758, 30.31813  (min, max)

And that seems to be correct, checking the first cell:

a <- s[1]
a
#     layer.1.1 layer.2.1 layer.1.2 layer.2.2  layer.3   layer.4   layer.5
#[1,]         1         4 -1.789974  2.640807 4.431439 -23.09203 -5.688119    

fun <- function(x) sum(x[(x[1]:x[2])+2])
fun(a)
#[1] -17.80976
x[1]
#[1] -17.80976

calc is to Raster objects what apply is to matrices (that is why it is called app in terra).
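The analogy can be made concrete with a base-R sketch. It applies the same one-argument function row-wise with apply to a plain matrix. The first two columns hold start and end, and the remaining columns hold the daily values. This assumes calc sees the same column order after stack(r.start, r.end, b). The data are made up, and the result is checked against an explicit loop over cells.

```r
set.seed(88)
n_cells <- 100
vals  <- sapply(1:5, function(i) runif(n_cells, min = -10 * i, max = 10))
start <- sample(2, n_cells, replace = TRUE)
end   <- start + 3

# same column order as stack(r.start, r.end, b): start, end, then the daily layers
s <- cbind(start, end, vals)

# the function passed to calc(), applied to each row (cell) by apply()
fun <- function(x) sum(x[(x[1]:x[2]) + 2])
res <- apply(s, 1, fun)

# reference: explicit loop over cells
ref <- numeric(n_cells)
for (i in seq_len(n_cells)) ref[i] <- sum(vals[i, start[i]:end[i]])
```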

The place to start is to first write a function that does what you want with a vector.

x <- 1:10
test1 <- function(start, end, values) {
    mean(values[start:end]) 
}
test1(2, 5, x)
test1(5, 8, x)

calc passes only one argument to the function: the vector of values for a cell (here start, end and then the layers). So you need a function like this:

test2 <- function(values) {
    # the +2 to skip the first two elements in the computation
    start <- values[1] + 2
    end <- values[2] + 2
    mean(values[start:end]) 
}

test2(c(2, 5, x))
test2(c(5, 8, x))

And a more concise version

test3 <- function(v) {
    mean(v[ (v[1]:v[2])+2 ] ) 
}
test3(c(2, 5, x))
#[1] 3.5
test3(c(5, 8, x))
#[1] 6.5

Second addition (and a reminder to always test with NA values!): test3 breaks when one of the indices (start or end) is NA. It is OK if the other values are NA.

test3(c(NA, 5, x))
#Error in v[1]:v[2] : NA/NaN argument

So we need a function that catches these cases:

test4 <- function(v) {
    if (any(is.na(v[1:2]))) {
        NA
    } else {
        mean(v[ (v[1]:v[2])+2 ] ) 
    }
}

test4(c(NA, 5, x))
#[1] NA
test4(c(1, 5, x))
#[1] 3

Typically "start" and "end" will both be NA at the same time, so a simpler version that only checks start should also work:

test5 <- function(v) {
    if (is.na(v[1])) {
        NA
    } else {
        mean(v[ (v[1]:v[2])+2 ] ) 
    }
}
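To see the difference between the two guards, both functions can be applied row-wise to a small made-up matrix. Row 1 is a normal cell, row 2 has both indices NA, and row 3 has only end NA. This sketch uses base R only. The mixed-NA row is an assumed case for illustration.

```r
test4 <- function(v) {
    if (any(is.na(v[1:2]))) NA else mean(v[(v[1]:v[2]) + 2])
}
test5 <- function(v) {
    if (is.na(v[1])) NA else mean(v[(v[1]:v[2]) + 2])
}

# three cells: normal, both indices NA, only the end index NA
m <- rbind(c(2, 5, 1:10),
           c(NA, NA, 1:10),
           c(3, NA, 1:10))

r4 <- apply(m, 1, test4)
# test5 only checks the start index, so the third row still raises an error
r5 <- tryCatch(apply(m, 1, test5), error = function(e) conditionMessage(e))
```

So test5 is only safe when start and end are always NA together. Otherwise test4 is the one to use.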

This approach with calc might be slow, because it turns a RasterBrick into a RasterStack with 365 + 2 layers, which considerably slows down reading the data. So you could try this approach with overlay instead (here using sum again):

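f <- function(i, v) {
    # i: matrix with start and end per cell; v: matrix of layer values per cell
    j <- !is.na(i[,1])              # cells with a valid start index
    r <- rep(NA, nrow(i))           # NA for all other cells
    x <- cbind(i[j,,drop=FALSE], v[j,,drop=FALSE])
    r[j] <- apply(x, 1, function(y) sum(y[ (y[1]:y[2])+2 ] )) 
    r
}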
cal <-stack(r.start, r.end)
x <- overlay(cal, b, fun= f, recycle=FALSE)
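Because f works on matrices, it can be tried without any raster. This sketch assumes that overlay(..., recycle=FALSE) passes fun one matrix per input object, with rows as cells and columns as layers; that is the layout f is written for. cal_vals and day_vals are hypothetical stand-ins with made-up values. The sketch shows that a cell with an NA start is skipped and left as NA.

```r
f <- function(i, v) {
    j <- !is.na(i[,1])                  # cells with a valid start index
    r <- rep(NA, nrow(i))               # NA result by default
    x <- cbind(i[j,,drop=FALSE], v[j,,drop=FALSE])
    r[j] <- apply(x, 1, function(y) sum(y[ (y[1]:y[2])+2 ] ))
    r
}

# hypothetical inputs: start/end per cell, and 5 daily values per cell
cal_vals <- cbind(c(1, NA, 2), c(4, NA, 5))
day_vals <- matrix(1:15, nrow = 3, ncol = 5)   # row 1 is 1, 4, 7, 10, 13
out <- f(cal_vals, day_vals)
out
```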
x
#class      : RasterLayer 
# ...
#values     : -129.5758, 30.31813  (min, max)

You can speed up the algorithm further by writing it in Rcpp/C++:

library(Rcpp)
cppFunction('std::vector<double> gtemp(NumericMatrix cal, NumericMatrix wth) {
    // cal: start and end layer per cell; wth: one row of layer values per cell
    std::vector<double> out(cal.nrow(), NAN);
    for (int i=0; i<cal.nrow(); i++) {
      if (!std::isnan(cal(i,0))){
         NumericVector v = wth(i,_);
         size_t start = cal(i,0)-1;   // 1-based start to 0-based offset
         size_t end = cal(i,1);       // one past the last layer (exclusive end)
         out[i] = std::accumulate(v.begin()+start, v.begin()+end, 0.0);
      }  
    }
    return out;
}')

x <- overlay(cal, b, fun=gtemp, recycle=FALSE)
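When only sums are needed, a different technique can remove the per-cell function call entirely: cumulative sums along the layer dimension. This technique is not part of the answer above. The sketch assumes the values fit in memory as a cells-by-layers matrix and that start and end are valid layer numbers. range_sum_cumsum is a hypothetical helper name. The idea is that the sum over days start..end equals the running total at end minus the running total at start - 1. A leading zero column handles start = 1.

```r
# hypothetical helper: per-cell range sums via running totals
range_sum_cumsum <- function(vals, start, end) {
    # running totals along the day (column) dimension, vectorized over cells
    cs <- vals
    if (ncol(vals) > 1) {
        for (k in 2:ncol(vals)) cs[, k] <- cs[, k - 1] + vals[, k]
    }
    cs <- cbind(0, cs)  # leading zero column so that start = 1 works
    rows <- seq_len(nrow(vals))
    out <- rep(NA_real_, nrow(vals))
    ok <- !is.na(start) & !is.na(end)
    # column end + 1 holds days 1..end; column start holds days 1..(start - 1)
    out[ok] <- cs[cbind(rows[ok], end[ok] + 1)] - cs[cbind(rows[ok], start[ok])]
    out
}

set.seed(88)
vals  <- sapply(1:5, function(i) runif(1000, min = -10 * i, max = 10))
start <- sample(2, 1000, replace = TRUE)
end   <- start + 3
fast  <- range_sum_cumsum(vals, start, end)
slow  <- vapply(seq_along(start), function(i) sum(vals[i, start[i]:end[i]]), numeric(1))
all.equal(fast, slow)
```

The loop runs over layers (365 at most), not over cells. Subtracting running totals can differ from a direct sum in the last few digits. In principle the helper could be wrapped as function(i, v) range_sum_cumsum(v, i[,1], i[,2]) for overlay, like f above, but that has not been tested here.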

And here is how you can do this with terra (version >= 0.6-14) and the rapp (range-apply) method.

Example data

library(terra)
d <- rast(nrows=100, ncols=100, nl=5)
rstart <- rast(d, nlyr=1)
nc <- ncell(d) 
set.seed(88)
values(d) <- sapply(1:5, function(i) runif(nc, min = -10*i, max = 10)) # ncell x nlyr matrix, one column per layer
values(rstart) <- sample(2, nc, replace=TRUE)
rend <- rstart + 3

Solution

idx <- c(rstart, rend)
z <- rapp(d, idx, "sum")
z  
#class       : SpatRaster 
#dimensions  : 100, 100, 1  (nrow, ncol, nlyr)
#resolution  : 3.6, 1.8  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
#data source : memory 
#names       :      lyr1 

Source: https://stackoverflow.com/questions/61578461/r-raster-brick-sum-values-in-the-cells-determined-by-two-different-rasters-how

Tags

r-raster