Following a previous question (Faster reading of time series from netCDF?) I have re-permuted my netCDF files to provide fast time-series reads (scripts on github to be cleaned up eventually ...).
In short, to make reads faster, I have rearranged the dimensions from lat, lon, time to time, lat, lon. Now my existing scripts break because they assume that the dimensions will always be lat, lon, time, following the ncdf4 documentation of ncvar_get for the 'start' argument:
Order is X-Y-Z-T (i.e., the time dimension is last)
However, this is not the case.
Furthermore, there is a related inconsistency in the order of dimensions listed by the command-line netCDF utility ncdump -h and the R function ncdf4::nc_open. The first says that the dimensions are in the expected (lat, lon, time) order, while the latter sees dimensions with time first (time, lat, lon).
For a minimal example, download the file test.nc and run
bash-$ ncdump -h test.nc
bash-$ R
R> library(ncdf4)
R> print(nc_open("test.nc"))
What I want to do is get records 5-15 from the variable "lwdown":
my.nc <- nc_open("test.nc")
ncvar_get(my.nc, "lwdown", start = c(1, 1, 5), count = c(1, 1, 10))
But this doesn't work, since R sees the time dimension first, so I must change my scripts to
ncvar_get(my.nc, "lwdown", start = c(5, 1, 1), count = c(10, 1, 1))
It wouldn't be so bad to update my scripts and functions, except that I want to be able to read data from files regardless of the dimension order.
Is there a way to generalize this function so that it works independently of dimension order?
While asking the question, I figured out this solution, though there is still room for improvement:
The closest I can get is to open the file and find the order in this way:
my.nc$var$lwdown$dim[[1]]$name
[1] "time"
my.nc$var$lwdown$dim[[2]]$name
[1] "lon"
my.nc$var$lwdown$dim[[3]]$name
[1] "lat"
which is a bit unsatisfying, although it led me to this solution: if I want to start at c(lat = 1, lon = 1, time = 5), but ncvar_get expects an arbitrary order, I can say:
start <- c(lat = 1, lon = 1, time = 5)
count <- c(lat = 1, lon = 1, time = 10)
dim.order <- sapply(my.nc$var$lwdown$dim, function(x) x$name)
ncvar_get(my.nc, "lwdown", start = start[dim.order], count = count[dim.order])
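The reordering step above can be pulled into a small helper so that every call site passes named vectors and never depends on the file's layout. This is only a sketch: order_by_dims is a hypothetical name, and mock.dims is a hand-built stand-in for my.nc$var$lwdown$dim so that the example runs without a netCDF file.

```r
# Hypothetical helper (not part of ncdf4): reorder a named start/count
# vector so that it matches the dimension order of a variable.
order_by_dims <- function(x, dim.names) {
  absent <- setdiff(dim.names, names(x))
  if (length(absent) > 0) {
    stop("no value supplied for dimension(s): ", paste(absent, collapse = ", "))
  }
  unname(x[dim.names])
}

# Assumption: a mock of my.nc$var$lwdown$dim, in the order R reported above.
mock.dims <- list(list(name = "time"), list(name = "lon"), list(name = "lat"))
dim.order <- vapply(mock.dims, function(d) d$name, character(1))

start <- c(lat = 1, lon = 1, time = 5)
count <- c(lat = 1, lon = 1, time = 10)

start.ordered <- order_by_dims(start, dim.order)
count.ordered <- order_by_dims(count, dim.order)
```

With a real file, dim.order would come from my.nc$var$lwdown$dim as in the lines above, and start.ordered and count.ordered would be passed to ncvar_get as start and count. The explicit check for missing names is a design choice: indexing a named vector by a name it lacks silently yields NA, which would otherwise reach ncvar_get unnoticed.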
I ran into this recently as well. I have a netCDF file with data in this format:
nc_in <- nc_open("my.nc")
nc_in$dim[[1]]$name == "time"
nc_in$dim[[2]]$name == "latitude"
nc_in$dim[[3]]$name == "longitude"
nc_in$dim[[1]]$len == 3653 # this is the number of timesteps in my netcdf
nc_in$dim[[2]]$len == 180 # this is the number of latitude cells
nc_in$dim[[3]]$len == 360 # this is the number of longitude cells
The obnoxious part here is that the DIM component of the netCDF is in the order of T,Y,X.
If I try to grab time series data for the pr var using the indices in the order they appear in nc_in$dim, I get an error:
ncvar_get(nc_in,"pr")[3653,180,360] # 'subscript out of bounds'
If I instead grab data in X,Y,T order, it works:
ncvar_get(nc_in,"pr")[360,180,3653] # gives me a value
What I don't understand is how the ncvar_get() function knows which dimension represents X, Y and T, especially if you have generated your own netCDF.
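One possible explanation, offered as an assumption rather than a confirmed account of ncdf4 internals: the file-level list nc_in$dim need not be in the same order as the dimensions of a particular variable, and the order that matters for indexing is the variable's own, found under nc_in$var$pr$dim. The sketch below uses a hand-built mock.nc with small made-up lengths instead of a real file, and a plain array as a stand-in for the result of ncvar_get(nc_in, "pr").

```r
# Assumption: a mock of the relevant parts of an opened file, with small
# made-up lengths; the file-level order is T,Y,X, the variable's is X,Y,T.
mock.nc <- list(
  dim = list(list(name = "time", len = 4),
             list(name = "latitude", len = 3),
             list(name = "longitude", len = 2)),
  var = list(pr = list(dim = list(list(name = "longitude", len = 2),
                                  list(name = "latitude", len = 3),
                                  list(name = "time", len = 4))))
)

# Read the order and sizes from the variable, not from the file-level list.
var.dims  <- mock.nc$var$pr$dim
var.names <- vapply(var.dims, function(d) d$name, character(1))
var.lens  <- vapply(var.dims, function(d) d$len, numeric(1))

# Stand-in for the array that ncvar_get(nc_in, "pr") would return.
pr <- array(seq_len(prod(var.lens)), dim = var.lens)

# Index the last cell using the variable's own dimension order.
last.cell <- setNames(var.lens, var.names)
value <- do.call(`[`, c(list(pr), as.list(unname(last.cell))))
```

Against a real file, comparing dim(ncvar_get(nc_in, "pr")) with the lengths under nc_in$var$pr$dim would show whether this explanation holds for that file.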
Source: https://stackoverflow.com/questions/22944707/how-can-i-specify-dimension-order-when-using-ncdf4ncvar-get