Analyzing spatial data between two points in R using a very large data set

Submitted by 北城以北 on 2021-02-11 05:02:17

Question


This is my first time writing code in R from scratch and I'm struggling with how to approach it. I'm looking at turtle nests and their proximity to light sources (i.e. houses, light poles, etc.) to determine how often a light source is within a given radius of a nest.

These are both very large data sets (hundreds of thousands of rows) so the code will likely need to run a loop for each nest position. GPS coordinates for both data sets are in decimal degrees.

The nest data is essentially latitude, longitude, date observed, and species (if known)

The light source data is latitude, longitude, type, and several other light-related parameters I'd like to keep in the data set.

Any suggestions on how to loop through the nest coordinates to determine light sources within radius, r, would be greatly appreciated! For each light source within r for a nest, I'd like for the end result to spit out the entire row of light source data (type, location, additional light-related parameters, etc.) if that is possible rather than just say how many values were T vs. F for being inside r. Thanks!

> Nest <- read.csv("Nest.csv", header=T)
> Lights <- read.csv("Lights.csv", header=T)
> #Nest
> dput(droplevels(Nest[1:10, ]))
structure(list(LAT = c(34.146535, 34.194585, 34.216854, 34.269901, 
34.358718, 34.37268, 34.380848, 34.394183, 34.410384, 34.415077
), LONG = c(-77.839787, -77.804013, -77.787032, -77.742722, -77.63655, 
-77.619872, -77.609373, -77.591654, -77.568456, -77.561256), 
    DATE = structure(c(2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 1L, 7L
    ), .Label = c("2016-05-19T03:12", "2016-05-21T07:23", "2016-05-23T08:14", 
    "2016-05-24T04:21", "2016-05-25T11:15", "2016-05-27T05:12", 
    "2016-05-27T09:45", "2016-05-28T09:42", "2016-05-28T10:18", 
    "2016-05-29T02:26"), class = "factor"), SPECIES = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Cc", class = "factor")), row.names = c(NA, 
10L), class = "data.frame")
> #Lights
> dput(droplevels(Lights[1:10, ]))
structure(list(LAT = c(34.410925, 34.410803, 34.410686, 34.410476, 
34.410361, 34.410237, 34.410151, 34.410016, 34.409821, 34.409671
), LONG = c(-77.568183, -77.568296, -77.568478, -77.568757, -77.568915, 
-77.569135, -77.569355, -77.569527, -77.569707, -77.569905), 
    DATE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = "5/19/2016", class = "factor"), TYPE = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "R", class = "factor"), 
    WATTS = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
10L), class = "data.frame")

Answer 1:


Because your data sets are large, this solution avoids computing a full Cartesian product of all nests against all lights.
Instead, it uses the non-equi join feature of data.table, which supports only simple comparison operators such as > or <.
The join acts as a first filter that keeps only the lights falling inside a box around each nest.
The box must be large enough to contain the circle whose radius is the maximum distance of interest from the nest.
In a second step, the distance is calculated on the filtered pairs only, which is far less work than a Cartesian product of all the data:

library(data.table)
library(geosphere)

#To data.table
setDT(Nest)
setDT(Lights)

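# Define a box around each nest (half-widths in decimal degrees)
dlon <- 0.001
dlat <- 0.001

# Keep the nest coordinates under new names and add the box limits
Nest[, c("LATNest", "LONGNest", "latmin", "latmax", "longmin", "longmax") := .(LAT, LONG, LAT - dlat, LAT + dlat, LONG - dlon, LONG + dlon)]
# Drop LAT/LONG so that these names refer to the Lights columns in the join
Nest[, c("LAT", "LONG") := NULL]

# Search lights in box (in each condition, the left side is a Nest column and the right side a Lights column)
LightNearNest <- Nest[Lights, .(LATNest, LONGNest, LATLight = LAT, LONGLight = LONG), on = .(latmin < LAT, latmax > LAT, longmin < LONG, longmax > LONG), nomatch = 0, allow.cartesian = TRUE]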


# Calculate the distance in metres between each nest and each candidate light
# (distHaversine expects longitude first, then latitude, for both points)
LightNearNest[, dist := geosphere::distHaversine(cbind(LONGNest, LATNest), cbind(LONGLight, LATLight))]
LightNearNest

# Result for the sample data (dist in metres, shown here rounded to 2 decimals)
    LATNest  LONGNest LATLight LONGLight  dist
1: 34.41038 -77.56846 34.41092 -77.56818 65.23
2: 34.41038 -77.56846 34.41080 -77.56830 48.90
3: 34.41038 -77.56846 34.41069 -77.56848 33.68
4: 34.41038 -77.56846 34.41048 -77.56876 29.48
5: 34.41038 -77.56846 34.41036 -77.56892 42.23
6: 34.41038 -77.56846 34.41024 -77.56914 64.47
7: 34.41038 -77.56846 34.41015 -77.56936 86.54

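A note on sizing the box: at the latitude of the sample data (about 34.4°N), 0.001 degrees is roughly 111 m north-south but only about 92 m east-west, so a box of ±0.001° fully contains the search circle only for radii up to about 92 m. The sketch below shows one way the half-widths could be derived from a radius in metres instead. The helper name `box_half_widths`, the radius of 100 m and the latitude of 34.5 are assumptions made for illustration; 6378137 m is the default Earth radius used by `geosphere::distHaversine`. It uses a small-radius approximation, so treat it as a starting point rather than an exact bound.

```r
# Hypothetical helper (not part of the original answer): half-widths, in
# decimal degrees, of a box that contains a circle of radius r_m metres
# for nests at latitudes up to max_abs_lat degrees from the equator.
box_half_widths <- function(r_m, max_abs_lat, earth_radius_m = 6378137) {
  m_per_deg <- earth_radius_m * pi / 180   # metres per degree of latitude
  dlat <- r_m / m_per_deg
  # A degree of longitude shrinks with cos(latitude); evaluating the cosine
  # slightly poleward of max_abs_lat keeps the box on the generous side.
  dlon <- dlat / cos((max_abs_lat + dlat) * pi / 180)
  c(dlat = dlat, dlon = dlon)
}

hw <- box_half_widths(r_m = 100, max_abs_lat = 34.5)
dlat <- hw[["dlat"]]
dlon <- hw[["dlon"]]
```

The resulting `dlat` and `dlon` can replace the fixed 0.001 values before the box columns are built.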

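The question asks for the complete light rows (type, location and the other parameters) within a radius r, not just coordinates. A minimal sketch of one way to get there, under some assumptions: the id columns `nest_id` and `light_id`, the tables `box`, `cand` and `near`, and the radius of 50 m are all introduced here for illustration, and the sample rows are a few of the values from the question's dput output.

```r
library(data.table)
library(geosphere)

# Small sample built from values in the question's dput output
Nest <- data.table(LAT = c(34.410384, 34.415077),
                   LONG = c(-77.568456, -77.561256),
                   SPECIES = "Cc")
Lights <- data.table(LAT = c(34.410925, 34.410686, 34.409671),
                     LONG = c(-77.568183, -77.568478, -77.569905),
                     TYPE = "R", WATTS = NA_real_)

r <- 50                       # search radius in metres (assumed value)
dlat <- 0.001; dlon <- 0.001  # box half-widths in degrees, as in the answer

# Row ids so that matched pairs can be traced back to the full rows
Nest[, nest_id := .I]
Lights[, light_id := .I]

# Box limits per nest, kept in a separate table so Nest itself is unchanged
box <- Nest[, .(nest_id, LATNest = LAT, LONGNest = LONG,
                latmin = LAT - dlat, latmax = LAT + dlat,
                longmin = LONG - dlon, longmax = LONG + dlon)]

# Non-equi join: candidate (nest, light) pairs whose light lies in the box
cand <- box[Lights,
            .(nest_id, LATNest, LONGNest, light_id = i.light_id),
            on = .(latmin < LAT, latmax > LAT, longmin < LONG, longmax > LONG),
            nomatch = 0, allow.cartesian = TRUE]

# Attach every Lights column, compute the distance, keep pairs within r
near <- merge(cand, Lights, by = "light_id")
near[, dist := distHaversine(cbind(LONGNest, LATNest), cbind(LONG, LAT))]
near <- near[dist <= r][order(nest_id, dist)]
near
```

Each row of `near` is one light within r metres of one nest and carries all columns of `Lights`. Counting lights per nest is then `near[, .N, by = nest_id]`.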
Source: https://stackoverflow.com/questions/62462970/analyzing-spatial-data-between-two-points-in-r-using-a-very-large-data-set
