Question
I would like to add several conditions to a loop that cycles through my data. The loop currently picks only the closest previous owner within a set distance, which is not necessarily the most recent one.
Previous owners (>20,000) are stored in a dataset called lifetime_census (data available here):
previous_id reflo locx locy lifespan census_year gr
5587 -310 -3 10 1810 2003 A
7687 -310 -3 10.1 110 2001 A
5101 Q1 17.3 0.8 55 2004 A
9109 Q1 17.4 0.9 953 2003 B
6077 M2 13 1.8 979 2003 B
8044 M2 13.1 1.7 100 2003 A
4076 M2 13.3 1.9 790 2002 B
6130 -49 -4 9 374 2004 A
7307 B.1 2.5 1 1130 2003 A
I then have an owners dataset (data available here):
squirrel_id spr_census reflo.x spring_locx spring_locy spring_grid
6391 2005 M3 13 2.5 B
6130 2005 -310 -3 10 A
23586 2019 B9 2 9 B
To illustrate what I am trying to achieve:
squirrel_id spr_census reflo.x spring_locx spring_locy spring_grid previous_owner census_year gr
6391 2004 M3 13 2.5 B 6077 2003 B
6130 2005 -310 -3 10 A 5587 2004 A
23586 2019 B9 2 9 B NA NA NA
This scenario finds the most recent and closest previous_id at the exact reflo (or the nearest previous_id within a set distance if there is no exact reflo match). This previous_id cannot be the same id as the current owner (squirrel_id), has to be from the same "city" (gr==spring_grid), and has to be from the current or a past year (census_year <= spr_census).
The conditions I'd like to add to the loop (in more technical terms):
- previous owner (lifetime_census$previous_id) cannot be current owner (owners$squirrel_id)
- address for previous owner needs to be from the same city (lifetime_census$gr) as current owner (owners$spring_grid)
- previous owner has to have lived at the same address sometime in the past or current year (lifetime_census$census_year) as current owner (owners$spr_census)
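The three conditions can each be written as a logical vector with one element per row of lifetime_census, then combined with &. A minimal sketch for a single owner, using a three-row subset of the data above; the names not_self, same_grid, not_future and eligible are illustrative and not part of the original code:

```r
# Three rows taken from the lifetime_census table in the question.
lifetime_census <- data.frame(
  previous_id = c(5587, 7687, 6130),
  reflo       = c("-310", "-310", "-49"),
  census_year = c(2003, 2001, 2004),
  gr          = c("A", "A", "A"),
  stringsAsFactors = FALSE
)

# One current owner (the second row of the owners table).
owner <- data.frame(squirrel_id = 6130, spr_census = 2005,
                    spring_grid = "A", stringsAsFactors = FALSE)

# Condition 1: the previous owner is not the current owner.
not_self <- lifetime_census$previous_id != owner$squirrel_id
# Condition 2: the census record is from the same grid ("city").
same_grid <- lifetime_census$gr == owner$spring_grid
# Condition 3: the census record is from the current or a past year.
not_future <- lifetime_census$census_year <= owner$spr_census

# A row is a candidate only if all three conditions hold.
eligible <- not_self & same_grid & not_future
candidates <- lifetime_census[eligible, ]
print(candidates)
```

Here the row for 6130 drops out because it is the current owner, leaving 5587 and 7687 as candidates.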
This gets me part-way there:
Calculates distances:
distance = 30
distance_xy = function (x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}
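Since the arithmetic inside distance_xy is vectorised, passing whole columns as x2 and y2 returns one distance per census row, which is what the loop relies on. A quick check, using the owner at M3 (13, 2.5) and the three M2 rows from the tables above:

```r
# Same function as in the question, repeated so this snippet runs on its own.
distance_xy <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

# Coordinates of the three M2 records (previous_id 6077, 8044, 4076).
m2_locx <- c(13, 13.1, 13.3)
m2_locy <- c(1.8, 1.7, 1.9)

# One owner location against three census locations: a vector of 3 distances.
dt <- distance_xy(13, 2.5, m2_locx, m2_locy)
print(round(dt, 3))

# Index of the closest record.
closest <- which.min(dt)
print(closest)
```

Note that the closest record here is the third one (4076, from 2002), not the most recent one, which is why distance alone is not enough for the selection.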
Loop to find the previous neighbour at exact reflo, and then the next closest neighbour if there is no exact reflo match.
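owners$previous_owner = NA  # result column, filled in row by row
for (i in 1:dim(owners)[1]) {
  if (owners$reflo.x[i] %in% lifetime_census$reflo) {
    # exact reflo match: take the first matching previous_id
    owners$previous_owner[i] = lifetime_census[lifetime_census$reflo == owners$reflo.x[i], ]$previous_id[1L]
  } else {
    # no exact match: distance from this owner to every census location
    dt = distance_xy(owners$spring_locx[i], owners$spring_locy[i], lifetime_census$locx, lifetime_census$locy)
    if (any(dt <= distance)) {
      # closest census record within the distance threshold
      owners$previous_owner[i] = lifetime_census[order(dt), ]$previous_id[1L]
    } else {
      # nothing within the threshold
      owners$previous_owner[i] = NA
    }
  }
}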
Is there a way to pick the closest and most recent previous_id, after making sure that the previous_id is not the squirrel_id, is from the same "city" (i.e., lifetime_census$gr == owners$spring_grid), and is not from the "future" (i.e., lifetime_census$census_year <= owners$spr_census)?
I would also like to keep all the other columns associated with the previous_id, such as census_year and gr.
I find the nestedness of if-else statements particularly difficult to decipher, so please be liberal with annotations for your code suggestions.
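One possible shape for this, as a sketch rather than a definitive answer: filter the census rows first, then pick a single best row, so that only one if-else remains. Assumptions that are mine and not from the question: find_previous_owner is a hypothetical helper name; "most recent and closest" is read as "most recent year first, ties broken by distance"; and max_dist = 5 is an arbitrary value chosen for the toy data (the question uses distance = 30).

```r
# Toy data copied from the tables in the question.
lifetime_census <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
previous_id reflo locx locy lifespan census_year gr
5587 -310 -3 10 1810 2003 A
7687 -310 -3 10.1 110 2001 A
5101 Q1 17.3 0.8 55 2004 A
9109 Q1 17.4 0.9 953 2003 B
6077 M2 13 1.8 979 2003 B
8044 M2 13.1 1.7 100 2003 A
4076 M2 13.3 1.9 790 2002 B
6130 -49 -4 9 374 2004 A
7307 B.1 2.5 1 1130 2003 A
")
owners <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
squirrel_id spr_census reflo.x spring_locx spring_locy spring_grid
6391 2005 M3 13 2.5 B
6130 2005 -310 -3 10 A
23586 2019 B9 2 9 B
")

distance_xy <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

# Hypothetical helper: returns one row (previous_owner, census_year, gr)
# for a single owner row, or a row of NAs if nothing qualifies.
find_previous_owner <- function(owner, census, max_dist) {
  # Step 1: keep rows that are not the owner, same grid, not from the future.
  ok <- census$previous_id != owner$squirrel_id &
    census$gr == owner$spring_grid &
    census$census_year <= owner$spr_census
  cand <- census[which(ok), , drop = FALSE]

  # Step 2: prefer exact reflo matches; otherwise use rows within max_dist.
  exact <- cand$reflo %in% owner$reflo.x
  dt <- distance_xy(owner$spring_locx, owner$spring_locy, cand$locx, cand$locy)
  keep <- if (any(exact)) which(exact) else which(dt <= max_dist)
  cand <- cand[keep, , drop = FALSE]
  dt <- dt[keep]

  # Step 3: nothing left means no previous owner.
  if (nrow(cand) == 0) {
    return(data.frame(previous_owner = NA_real_, census_year = NA_real_,
                      gr = NA_character_, stringsAsFactors = FALSE))
  }

  # Step 4 (assumed ordering): most recent year first, then smallest distance.
  best <- order(-cand$census_year, dt)[1]
  out <- cand[best, c("previous_id", "census_year", "gr")]
  names(out)[1] <- "previous_owner"
  out
}

# Apply the helper to every owner and bind the extra columns on.
result <- do.call(rbind, lapply(seq_len(nrow(owners)), function(i) {
  find_previous_owner(owners[i, ], lifetime_census, max_dist = 5)
}))
rownames(result) <- NULL
owners <- cbind(owners, result)
print(owners)
```

Swapping the arguments of order() in step 4 to order(dt, -cand$census_year) would instead prefer the closest record and use the year only to break ties; on the toy data that changes the result for squirrel 6391 from 6077 to 4076.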
Source: https://stackoverflow.com/questions/60625882/multiple-conditions-for-if-else-statement-in-r