Question
I would like to add several conditions to a loop that cycles through my data. Currently the loop only picks the closest previous owner within a set distance, which is not necessarily the most recent one.
Previous owners (>20,000) are stored in a dataset called lifetime_census (data available here):
previous_id reflo locx locy lifespan census_year gr
5587 -310 -3 10 1810 2003 A
7687 -310 -3 10.1 110 2001 A
5101 Q1 17.3 0.8 55 2004 A
9109 Q1 17.4 0.9 953 2003 B
6077 M2 13 1.8 979 2003 B
8044 M2 13.1 1.7 100 2003 A
4076 M2 13.3 1.9 790 2002 B
6130 -49 -4 9 374 2004 A
7307 B.1 2.5 1 1130 2003 A
I then have an owners dataset (data available here):
squirrel_id spr_census reflo.x spring_locx spring_locy spring_grid
6391 2005 M3 13 2.5 B
6130 2005 -310 -3 10 A
23586 2019 B9 2 9 B
To illustrate what I am trying to achieve:
squirrel_id spr_census reflo.x spring_locx spring_locy spring_grid previous_owner census_year gr
6391 2004 M3 13 2.5 B 6077 2003 B
6130 2005 -310 -3 10 A 5587 2004 A
23586 2019 B9 2 9 B NA NA NA
This scenario finds the most recent and closest previous_id at the exact reflo (or the nearest previous_id within a set distance if there is no exact reflo match). This previous_id cannot be the same id as the current owner (squirrel_id), has to be from the same "city" (gr == spring_grid), and has to be from the current or a past year (relative to spr_census).
The conditions I'd like to add to the loop (in more technical terms):
- the previous owner (lifetime_census$previous_id) cannot be the current owner (owners$squirrel_id)
- the address for the previous owner needs to be in the same city (lifetime_census$gr) as the current owner (owners$spring_grid)
- the previous owner has to have lived at the same address in the current or a past year (lifetime_census$census_year) relative to the current owner (owners$spr_census)
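As a minimal sketch of how the three conditions above could be written, each one can be a logical vector over lifetime_census, combined with `&`. The toy data below copies a few of the sample rows shown earlier; the names `owner`, `eligible` and `candidates` are hypothetical and only used for illustration.

```r
# Toy data copied from a few sample rows in the question
lifetime_census <- data.frame(
  previous_id = c(5587, 7687, 6077, 8044, 4076, 6130),
  reflo       = c("-310", "-310", "M2", "M2", "M2", "-49"),
  census_year = c(2003, 2001, 2003, 2003, 2002, 2004),
  gr          = c("A", "A", "B", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# One current owner (hypothetical single-row data frame)
owner <- data.frame(squirrel_id = 6130, spr_census = 2005,
                    spring_grid = "A", stringsAsFactors = FALSE)

# One logical vector per condition, combined element-wise with &
eligible <- lifetime_census$previous_id != owner$squirrel_id &  # not the current owner
  lifetime_census$gr == owner$spring_grid &                     # same grid ("city")
  lifetime_census$census_year <= owner$spr_census               # current or past year

# Rows that pass all three conditions
candidates <- lifetime_census[eligible, ]
print(candidates)
```

Filtering first and searching afterwards keeps the conditions out of the nested if-else, so the matching step only ever sees rows that are allowed to match.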
This gets me part-way there:
Calculates distances:
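# Maximum distance for a neighbour to count as a match
distance = 30
# Euclidean distance between (x1, y1) and (x2, y2); vectorised over its arguments
distance_xy = function (x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}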
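Because R arithmetic is vectorised, distance_xy can take one owner location together with whole columns of census coordinates and return one distance per census row. A small check, using owner 6391 at (13, 2.5) and the three M2 rows from the sample data (the name `dt` follows the loop in the question):

```r
# Same function as in the question
distance_xy <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

# One owner location against three census locations (the M2 rows)
dt <- distance_xy(13, 2.5, c(13, 13.1, 13.3), c(1.8, 1.7, 1.9))

print(round(dt, 3))   # one distance per census row
print(which.min(dt))  # position of the closest row
```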
Loop to find the previous neighbour at the exact reflo, and then the next closest neighbour if there is no exact reflo match:
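for (i in seq_len(nrow(owners))) {
  if (owners$reflo.x[i] %in% lifetime_census$reflo) {
    # exact reflo match: take the first matching record
    owners$previous_owner[i] = lifetime_census[lifetime_census$reflo == owners$reflo.x[i], ]$previous_id[1L]
  } else {
    # no exact match: distance from this owner to every census record
    dt = distance_xy(owners$spring_locx[i], owners$spring_locy[i], lifetime_census$locx, lifetime_census$locy)
    if (any(dt <= distance)) {
      # closest record (within `distance`, since at least one record is)
      owners$previous_owner[i] = lifetime_census[order(dt), ]$previous_id[1L]
    } else {
      # nothing within range
      owners$previous_owner[i] = NA
    }
  }
}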
Is there a way to pick the closest and most recent previous_id, after making sure that the previous_id is not the squirrel_id, is from the same "city" (i.e., lifetime_census$gr == owners$spring_grid), and is not from the "future" (i.e., lifetime_census$census_year <= owners$spr_census)?
I would also like to keep all the other columns associated with the previous_id, such as census_year and gr.
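One possible shape for this, as a sketch rather than a definitive answer: filter the census down to eligible rows first, then select a whole row (not just an id) so that census_year and gr come along with it. The helper name `find_previous_owner`, the `max_dist` argument and its value of 5 are hypothetical. The tie-breaking order (most recent year first, then smallest distance) is an assumption, since "closest and most recent" can be read either way.

```r
distance_xy <- function(x1, y1, x2, y2) sqrt((x2 - x1)^2 + (y2 - y1)^2)

# Sample rows from the question
lifetime_census <- data.frame(
  previous_id = c(5587, 7687, 5101, 9109, 6077, 8044, 4076, 6130, 7307),
  reflo       = c("-310", "-310", "Q1", "Q1", "M2", "M2", "M2", "-49", "B.1"),
  locx        = c(-3, -3, 17.3, 17.4, 13, 13.1, 13.3, -4, 2.5),
  locy        = c(10, 10.1, 0.8, 0.9, 1.8, 1.7, 1.9, 9, 1),
  census_year = c(2003, 2001, 2004, 2003, 2003, 2003, 2002, 2004, 2003),
  gr          = c("A", "A", "A", "B", "B", "A", "B", "A", "A"),
  stringsAsFactors = FALSE
)
owners <- data.frame(
  squirrel_id = c(6391, 6130, 23586),
  spr_census  = c(2005, 2005, 2019),
  reflo.x     = c("M3", "-310", "B9"),
  spring_locx = c(13, -3, 2),
  spring_locy = c(2.5, 10, 9),
  spring_grid = c("B", "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns ONE row (previous_id, census_year, gr) for one owner
find_previous_owner <- function(owner, census, max_dist) {
  cols <- c("previous_id", "census_year", "gr")
  none <- census[NA_integer_, cols]          # all-NA row, returned when nothing matches

  # Step 1: keep only rows that satisfy the three conditions
  keep <- census$previous_id != owner$squirrel_id &   # not the current owner
    census$gr == owner$spring_grid &                  # same grid ("city")
    census$census_year <= owner$spr_census            # current or past year
  cand <- census[keep, , drop = FALSE]
  if (nrow(cand) == 0) return(none)

  # Step 2: prefer an exact reflo match, taking the most recent record
  exact <- cand[cand$reflo == owner$reflo.x, , drop = FALSE]
  if (nrow(exact) > 0) return(exact[which.max(exact$census_year), cols])

  # Step 3: otherwise look for neighbours within max_dist
  cand$dist <- distance_xy(owner$spring_locx, owner$spring_locy, cand$locx, cand$locy)
  cand <- cand[cand$dist <= max_dist, , drop = FALSE]
  if (nrow(cand) == 0) return(none)

  # Assumed priority: most recent year first, then smallest distance
  cand[order(-cand$census_year, cand$dist)[1], cols]
}

# Apply the helper to each owner and bind the matched columns on
matches <- do.call(rbind, lapply(seq_len(nrow(owners)), function(i) {
  find_previous_owner(owners[i, ], lifetime_census, max_dist = 5)
}))
rownames(matches) <- NULL
names(matches)[1] <- "previous_owner"
result <- cbind(owners, matches)
print(result)
```

On these sample rows with the assumed `max_dist = 5`, the previous_owner column comes out as 6077, 5587 and NA. Each step returns early, so there is no nested if-else to untangle, and swapping the two arguments of `order()` would give distance priority over recency instead.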
I find the nestedness of if-else statements particularly difficult to decipher, so please be liberal with annotations for your code suggestions.
Source: https://stackoverflow.com/questions/60625882/multiple-conditions-for-if-else-statement-in-r