How to renumber result of intersection/group_indices in R?

Question

For a few days I have been struggling to renumber the results of intersect()/group_indices() in R. A sample data frame is shown below:

t <- data.frame(mid=c(102,102,102,102,102,102,102,103,103,103,103,103,103,103),
                    aid=c(10201,10202,10203,10204,10205,10206,10207,
                          10301,10302,10303,10304,10305,10306,10307),
                    dummy=c(0,1,0,1,0,1,0,0,1,0,1,0,1,0),
                    location=c(0,2,0,4,0,1,0,0,2,0,2,0,3,0)
                    )

I need to renumber the values in the "location" field sequentially within each "mid" group, without changing the row order defined by "aid". "mid" identifies an individual (person), and "aid" is that person's sequential activity log for one day. "location" is the ID of a place visited by that "mid", and it is only meaningful within that "mid". Thus location "2" in rows 9 and 11 is the same place for mid=103; however, the same number in row 2 belongs to mid=102 and does not refer to the place visited by mid=103.

Data frame "t" is listed below:

   mid   aid dummy location
1  102 10201     0        0
2  102 10202     1        2
3  102 10203     0        0
4  102 10204     1        4
5  102 10205     0        0
6  102 10206     1        1
7  102 10207     0        0
8  103 10301     0        0
9  103 10302     1        2
10 103 10303     0        0
11 103 10304     1        2
12 103 10305     0        0
13 103 10306     1        3
14 103 10307     0        0

Following this logic, the numbers in the "location" field should be updated as follows:

   mid   aid dummy location
1  102 10201     0        0
2  102 10202     1        1
3  102 10203     0        0
4  102 10204     1        2
5  102 10205     0        0
6  102 10206     1        3
7  102 10207     0        0
8  103 10301     0        0
9  103 10302     1        1
10 103 10303     0        0
11 103 10304     1        1
12 103 10305     0        0
13 103 10306     1        2
14 103 10307     0        0

The conditions are:

Location numbers of rows with "dummy=0" should be kept as 0
Location numbers should start from 1 for each "mid"
If s/he visits a location different from every place visited in the previous rows, that location gets the next number (the previous maximum plus 1); a revisited location keeps its earlier number
The operation should be implemented as a piped process in the tidyverse
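The first three conditions can be checked with a literal, row-by-row sketch in base R. `renumber_locations()` is a hypothetical helper, not part of the question or any package, and the loop is chosen for readability rather than speed: it walks one person's rows in aid order and hands out the next number only when a place has not been seen before.

```r
# Sample data from the question
t <- data.frame(mid      = c(rep(102, 7), rep(103, 7)),
                aid      = c(10201:10207, 10301:10307),
                dummy    = c(0,1,0,1,0,1,0, 0,1,0,1,0,1,0),
                location = c(0,2,0,4,0,1,0, 0,2,0,2,0,3,0))

# Hypothetical helper: renumber one person's locations by order of first visit
renumber_locations <- function(location, dummy) {
  out  <- numeric(length(location))  # condition 1: dummy == 0 rows stay 0
  seen <- numeric(0)                 # original location codes, in first-visit order
  for (i in seq_along(location)) {
    if (dummy[i] == 1) {
      # condition 3: a place not seen before gets the next number
      if (!(location[i] %in% seen)) seen <- c(seen, location[i])
      out[i] <- match(location[i], seen)  # a revisit reuses the earlier number
    }
  }
  out
}

# Condition 2: apply the helper separately to each mid, rows sorted by aid
t <- t[order(t$mid, t$aid), ]
t$location_new <- 0
for (rows in split(seq_len(nrow(t)), t$mid)) {
  t$location_new[rows] <- renumber_locations(t$location[rows], t$dummy[rows])
}
print(t)
```

Inside a tidyverse pipe, the same helper could be called per group with `t %>% group_by(mid) %>% mutate(location = renumber_locations(location, dummy))`, which also satisfies condition 4.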

The initial data frame is produced by a tidyverse pipeline using group_indices() or base::intersect(); however, those functions sometimes return numbers that do not follow the order in which the locations first appear.
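A small base R illustration of the ordering problem, assuming (as I understand dplyr's behaviour) that group_indices() numbers groups in the sorted order of the grouping values, much like the sorted levels of factor(). Numbering by sorted value and numbering by first appearance give different IDs for the places mid 102 visited:

```r
# Locations visited by mid 102, in aid order (dummy == 1 rows only)
visited <- c(2, 4, 1)

# Numbering by sorted value: factor() sorts its levels, so location 1 gets id 1
by_sorted_value <- as.integer(factor(visited))
print(by_sorted_value)  # 2 3 1

# Numbering by first appearance: unique() keeps the original order
by_first_visit <- match(visited, unique(visited))
print(by_first_visit)   # 1 2 3
```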

Are there any solutions for this issue?

I found one solution in this link using {data.table}, but I prefer to use the tidyverse to keep the pipe operations. There are many examples of assigning identical numbers to identical values in R, but I could not find any solution that renumbers those IDs sequentially without changing their order.

Answer 1:

It seems the OP wants to look up values in the location column to uniquely identify each location within a group (mid). If so, extending the solution suggested by @Frank, one solution could be:

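library(dplyr)

t %>% group_by(mid) %>%
  # Number each location by its first visit (dummy == 1) within the mid group;
  # nomatch = 0 returns 0 for values missing from that lookup, such as location 0
  mutate(locationDesired = match(location, unique(location[dummy==1]), nomatch=0)) %>%
  as.data.frame()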

#    mid   aid dummy location locationDesired
# 1  102 10201     0        0               0
# 2  102 10202     1        2               1
# 3  102 10203     0        0               0
# 4  102 10204     1        4               2
# 5  102 10205     0        0               0
# 6  102 10206     1        1               3
# 7  102 10207     0        0               0
# 8  103 10301     0        0               0
# 9  103 10302     1        2               1
# 10 103 10303     0        0               0
# 11 103 10304     1        2               1
# 12 103 10305     0        0               0
# 13 103 10306     1        3               2
# 14 103 10307     0        0               0
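To see what the lookup in this answer does, here is the same match()/unique() expression evaluated by hand on the mid 103 rows in base R (no dplyr needed):

```r
# Rows for mid 103, in aid order
location <- c(0, 2, 0, 2, 0, 3, 0)
dummy    <- c(0, 1, 0, 1, 0, 1, 0)

# Lookup table: visited locations in order of first visit
lookup <- unique(location[dummy == 1])
print(lookup)  # 2 3

# Position of each location in the lookup table; 0 where it is absent
result <- match(location, lookup, nomatch = 0)
print(result)  # 0 1 0 1 0 2 0
```

Rows with dummy == 0 come out as 0 only because their location is 0, which never appears in the lookup table. If such a row could carry a visited location code, condition 1 would need an explicit guard, for example `if_else(dummy == 1, match(location, unique(location[dummy == 1])), 0L)` inside the same `mutate()`.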

Source: https://stackoverflow.com/questions/50160387/how-to-renumber-result-of-intersection-group-indices-in-r

Tags

dplyr

tidyr

tidyverse