Ranges surrounding values in data frame R dplyr

天涯浪子 submitted on 2020-01-15 07:16:12

Question


I have a data frame that looks something like this:

test <- data.frame(chunk = c(rep("a",27),rep("b",27)), x = c(1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1))

I would like to group the data by one of its columns using group_by() in dplyr; in this example that column is called chunk.

Within each chunk of test, I want to add another column called x1, so that the resulting data frame looks like this:

test1 <- data.frame(test, x1 = c(0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0))

x1 marks every occurrence of 0 in x, plus a range of 5 rows on either side of the run of 0s (the 5 rows before the first 0 and the 5 rows after the last 0). The identifier values themselves don't matter; in this example x1 is 2 for the rows where x is 0, 1 for the surrounding range, and 0 everywhere else.

Thanks for any and all help!


Answer 1:


Here's an option to do it in dplyr:

Shorter version:

library(dplyr)

n <- 1:5  # offsets to flag on each side of the run of 0s
test %>%
  group_by(chunk) %>%
  mutate(x1 = ifelse((row_number() - min(which(x == 0))) %in% -n |
                     (row_number() - max(which(x == 0))) %in% n,
                     1, ifelse(x == 0, 2, 0)))

Longer (first) version:

test %>%
  group_by(chunk) %>%
  mutate(start = (row_number() - min(which(x == 0))) %in% -5:-1,
         end = (row_number() - max(which(x == 0))) %in% 1:5,
         x1 = ifelse(start | end, 1, ifelse(x == 0, 2, 0))) %>%
  select(-c(start, end))

Source: local data frame [54 x 3]
Groups: chunk

   chunk x x1
1      a 1  0
2      a 1  0
3      a 1  0
4      a 1  0
5      a 1  0
6      a 1  0
7      a 1  0
8      a 1  1
9      a 1  1
10     a 1  1
11     a 1  1
12     a 1  1
13     a 0  2
14     a 0  2
15     a 0  2
16     a 0  2
17     a 1  1
18     a 1  1
19     a 1  1
20     a 1  1
21     a 1  1
22     a 1  0
23     a 1  0
24     a 1  0
25     a 1  0
26     a 1  0
27     a 1  0
28     b 1  0
29     b 1  0
30     b 1  0
31     b 1  0
32     b 1  0
33     b 1  0
34     b 1  0
35     b 1  1
36     b 1  1
37     b 1  1
38     b 1  1
39     b 1  1
40     b 0  2
41     b 0  2
42     b 0  2
43     b 0  2
44     b 1  1
45     b 1  1
46     b 1  1
47     b 1  1
48     b 1  1
49     b 1  0
50     b 1  0
51     b 1  0
52     b 1  0
53     b 1  0
54     b 1  0
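
To see why the two %in% tests used above pick out the surrounding rows, it can help to compute the offsets by hand on a small vector. The following is only an illustrative sketch in base R: the toy vector x and the window width w are assumptions made for the example, not part of the question's data.

```r
# Toy vector (hypothetical): a single run of 0s at positions 4 and 5
x <- c(1, 1, 1, 0, 0, 1, 1, 1)
idx <- seq_along(x)      # plays the role of row_number() within one group
zeros <- which(x == 0)   # positions of the 0s

# Offset of every row relative to the first and to the last 0
before <- idx - min(zeros)   # negative for rows ahead of the run
after  <- idx - max(zeros)   # positive for rows behind the run

w <- 2                        # window width (the answer uses 5)
start <- before %in% -(1:w)   # the w rows just before the first 0
end   <- after %in% 1:w       # the w rows just after the last 0

# 1 for the surrounding rows, 2 for the 0s themselves, 0 elsewhere
x1 <- ifelse(start | end, 1, ifelse(x == 0, 2, 0))
print(data.frame(x, before, after, x1))
```

Inside mutate() after group_by(chunk), the same arithmetic runs once per chunk, which is why the offsets restart in every group.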

This approach assumes that each group of "chunk" contains only one sequence of 0s (as in the sample data). Let me know if that's not the case in your actual data.
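
If a chunk can contain several separate runs of 0s, one possible generalization is to flag every row that lies within 5 rows of any 0. The sketch below is not from the original answer: dist_to_zero is a hypothetical helper written for this example, width is an assumed name for the window size, and the sample data is rebuilt compactly with rep() so the block runs on its own.

```r
library(dplyr)

# Sample data from the question, written compactly with rep()
test <- data.frame(
  chunk = rep(c("a", "b"), each = 27),
  x = rep(rep(c(1, 0, 1), times = c(12, 4, 11)), times = 2)
)

# Hypothetical helper (not part of dplyr): distance from each position
# to the nearest 0 in x; Inf everywhere if the vector contains no 0
dist_to_zero <- function(x) {
  zeros <- which(x == 0)
  if (length(zeros) == 0) return(rep(Inf, length(x)))
  vapply(seq_along(x), function(i) min(abs(i - zeros)), numeric(1))
}

width <- 5  # rows to flag on each side of a run of 0s

result <- test %>%
  group_by(chunk) %>%
  mutate(d = dist_to_zero(x),
         x1 = case_when(x == 0 ~ 2,       # the 0s themselves
                        d <= width ~ 1,   # rows close to some 0
                        TRUE ~ 0)) %>%    # everything else
  ungroup() %>%
  select(-d)
```

On the sample data this should give the same x1 as test1. The helper compares every row with every 0 in its group, so it is simple rather than fast; for very large groups a different approach may be preferable.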



Source: https://stackoverflow.com/questions/25415587/ranges-surrounding-values-in-data-frame-r-dplyr
