Return an average of last or first two rows from a different group (indicated by a variable)

问题

This is a follow-up to this question. With a data like below:

data <- structure(list(seq = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 
7L, 7L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L), new_seq = c(2, 2, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
2, 2, 2, 2, NA, NA, NA, NA, NA, 4, 4, 4, 4, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 6, 6, 6, 6, 6, NA, NA, 8, 8, 8, NA, NA, NA), value = c(2L, 
0L, 0L, 3L, 0L, 5L, 5L, 3L, 0L, 3L, 2L, 3L, 2L, 3L, 4L, 1L, 0L, 
0L, 0L, 1L, 1L, 0L, 2L, 5L, 3L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 3L, 
5L, 3L, 1L, 1L, 1L, 0L, 1L, 0L, 4L, 3L, 0L, 3L, 1L, 3L, 0L, 0L, 
1L, 0L, 0L, 3L, 4L, 5L, 3L, 5L, 3L, 5L, 0L, 1L, 1L, 3L, 2L, 1L, 
0L, 0L, 0L, 0L, 5L, 1L, 1L, 0L, 4L, 1L, 5L, 0L, 3L, 1L, 2L, 1L, 
0L, 3L, 0L, 1L, 1L, 3L, 0L, 1L, 1L, 2L, 2L, 1L, 0L, 4L, 0L, 0L, 
3L, 0L, 0L)), row.names = c(NA, -100L), class = c("tbl_df", "tbl", 
"data.frame"))

for every value of new_seq, which is not NA I need to calculate a mean of 2 observations from respective group in seq (value of new_seq refers to a value of seq). The issue is that:

for those rows, where new_seq refers to a value of seq which appears after (rows 1:2 in an example) it should be a mean of 2 FIRST rows from respective group,
for those rows where new_seq refers to a value of seq which appears before it should be a mean of 2 LAST rows from respective group

@Z.Lin provided excellent solution for the second case, but how it can be tweaked to handle both cases? Or maybe is there another solution with tidyverse?

回答1:

I think I got it, so I post an answer for the anybody who'll come here from search.

lookup_backwards <- data %>%
  group_by(seq) %>%
  mutate(rank = seq(n(), 1)) %>% 
  filter(rank <= 2) %>%
  summarise(backwards = mean(value)) %>%
  ungroup()

lookup_forwards <- data %>% 
  group_by(seq) %>% 
  mutate(rank = seq(1, n())) %>% 
  filter(rank <= 2) %>% 
  summarise(forwards = mean(value)) %>% 
  ungroup()

data %>% 
  left_join(lookup_backwards, by = c('new_seq' = 'seq')) %>% 
  left_join(lookup_forwards, by = c('new_seq' = 'seq')) %>% 
  replace_na(list(backwards = 0, forwards = 0)) %>% 
  mutate(new_column = ifelse(new_seq > seq, forwards, backwards))

来源：https://stackoverflow.com/questions/52382354/return-an-average-of-last-or-first-two-rows-from-a-different-group-indicated-by

标签

tidyverse