How would I be able to remove opposite values (e.g. refunds) in panel data?

Question

Given the following data:

id|datee      | price | quant | discrete_x
 1 2018-12-19      4    -3000   A
 1 2018-12-04      4     3000   A
 1 2018-12-21      4     3000   B
 1 2018-12-20      3     2000   A
...

Desired output:

id|datee      | price | quant | discrete_x
 1 2018-12-21      4     3000   B
 1 2018-12-20      3     2000   A
...

In this case, it is quite clear that the quantity (quant) of 3000 was bought, refunded, and then bought again. I would like to remove the two rows that cancel each other out. Given that id and quant match, and that the refund happens once and only after a purchase of the same quant, how can I remove all such pairs for each id value?

I've been considering (but am stuck on) two ideas so far: 1) within an arranged group_by, check the later dates in a column to see whether quant matches as an opposite value; 2) a for loop within a for loop.

I feel that a for loop within a for loop is better, but I am not sure how I would match on discrete_x.

How would you approach this? Would you use a for loop within a for loop?

Answer 1:

Hope this solution will work for your problem.

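# Compare quantities by absolute value so that a refund matches its purchase
df$quant <- abs(df$quant)
# Keep the first row of each (id, quant) combination; later duplicates are dropped
df1 <- df[!duplicated(df[c("id", "quant")]), ]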

This assumes your data frame is named df.
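
If both rows of a purchase/refund pair should be dropped, as in the desired output, a loop over the refund rows is one option. The following is only a minimal base R sketch under stated assumptions: the sample rows are copied from the question, drop_refund_pairs is a hypothetical helper name, and a refund is assumed to cancel the latest earlier purchase with the same id and the opposite quant. Matching on discrete_x as well would be one more condition inside the which() call.

```r
# Sample rows from the question
df <- data.frame(
  id = c(1, 1, 1, 1),
  datee = as.Date(c("2018-12-19", "2018-12-04", "2018-12-21", "2018-12-20")),
  price = c(4, 4, 4, 3),
  quant = c(-3000, 3000, 3000, 2000),
  discrete_x = c("A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop each refund together with the purchase it cancels
drop_refund_pairs <- function(df) {
  # Sort so that, within an id, a larger row index means a later date
  df <- df[order(df$id, df$datee), ]
  drop <- rep(FALSE, nrow(df))
  for (i in which(df$quant < 0)) {
    # Candidates: same id, opposite quantity, earlier date, not already matched
    cand <- which(df$id == df$id[i] &
                    df$quant == -df$quant[i] &
                    df$datee < df$datee[i] &
                    !drop)
    if (length(cand) > 0) {
      # Assumption: the refund cancels the most recent matching purchase
      drop[c(i, max(cand))] <- TRUE
    }
  }
  df[!drop, ]
}

result <- drop_refund_pairs(df)
```

Because matched rows are marked in drop, a purchase cannot be cancelled twice, and a refund with no earlier matching purchase is left in the data.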

Answer 2:

This is a very ugly implementation, but I think this might work. We can create a filtering column after grouping by id and arranging by date.

library(dplyr)
library(tidyr)

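df %>%
  group_by(id) %>%
  arrange(datee) %>%
  # f is TRUE when the next row's quant cancels this row's quant
  mutate(f = lead(quant) + quant == 0,
         # carry the flag to the following row so the refund is flagged too
         f = ifelse(f, f, lag(f)),
         # the last row of each group has no lead(), so its NA becomes FALSE
         f = tidyr::replace_na(f, FALSE)) %>%
  filter(!f) %>%
  select(-f)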

#> # A tibble: 2 x 6
#> # Groups:   id [1]
#>      id datee      price quant discrete_x    
#>   <dbl> <date>     <dbl> <dbl> <chr>
#> 1     1 2018-12-20     3  2000 A
#> 2     1 2018-12-21     4  3000 B
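
The filter above relies on the refund sitting directly after its purchase once each id is sorted by date. A slightly more explicit variant of the same idea is sketched below, not as a definitive implementation: the column names is_refund and is_refunded are made up for illustration, the sample rows are copied from the question, and a row only counts as a refund when its quant is negative and exactly cancels the previous row.

```r
library(dplyr)

# Sample rows from the question
df <- tibble(
  id = c(1, 1, 1, 1),
  datee = as.Date(c("2018-12-19", "2018-12-04", "2018-12-21", "2018-12-20")),
  price = c(4, 4, 4, 3),
  quant = c(-3000, 3000, 3000, 2000),
  discrete_x = c("A", "A", "B", "A")
)

result <- df %>%
  group_by(id) %>%
  arrange(datee, .by_group = TRUE) %>%
  mutate(
    # TRUE when this row is a refund that exactly cancels the previous row
    is_refund = quant < 0 & coalesce(lag(quant) == -quant, FALSE),
    # TRUE when the next row is such a refund, i.e. this purchase was refunded
    is_refunded = coalesce(lead(is_refund), FALSE)
  ) %>%
  filter(!is_refund, !is_refunded) %>%
  ungroup() %>%
  select(-is_refund, -is_refunded)
```

Using coalesce() on the lag() and lead() comparisons means the first and last rows of each id are handled without a separate NA replacement step, so a refund that happens to be the last row of its group is still removed.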

Source: https://stackoverflow.com/questions/59753716/how-would-i-be-able-to-remove-opposite-values-e-g-refunds-in-panel-data

Tags

dplyr

tidyverse