Question
Given the following data:
id | datee      | price | quant | discrete_x
1  | 2018-12-19 | 4     | -3000 | A
1  | 2018-12-04 | 4     | 3000  | A
1  | 2018-12-21 | 4     | 3000  | B
1  | 2018-12-20 | 3     | 2000  | A
...
Desired output:
id | datee      | price | quant | discrete_x
1  | 2018-12-21 | 4     | 3000  | B
1  | 2018-12-20 | 3     | 2000  | A
...
In this case, it is clear that the quantity (quant) of 3000 is refunded and then bought again. I would like to remove the two rows that cancel each other out (the purchase on 2018-12-04 and the refund on 2018-12-19). Given that id and quant match (quant with the opposite sign), and that the refund happens once and after a purchase of the same quant, how can I remove every such pair for each id value?
I've been considering (but am stuck on) two ideas so far:
1) Within each group_by group, arranged by date, check the later dates to see whether the same quant appears with the opposite sign.
2) A for loop within a for loop.
I feel that a nested for loop would be better, but I'm not sure how I would also match on discrete_x.
What approach would you take? Would you use a nested for loop?
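A minimal base-R sketch of the nested-loop idea (idea 2), assuming that each refund should cancel exactly one earlier row with the same id, the same discrete_x and the opposite quant. The data frame is rebuilt by hand from the sample rows above, and remove_refund_pairs is a hypothetical helper name, not something from either answer.

```r
# Sample rows from the question (rebuilt by hand); datee is a Date so it sorts correctly.
df <- data.frame(
  id = c(1, 1, 1, 1),
  datee = as.Date(c("2018-12-19", "2018-12-04", "2018-12-21", "2018-12-20")),
  price = c(4, 4, 4, 3),
  quant = c(-3000, 3000, 3000, 2000),
  discrete_x = c("A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop each refund (negative quant) together with the earliest
# unmatched earlier row that has the same id, the same discrete_x and quant == -refund.
remove_refund_pairs <- function(df) {
  df <- df[order(df$id, df$datee), ]   # process rows in date order within each id
  drop <- rep(FALSE, nrow(df))
  for (i in seq_len(nrow(df))) {        # outer loop: candidate refunds
    if (df$quant[i] >= 0) next
    for (j in seq_len(i - 1)) {         # inner loop: earlier rows only
      if (!drop[j] &&
          df$id[j] == df$id[i] &&
          df$discrete_x[j] == df$discrete_x[i] &&
          df$quant[j] == -df$quant[i]) {
        drop[c(i, j)] <- TRUE           # cancel the purchase/refund pair
        break                           # one refund cancels one purchase
      }
    }
  }
  df[!drop, ]
}

out <- remove_refund_pairs(df)
out
```

The outer loop only considers negative quant values and the inner loop only scans earlier rows, so a refund is never matched to a later purchase. On the sample data this drops the 2018-12-04 and 2018-12-19 rows and keeps the 2018-12-20 and 2018-12-21 rows.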
Answer 1:
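I hope this solution works for your problem.
df$quant <- abs(df$quant)  # compare quantities regardless of sign
# keep only the first row for each (id, quant) combination
df1 <- df[!duplicated(df[c("id", "quant")]), ]
This assumes your data frame is named df.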
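To see which rows this approach keeps, here it is applied to the four sample rows from the question: take the absolute value of quant, then drop rows whose (id, quant) combination has already appeared. The data frame is rebuilt by hand from the question and is an assumption, not part of the answer.

```r
# Sample rows from the question (rebuilt by hand).
df <- data.frame(
  id = c(1, 1, 1, 1),
  datee = as.Date(c("2018-12-19", "2018-12-04", "2018-12-21", "2018-12-20")),
  price = c(4, 4, 4, 3),
  quant = c(-3000, 3000, 3000, 2000),
  discrete_x = c("A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

df$quant <- abs(df$quant)                        # ignore the sign of refunds
df1 <- df[!duplicated(df[c("id", "quant")]), ]   # first row per (id, quant)
df1
```

duplicated() keeps the first occurrence in the original row order, so this returns the 2018-12-19 row (with its quant now 3000) and the 2018-12-20 row, rather than removing both halves of the purchase/refund pair.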
Answer 2:
This is a very ugly implementation, but I think it might work. We can create a filtering column after grouping by id and arranging by date.
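library(dplyr)
library(tidyr)
df %>%
  group_by(id) %>%
  arrange(datee) %>%
  # f is TRUE when a row's quant and the next row's quant sum to zero;
  # the row right after such a match is flagged too, and NAs become FALSE
  mutate(f = lead(quant) + quant == 0,
         f = ifelse(f, f, lag(f)),
         f = tidyr::replace_na(f, FALSE)) %>%
  filter(!f) %>%   # drop the flagged pair
  select(-f)       # remove the helper column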
#> # A tibble: 2 x 6
#> # Groups: id [1]
#> id datee price quant discrete_x
#> <dbl> <date> <dbl> <dbl> <chr>
#> 1 1 2018-12-20 3 2000 A
#> 2 1 2018-12-21 4 3000 B
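For comparison, the same flagging logic can be sketched in base R without dplyr or tidyr. lead1, lag1 and flag_pairs are hypothetical helpers standing in for dplyr::lead(), dplyr::lag() and the mutate() step, and the sample data is rebuilt by hand from the question. Like the dplyr version, it only pairs a row with the row immediately after it in date order, and it does not look at discrete_x.

```r
# Sample rows from the question (rebuilt by hand).
df <- data.frame(
  id = c(1, 1, 1, 1),
  datee = as.Date(c("2018-12-19", "2018-12-04", "2018-12-21", "2018-12-20")),
  price = c(4, 4, 4, 3),
  quant = c(-3000, 3000, 3000, 2000),
  discrete_x = c("A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-ins for dplyr::lead() / dplyr::lag() (shift by one, pad with NA).
lead1 <- function(x) c(x[-1], NA)
lag1  <- function(x) c(NA, x[-length(x)])

# Same three steps as the mutate() call, applied to one id's quant vector.
flag_pairs <- function(q) {
  f <- lead1(q) + q == 0      # this row and the next one cancel out
  f <- ifelse(f, f, lag1(f))  # also flag the row right after a match
  f[is.na(f)] <- FALSE        # unmatched last rows are kept
  f
}

df <- df[order(df$id, df$datee), ]                        # like group_by(id) + arrange(datee)
drop <- as.logical(ave(df$quant, df$id, FUN = flag_pairs)) # ave() returns 0/1, so convert back
out <- df[!drop, ]
out
```

ave() writes each group's result back into a copy of the numeric quant column, which is why the flags come back as 0/1 and are converted with as.logical().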
Source: https://stackoverflow.com/questions/59753716/how-would-i-be-able-to-remove-opposite-values-e-g-refunds-in-panel-data