Find first/last observation value by group?

I am trying to find the first/last observation by group. I tried both R and Excel (R was so slow that I switched to Excel). The Excel

Update

Based on the updated expected output, where the first observation of each group is replaced with "0" and the others stay the same, either ifelse() or replace() can be used. Taking the lead() of 'tagging' then gives 'tagChoice2'.

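library(dplyr)

df1 %>%
   group_by(Shopper) %>% 
   # the first row of each Shopper group becomes "0"; later rows keep Choice
   mutate(tagging = ifelse(row_number() == 1, "0", as.character(Choice)), 
          # tagging of the next row in the same group; the last row gets "0"
          tagChoice2 = lead(tagging, default = "0"))   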
#   Day Shopper Choice tagging tagChoice2
#  <int>   <chr>  <chr>   <chr>      <chr>
#1     1       A  apple       0      apple
#2     2       A  apple   apple          0
#3     1       B Banana       0          0
#4     1       C  apple       0     Banana
#5     2       C Banana  Banana      apple
#6     3       C  apple   apple          0
#7     1       D  berry       0      berry
#8     2       D  berry   berry          0
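The answer mentions replace() as an alternative to ifelse(). A minimal sketch of that variant, assuming sample data df1 rebuilt by hand from the printed output above (Day, Shopper, and Choice as character); the data frame literal is an assumption, not the asker's original data, and the name res is illustrative:

```r
library(dplyr)

# Sample data rebuilt from the printed output (assumption, not the asker's data)
df1 <- data.frame(
  Day     = c(1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  Shopper = c("A", "A", "B", "C", "C", "C", "D", "D"),
  Choice  = c("apple", "apple", "Banana", "apple", "Banana", "apple", "berry", "berry"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(Shopper) %>%
  # replace() overwrites position 1 of each group's Choice vector with "0"
  mutate(tagging    = replace(Choice, 1, "0"),
         # next row's tagging within the group; the last row gets "0"
         tagChoice2 = lead(tagging, default = "0")) %>%
  ungroup()

res
```

Because mutate() evaluates replace(Choice, 1, "0") once per Shopper group, position 1 means the first row of each group, not of the whole table.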

野趣味

2020-12-20 09:42

You can try installing Microsoft R Open as your default R. For math-heavy work (matrix and linear-algebra operations) it can be much faster than base R, because it ships with a multithreaded math library (Intel MKL) that can use several cores, while base R's default BLAS uses only one core.

梦如初夏

2020-12-20 09:44

I was looking for an answer on how to find the first and last value of a column by group in data.table. After looking here and there and thinking about it, here you go.

To number the rows in order within each group:

library(data.table)

DT <- data.table(col1 = rep(LETTERS[1:2], each = 4), col2 = c(3,12,5,56,6,678,233,70))
setorder(DT, col1, col2)
DT
   col1 col2
1:    A    3
2:    A    5
3:    A   12
4:    A   56
5:    B    6
6:    B   70
7:    B  233
8:    B  678

# rows are already sorted by col1 and col2, so the position within the group is the rank
DT[, rank := seq_len(.N), by = col1]
DT
   col1 col2 rank
1:    A    3    1
2:    A    5    2
3:    A   12    3
4:    A   56    4
5:    B    6    1
6:    B   70    2
7:    B  233    3
8:    B  678    4
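data.table also ships helpers for within-group numbering. A minimal sketch on the same DT: rowid() counts each col1 value's occurrences in the current row order (so it relies on the setorder() call), while frank() ranks col2 by value and does not depend on row order. The column name rank2 is only for illustration:

```r
library(data.table)

DT <- data.table(col1 = rep(LETTERS[1:2], each = 4),
                 col2 = c(3, 12, 5, 56, 6, 678, 233, 70))
setorder(DT, col1, col2)

# rowid(col1): 1, 2, ... for each repeat of a col1 value, in current row order
DT[, rank := rowid(col1)]

# frank(): rank by value within each group; ties broken by position
DT[, rank2 := frank(col2, ties.method = "first"), by = col1]
DT
```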

To create first and last values by group:

DT[, first_val := col2[1], by = col1]
DT[, last_val := col2[.N], by = col1]
DT
   col1 col2 rank first_val last_val
1:    A    3    1         3       56
2:    A    5    2         3       56
3:    A   12    3         3       56
4:    A   56    4         3       56
5:    B    6    1         6      678
6:    B   70    2         6      678
7:    B  233    3         6      678
8:    B  678    4         6      678
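If you only need one row per group rather than the values repeated on every row, a grouped summary is shorter. A sketch assuming the same DT, using data.table's own first() and last() helpers; the name firstlast is illustrative:

```r
library(data.table)

DT <- data.table(col1 = rep(LETTERS[1:2], each = 4),
                 col2 = c(3, 12, 5, 56, 6, 678, 233, 70))
setorder(DT, col1, col2)

# One row per group with the first and last col2 value
firstlast <- DT[, .(first_val = first(col2), last_val = last(col2)), by = col1]
firstlast

# Alternatively keep the whole first and last row of each group;
# unique() avoids returning a single-row group twice
DT[, .SD[unique(c(1L, .N))], by = col1]
```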

无人共我

2020-12-20 09:46

First, assuming the data are sorted by Shopper and then by Day in ascending order, you can add a column holding the purchase number:

df$Purchase <- unlist(with(df, tapply(Shopper, Shopper, seq_along)))
df
#  Day Shopper Choice Purchase
#1   1       A  apple        1
#2   2       A  apple        2
#3   1       B Banana        1
#4   1       C  apple        1
#5   2       C Banana        2
#6   3       C  apple        3
#7   1       D  berry        1
#8   2       D  berry        2
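The tapply() approach depends on the rows being sorted. A sketch using base R's ave() instead, which writes each group's result back into the original row positions; ranking Day inside each group also removes the need to sort by Day. The df literal is rebuilt from the printed output and is an assumption:

```r
# Sample data rebuilt from the printed output (assumption)
df <- data.frame(
  Day     = c(1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  Shopper = c("A", "A", "B", "C", "C", "C", "D", "D"),
  Choice  = c("apple", "apple", "Banana", "apple", "Banana", "apple", "berry", "berry")
)

# ave() splits Day by Shopper, ranks each shopper's days (1 = earliest),
# and puts the ranks back in the original row positions
df$Purchase <- ave(df$Day, df$Shopper,
                   FUN = function(d) rank(d, ties.method = "first"))
df
```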

Then reshape the data frame to "wide" format:

df.w <- reshape(df[c('Shopper', 'Choice', 'Purchase')],
                idvar='Shopper', v.names='Choice', timevar='Purchase',
                direction='wide')
df.w
#  Shopper Choice.1 Choice.2 Choice.3
#1       A    apple    apple     <NA>
#3       B   Banana     <NA>     <NA>
#4       C    apple   Banana    apple
#7       D    berry    berry     <NA>
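Once the data are wide, the first and last observation per shopper (the question in the title) can be read off directly. A minimal sketch, assuming df.w as printed above with character columns; the names choices, first_choice, and last_choice are hypothetical:

```r
# Wide table as printed above, rebuilt by hand (assumption)
df.w <- data.frame(
  Shopper  = c("A", "B", "C", "D"),
  Choice.1 = c("apple", "Banana", "apple", "berry"),
  Choice.2 = c("apple", NA, "Banana", "berry"),
  Choice.3 = c(NA, NA, "apple", NA)
)

choices <- as.matrix(df.w[-1])   # purchase columns only, one row per shopper

# The first purchase is column 1; the last is the last non-NA entry per row
first_choice <- choices[, 1]
last_choice  <- apply(choices, 1, function(r) tail(r[!is.na(r)], 1))

data.frame(Shopper = df.w$Shopper, first_choice, last_choice)
```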

Finally, calculate the repurchase matrix of the first two purchases:

with(df.w, prop.table(table(First=Choice.1, Second=Choice.2)))
#        Second
#First        apple    Banana     berry
#  apple  0.3333333 0.3333333 0.0000000
#  Banana 0.0000000 0.0000000 0.0000000
#  berry  0.0000000 0.0000000 0.3333333

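To calculate the repurchase matrix over all purchases, start with the repurchase matrices of every two consecutive purchases:

# use the same levels for every column so all tables are 3 x 3
# (needed when Choice is character, the data.frame default since R 4.0.0)
lvl <- sort(unique(as.character(df$Choice)))
repurchase <- lapply(seq(2, ncol(df.w) - 1),
                     function(i) table(First=factor(df.w[[i]], levels=lvl),
                                       Second=factor(df.w[[i + 1]], levels=lvl)))
repurchase <- simplify2array(repurchase)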
repurchase
#, , 1
#
#        Second
#First    apple Banana berry
#  apple      1      1     0
#  Banana     0      0     0
#  berry      0      0     1
#
#, , 2
#
#        Second
#First    apple Banana berry
#  apple      0      0     0
#  Banana     1      0     0
#  berry      0      0     0

then add all matrices to get the "total" repurchase matrix

apply(repurchase, 1:2, sum)
#        Second
#First    apple Banana berry
#  apple      1      1     0
#  Banana     1      0     0
#  berry      0      0     1
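The consecutive-pair counts can also be built straight from the long data, without the wide reshape. A sketch assuming df is sorted by Shopper and then Day (as stated above); fixing the factor levels once keeps the table square. The names pairs and total are illustrative:

```r
# Long data as printed above (assumption: sorted by Shopper, then Day)
df <- data.frame(
  Shopper = c("A", "A", "B", "C", "C", "C", "D", "D"),
  Choice  = c("apple", "apple", "Banana", "apple", "Banana", "apple", "berry", "berry")
)

lvl <- sort(unique(as.character(df$Choice)))

# For each shopper, pair purchase k with purchase k + 1
# (a shopper with one purchase contributes zero rows)
pairs <- do.call(rbind, lapply(split(df$Choice, df$Shopper), function(x)
  data.frame(First = head(x, -1), Second = tail(x, -1))))

total <- table(First  = factor(pairs$First,  levels = lvl),
               Second = factor(pairs$Second, levels = lvl))
total              # absolute frequencies
prop.table(total)  # relative frequencies
```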

(absolute frequencies)

prop.table(apply(repurchase, 1:2, sum))
#        Second
#First    apple Banana berry
#  apple   0.25   0.25  0.00
#  Banana  0.25   0.00  0.00
#  berry   0.00   0.00  0.25

(relative frequencies)