Count the number previous items in by group in R [duplicate]

…衆ロ難τιáo~ 提交于 2019-11-28 10:57:12

问题


This question already has an answer here:

  • Adding a counter column for a set of similar rows in R [duplicate] 1 answer

I would like to create a new variable which counts the number of previous items in a by group. Here is what I mean, taking the esoph dataset as an example.

first I sort the dataset by my by group esoph$agegp, esoph$alcgp and an additional value column -esoph$ncontrols.

This gives me the following dataset

x<-esoph[order(esoph$agegp, esoph$alcgp, -esoph$ncontrols ), ]
x

   agegp     alcgp    tobgp ncases ncontrols
1  25-34 0-39g/day 0-9g/day      0        40
2  25-34 0-39g/day    10-19      0        10
3  25-34 0-39g/day    20-29      0         6
4  25-34 0-39g/day      30+      0         5
5  25-34     40-79 0-9g/day      0        27
6  25-34     40-79    10-19      0         7
8  25-34     40-79      30+      0         7
7  25-34     40-79    20-29      0         4
9  25-34    80-119 0-9g/day      0         2
11 25-34    80-119      30+      0         2
...

Now, I would like to create a new variable with some sort of index, increasing by one on every row. Whenever the next by group starts, the index goes back to 1.

The resulting table would be the following (with the additional index column):

   agegp     alcgp    tobgp ncases ncontrols index
1  25-34 0-39g/day 0-9g/day      0        40     1
2  25-34 0-39g/day    10-19      0        10     2
3  25-34 0-39g/day    20-29      0         6     3
4  25-34 0-39g/day      30+      0         5     4
5  25-34     40-79 0-9g/day      0        27     1
6  25-34     40-79    10-19      0         7     2
8  25-34     40-79      30+      0         7     3
7  25-34     40-79    20-29      0         4     4
9  25-34    80-119 0-9g/day      0         2     1
11 25-34    80-119      30+      0         2     2
...

How do I calculate this column?

Thanks!


回答1:


This can be approached using either specialized packages such as dplyr which has row_number(). We need to group by the variable ('alcgp') and create a new column using mutate.

library(dplyr)
df1 %>%
   group_by( alcgp) %>%
   mutate(indx= row_number())

Or using ave from base R. We group by 'alcgp' and in the FUN we can specify seq_along. I used seq_along(alcgp) as it may not work if the variable is factor class.

 df1$indx <- with(df1, ave(seq_along(alcgp), alcgp, FUN=seq_along))

Another convenient function in splitstackshape i.e. getanID

 library(splitstackshape)
 getanID(df1, 'alcgp')



回答2:


Here's a data.table solution:

R> (data.table(Data)[,index := 1:.N, by = "agegp,alcgp"])
    agegp     alcgp    tobgp ncases ncontrols index
 1: 25-34 0-39g/day 0-9g/day      0        40     1
 2: 25-34 0-39g/day    10-19      0        10     2
 3: 25-34 0-39g/day    20-29      0         6     3
 4: 25-34 0-39g/day      30+      0         5     4
 5: 25-34     40-79 0-9g/day      0        27     1
 6: 25-34     40-79    10-19      0         7     2
 7: 25-34     40-79      30+      0         7     3
 8: 25-34     40-79    20-29      0         4     4
 9: 25-34    80-119 0-9g/day      0         2     1
10: 25-34    80-119      30+      0         2     2

library(data.table)
##
Data <- read.table(
  text = "   agegp     alcgp    tobgp ncases ncontrols
1  25-34 0-39g/day 0-9g/day      0        40
2  25-34 0-39g/day    10-19      0        10
3  25-34 0-39g/day    20-29      0         6
4  25-34 0-39g/day      30+      0         5
5  25-34     40-79 0-9g/day      0        27
6  25-34     40-79    10-19      0         7
8  25-34     40-79      30+      0         7
7  25-34     40-79    20-29      0         4
9  25-34    80-119 0-9g/day      0         2
11 25-34    80-119      30+      0         2",
  header = TRUE,
  stringsAsFactors = FALSE
)


来源:https://stackoverflow.com/questions/32118727/count-the-number-previous-items-in-by-group-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!