This question already has an answer here:
I would like to create a new variable which counts the number of previous items in a by group. Here is what I mean, taking the esoph
dataset as an example.
first I sort the dataset by my by group esoph$agegp, esoph$alcgp
and an additional value column -esoph$ncontrols
.
This gives me the following dataset
x<-esoph[order(esoph$agegp, esoph$alcgp, -esoph$ncontrols ), ]
x
agegp alcgp tobgp ncases ncontrols
1 25-34 0-39g/day 0-9g/day 0 40
2 25-34 0-39g/day 10-19 0 10
3 25-34 0-39g/day 20-29 0 6
4 25-34 0-39g/day 30+ 0 5
5 25-34 40-79 0-9g/day 0 27
6 25-34 40-79 10-19 0 7
8 25-34 40-79 30+ 0 7
7 25-34 40-79 20-29 0 4
9 25-34 80-119 0-9g/day 0 2
11 25-34 80-119 30+ 0 2
...
Now, I would like to create a new variable with some sort of index, increasing by one on every row. Whenever the next by group starts, the index goes back to 1.
The resulting table would be the following (with the additional index column):
agegp alcgp tobgp ncases ncontrols index
1 25-34 0-39g/day 0-9g/day 0 40 1
2 25-34 0-39g/day 10-19 0 10 2
3 25-34 0-39g/day 20-29 0 6 3
4 25-34 0-39g/day 30+ 0 5 4
5 25-34 40-79 0-9g/day 0 27 1
6 25-34 40-79 10-19 0 7 2
8 25-34 40-79 30+ 0 7 3
7 25-34 40-79 20-29 0 4 4
9 25-34 80-119 0-9g/day 0 2 1
11 25-34 80-119 30+ 0 2 2
...
How do I calculate this column?
Thanks!
This can be approached using either specialized packages such as dplyr
which has row_number()
. We need to group by the variable ('alcgp') and create a new column using mutate
.
library(dplyr)
df1 %>%
group_by( alcgp) %>%
mutate(indx= row_number())
Or using ave
from base R
. We group by 'alcgp' and in the FUN
we can specify seq_along
. I used seq_along(alcgp)
as it may not work if the variable is factor
class.
df1$indx <- with(df1, ave(seq_along(alcgp), alcgp, FUN=seq_along))
Another convenient function in splitstackshape
i.e. getanID
library(splitstackshape)
getanID(df1, 'alcgp')
Here's a data.table
solution:
R> (data.table(Data)[,index := 1:.N, by = "agegp,alcgp"])
agegp alcgp tobgp ncases ncontrols index
1: 25-34 0-39g/day 0-9g/day 0 40 1
2: 25-34 0-39g/day 10-19 0 10 2
3: 25-34 0-39g/day 20-29 0 6 3
4: 25-34 0-39g/day 30+ 0 5 4
5: 25-34 40-79 0-9g/day 0 27 1
6: 25-34 40-79 10-19 0 7 2
7: 25-34 40-79 30+ 0 7 3
8: 25-34 40-79 20-29 0 4 4
9: 25-34 80-119 0-9g/day 0 2 1
10: 25-34 80-119 30+ 0 2 2
library(data.table)
##
Data <- read.table(
text = " agegp alcgp tobgp ncases ncontrols
1 25-34 0-39g/day 0-9g/day 0 40
2 25-34 0-39g/day 10-19 0 10
3 25-34 0-39g/day 20-29 0 6
4 25-34 0-39g/day 30+ 0 5
5 25-34 40-79 0-9g/day 0 27
6 25-34 40-79 10-19 0 7
8 25-34 40-79 30+ 0 7
7 25-34 40-79 20-29 0 4
9 25-34 80-119 0-9g/day 0 2
11 25-34 80-119 30+ 0 2",
header = TRUE,
stringsAsFactors = FALSE
)
来源:https://stackoverflow.com/questions/32118727/count-the-number-previous-items-in-by-group-in-r