I would like to increment a count that restarts from 1 when a condition in an existing column is met.
For example, I have the following data frame:
df
Using base R:
df$x3 <- with(df, ave(x1, cumsum(x2 == 'start'), FUN = seq_along))
gives:
> df
   x1    x2 x3
1  10 start  1
2 100     a  2
3 200     b  3
4 300     c  4
5  87 start  1
6  90     k  2
7  45     l  3
8  80     o  4
Or with the dplyr or data.table packages:
library(dplyr)
df %>%
  group_by(grp = cumsum(x2 == 'start')) %>%
  mutate(x3 = row_number())
library(data.table)
# option 1
setDT(df)[, x3 := rowid(cumsum(x2 == 'start'))][]
# option 2
setDT(df)[, x3 := 1:.N, by = cumsum(x2 == 'start')][]
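All of the approaches above group the rows by the same key, cumsum(x2 == 'start'). As a minimal sketch of what that key looks like, here it is computed step by step in base R. The data frame is an assumption rebuilt from the printed output, and the names is_start and grp are only illustrative.

```r
# Assumption: df is reconstructed from the printed output above
df <- data.frame(
  x1 = c(10, 100, 200, 300, 87, 90, 45, 80),
  x2 = c("start", "a", "b", "c", "start", "k", "l", "o"),
  stringsAsFactors = FALSE
)

is_start <- df$x2 == "start"   # TRUE on each row that begins a new run
grp <- cumsum(is_start)        # running count of starts = group id per row
grp
# [1] 1 1 1 1 2 2 2 2

# Counting rows within each group gives the restarting counter
x3 <- ave(seq_along(grp), grp, FUN = seq_along)
x3
# [1] 1 2 3 4 1 2 3 4
```

Because the group id only increases when a new 'start' row appears, every row between two starts shares the same id, and numbering the rows within each id restarts the count at 1.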
Here is another base R method:
df$x3 <- sequence(diff(c(which(df$x2 == "start"), nrow(df)+1)))
which returns
df
   x1    x2 x3
1  10 start  1
2 100     a  2
3 200     b  3
4 300     c  4
5  87 start  1
6  90     k  2
7  45     l  3
8  80     o  4
sequence takes an integer vector and, for each entry n, returns the counts from 1 to n, concatenated into a single vector. Here it is fed the length of each run, which diff computes as the difference between the positions of successive 'start' rows. Because the last run has no following 'start' row to mark its end, we have to append the position just past the final row of the data.frame, nrow(df)+1.
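To make the intermediate values concrete, here is the same one-liner broken into steps. This is only a sketch: the x2 vector is an assumption taken from the printed output, and the names starts, bounds and run_lengths are illustrative.

```r
# Assumption: x2 is reconstructed from the printed output above
x2 <- c("start", "a", "b", "c", "start", "k", "l", "o")

starts <- which(x2 == "start")        # positions where a run begins
starts
# [1] 1 5

bounds <- c(starts, length(x2) + 1)   # add the position just past the last row
bounds
# [1] 1 5 9

run_lengths <- diff(bounds)           # number of rows in each run
run_lengths
# [1] 4 4

x3 <- sequence(run_lengths)           # 1..4 followed by 1..4
x3
# [1] 1 2 3 4 1 2 3 4
```

Note that this approach relies on the first row being a 'start' row. If it were not, the run lengths would not cover the leading rows and the result would be shorter than the data frame.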