I have a data frame to which I want to add an index (1...n) that restarts for each level of a factor. Here is an example with some dummy data.
factor
a
a
You could use the ave function:
your_data <- data.frame(
factor=factor(rep(letters[1:3], times = c(5,5,4)))
)
your_data$index <- ave(rep(NA, nrow(your_data)), your_data$factor, FUN=seq_along)
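As a quick check that ave() returns the counters in the original row order even when the rows are not grouped, here is a minimal sketch. The vector f is made up for illustration and is not from the question.

```r
# Hypothetical unsorted factor (an assumption, not the asker's data)
f <- factor(c("b", "a", "b", "c", "a", "b"))

# ave() splits by f, applies seq_along within each group,
# and puts the results back in the original row positions
idx <- ave(seq_along(f), f, FUN = seq_along)

data.frame(factor = f, index = idx)
```

Each row gets the running count of its own level, so the third "b" ends up with index 3 even though other levels sit between the "b" rows.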
In base R, using sequence and table:
df$index <- sequence(table(df$factor))
# factor index
# 1 a 1
# 2 a 2
# 3 a 3
# 4 a 4
# 5 a 5
# 6 b 1
# 7 b 2
# 8 b 3
# 9 b 4
# 10 b 5
# 11 c 1
# 12 c 2
# 13 c 3
# 14 c 4
Data
df <- data.frame(factor=factor(rep(letters[1:3], times = c(5,5,4))))
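One thing worth checking before using this on other data: sequence(table(...)) produces the counters in level order, so they line up with the rows only when the data are already grouped by factor, as they are in the example. The sketch below uses a made-up unsorted vector f, and the order() step is one possible workaround rather than part of the answer above.

```r
# Hypothetical unsorted factor (an assumption for illustration)
f <- factor(c("b", "a", "b", "c", "a", "b"))

# Counters in level order (a, a, b, b, b, c), not in row order
by_level <- as.integer(sequence(table(f)))

# Possible workaround: write the counters back to the positions
# that a sort by f would visit
o <- order(f)
by_row <- integer(length(f))
by_row[o] <- by_level

data.frame(factor = f, by_level, by_row)
```

Here by_row agrees with what ave() would return, while by_level does not match the rows.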
One way is:
unlist(lapply(split(x, x), seq_along))
where x is your factor as a vector.
R> x <- factor(rep(letters[1:3], times = c(5,5,4))) ## your data
R> data.frame(factor = x, index = unlist(lapply(split(x, x), seq_along),
+ use.names = FALSE))
factor index
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 b 1
7 b 2
8 b 3
9 b 4
10 b 5
11 c 1
12 c 2
13 c 3
14 c 4
Another way, on a similar theme, is to use table() and seq_len():
unlist(sapply(table(x), seq_len), use.names = FALSE)
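A small caveat with sapply() here: when every group has the same size, sapply() simplifies its result to a matrix instead of a list. A minimal sketch, assuming a balanced factor x_bal (a made-up name), shows that lapply() sidesteps the simplification:

```r
# Hypothetical balanced factor: every level occurs 5 times
x_bal <- factor(rep(letters[1:3], each = 5))

# sapply() simplifies equal-length results into a 5 x 3 matrix
simplified <- sapply(table(x_bal), seq_len)

# lapply() always returns a list, so unlist() gives a plain vector
idx <- unlist(lapply(table(x_bal), seq_len), use.names = FALSE)

is.matrix(simplified)
idx
```

With unequal group sizes, as in the question, both versions return a list before unlist(), so the difference only shows up for balanced data.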
And another way is to use the run-length encoding via rle():
R> rle(as.character(x))$lengths
[1] 5 5 4
which we can plug into the sapply() code instead of the table() call:
R> unlist(sapply(rle(as.character(x))$lengths, seq_len), use.names = FALSE)
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4
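Note that rle() counts consecutive runs, so the two approaches agree only when each level forms a single block. A minimal sketch with a made-up factor x2, in which "a" appears in two separate runs, shows the difference; ave() is used here only as a reference for the per-level index.

```r
# Hypothetical factor where level "a" is split into two runs
x2 <- factor(c("a", "a", "b", "a"))

# Lengths of consecutive equal values: 2, 1, 1
runs <- rle(as.character(x2))$lengths

# Index within each run (restarts whenever the value changes)
run_index <- unlist(lapply(runs, seq_len), use.names = FALSE)

# Index within each factor level, for comparison
level_index <- ave(seq_along(x2), x2, FUN = seq_along)

data.frame(factor = x2, run_index, level_index)
```

Which of the two you want depends on whether the index should restart per run or per level.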
Try the following function:
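facSeq <- function(x) {
  x.l <- length(x)            # number of observations
  x.f.l <- length(levels(x))  # number of factor levels
  # Column y of the sapply() result is the running count of level y;
  # the linear index then picks, for each row, the column of that row's own level.
  sapply(1:x.f.l, function(y) cumsum(as.integer(x) %in% y))[1:x.l + x.l * (as.integer(x) - 1)]
}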
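To see what the indexing in facSeq is doing, here is a step-by-step sketch on a tiny made-up factor. The names codes, m and idx are illustrative only, and the cbind() form of matrix indexing is swapped in for the linear index because it is easier to read.

```r
# Hypothetical four-element factor
x <- factor(c("b", "a", "b", "a"))
codes <- as.integer(x)   # integer level codes: 2 1 2 1
n <- length(x)

# One column per level; column j is the running count of level j
m <- sapply(seq_along(levels(x)), function(j) cumsum(codes == j))

# For row i, take the entry in column codes[i]
idx <- m[cbind(seq_len(n), codes)]

# Same result with the linear index used in facSeq
idx_linear <- m[seq_len(n) + n * (codes - 1)]

idx
```

Both forms pick one cell per row from the matrix of running counts, so the index restarts for each level regardless of row order.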
Testing:
> fac1 <- factor(rep(letters[1:3], each = 5))
> data.frame(fac1,index=facSeq(fac1))
fac1 index
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 b 1
7 b 2
8 b 3
9 b 4
10 b 5
11 c 1
12 c 2
13 c 3
14 c 4
15 c 5
A more interesting example, with unsorted data:
> fac2 <- factor(sample(letters[1:5], 20, replace = TRUE))
> data.frame(fac2,index=facSeq(fac2))
fac2 index
1 a 1
2 a 2
3 d 1
4 b 1
5 a 3
6 e 1
7 e 2
8 a 4
9 c 1
10 e 3
11 b 2
12 d 2
13 b 3
14 e 4
15 e 5
16 d 3
17 c 2
18 e 6
19 b 4
20 d 4