Adding a repeated index for factors in a data frame

慢半拍i 2020-12-03 20:27

I have a data frame to which I want to add an index, e.g. 1...n, that counts the rows within each level of a factor. Here is an example with some dummy data:

factor
a        
a          


        
4 Answers
  • 2020-12-03 20:52

    You could use the ave() function:

    your_data <- data.frame(
         factor=factor(rep(letters[1:3], times = c(5,5,4)))
    )
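    # ave() applies seq_along() within each factor level and returns the
    # results in the original row order
    your_data$index <- ave(rep(NA, nrow(your_data)), your_data$factor, FUN=seq_along)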
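    A small sketch, assuming hypothetical toy data that is not from the question, showing that ave() also handles rows that are not sorted by factor: each level's counter is written back to that level's own rows. Passing seq_len(nrow(df)) instead of rep(NA, ...) as the first argument is only a stylistic choice here; with FUN = seq_along only its length matters.

    ```r
    # Hypothetical toy data: the levels are interleaved, not sorted
    df <- data.frame(factor = factor(c("b", "a", "b", "c", "a", "b")))
    # ave() splits the first argument by level, applies seq_along() to each
    # piece, and writes the results back in the original row order
    df$index <- ave(seq_len(nrow(df)), df$factor, FUN = seq_along)
    print(df)
    ```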
    
  • 2020-12-03 20:54

    In base R, using sequence() and table():

    df$index <- sequence(table(df$factor))
    
    #    factor index
    # 1       a     1
    # 2       a     2
    # 3       a     3
    # 4       a     4
    # 5       a     5
    # 6       b     1
    # 7       b     2
    # 8       b     3
    # 9       b     4
    # 10      b     5
    # 11      c     1
    # 12      c     2
    # 13      c     3
    # 14      c     4
    

    Data

    df <- data.frame(factor=factor(rep(letters[1:3], times = c(5,5,4))))
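    sequence(table(...)) produces the counters level by level (all of "a", then all of "b", and so on), so it lines up with the rows only when the data frame is already sorted by the factor, as in the data above. A minimal sketch, assuming hypothetical unsorted data, of mapping the counters back to row order with order():

    ```r
    # Hypothetical unsorted data
    x <- factor(c("b", "a", "b", "c", "a", "b"))
    # Counters in level order: a -> 1 2, b -> 1 2 3, c -> 1
    by_level <- sequence(table(x))
    # order(x) lists the row positions level by level (ties keep their
    # original order), so assigning through it sends each counter to its row
    index <- integer(length(x))
    index[order(x)] <- by_level
    print(index)
    ```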
    
  • 2020-12-03 20:55

    One way is:

    unlist(lapply(split(x, x), seq_along))
    

    where x is your factor as a vector.

    R> x <- factor(rep(letters[1:3], times = c(5,5,4))) ## your data
    R> data.frame(factor = x, index = unlist(lapply(split(x, x), seq_along), 
    +             use.names = FALSE))
       factor index
    1       a     1
    2       a     2
    3       a     3
    4       a     4
    5       a     5
    6       b     1
    7       b     2
    8       b     3
    9       b     4
    10      b     5
    11      c     1
    12      c     2
    13      c     3
    14      c     4
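    Because split() groups the elements by level, the unlist() result above follows level order rather than row order; it matches the rows here only because x is sorted. A minimal sketch, assuming hypothetical unsorted data, that uses base R's unsplit() to write each level's counters back to their original positions:

    ```r
    # Hypothetical unsorted data
    x <- factor(c("b", "a", "b", "c", "a", "b"))
    # One integer vector of counters per level
    counters <- lapply(split(x, x), seq_along)
    # unsplit() reverses split(): element i is taken from the piece for x[i]
    index <- unsplit(counters, x)
    print(index)
    ```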
    

    Another way, on a similar theme, is to use table() and seq_len(). lapply() is used here rather than sapply(), since sapply() can simplify the result to a matrix (for example when every level has the same number of rows), and unlist() leaves a matrix unchanged:

    unlist(lapply(table(x), seq_len), use.names = FALSE)


    And another way is to use the run-length encoding via rle():

    R> rle(as.character(x))$lengths
    [1] 5 5 4


    which we can plug into the lapply() code instead of the table() call:

    R> unlist(lapply(rle(as.character(x))$lengths, seq_len), use.names = FALSE)
     [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4
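    One difference between these variants is worth knowing: rle() counts runs of consecutive equal values, whereas table() counts all occurrences of a level, so the rle() version restarts the counter whenever a level reappears later in the vector. A short sketch, assuming hypothetical data in which level "a" occurs in two separate runs, with ave() shown for comparison as a per-level count:

    ```r
    # Hypothetical data: "a" appears in two separate runs
    x <- factor(c("a", "a", "b", "a"))
    # Counter that restarts at every run of equal values
    run_index <- unlist(lapply(rle(as.character(x))$lengths, seq_len), use.names = FALSE)
    # Counter over all rows of the same level, in row order
    level_index <- ave(seq_along(x), x, FUN = seq_along)
    print(run_index)
    print(level_index)
    ```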
    
  • 2020-12-03 20:56

    Try the following function:

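     facSeq <- function(x) {
         x.l <- length(x)              # number of elements
         x.f.l <- length(levels(x))    # number of factor levels
         # Column y of the sapply() result is the running count of level y;
         # element i + x.l*(code_i - 1) is row i of the column for x[i]'s level
         sapply(1:x.f.l, function(y) cumsum(as.integer(x) %in% y))[1:x.l + x.l * (as.integer(x) - 1)]
     }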
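    To see how the indexing in facSeq() works, here is a minimal walk-through on a hypothetical three-element factor. The intermediate names codes, counts and picked are introduced only for illustration:

    ```r
    x <- factor(c("b", "a", "b"))
    codes <- as.integer(x)  # level number of each element: 2 1 2
    # One column per level; column y is the running count of level y
    counts <- sapply(seq_along(levels(x)), function(y) cumsum(codes %in% y))
    # counts is a 3 x 2 matrix: column 1 = 0 1 1 ("a"), column 2 = 1 1 2 ("b")
    # Element i + n * (codes[i] - 1) is row i of the column for x[i]'s own level
    picked <- counts[seq_along(x) + length(x) * (codes - 1)]
    print(picked)
    ```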
    

    Testing:

    fac1 <- factor(rep(letters[1:3],each=5))
    
    > data.frame(fac1,index=facSeq(fac1))
       fac1 index
    1     a     1
    2     a     2
    3     a     3
    4     a     4
    5     a     5
    6     b     1
    7     b     2
    8     b     3
    9     b     4
    10    b     5
    11    c     1
    12    c     2
    13    c     3
    14    c     4
    15    c     5
    

    A more interesting example, with the levels in random order:

    fac2 <- factor(sample(letters[1:5], 20, replace = TRUE))
    
    > data.frame(fac2,index=facSeq(fac2))
       fac2 index
    1     a     1
    2     a     2
    3     d     1
    4     b     1
    5     a     3
    6     e     1
    7     e     2
    8     a     4
    9     c     1
    10    e     3
    11    b     2
    12    d     2
    13    b     3
    14    e     4
    15    e     5
    16    d     3
    17    c     2
    18    e     6
    19    b     4
    20    d     4
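    Because fac2 is drawn at random, the listing above will differ from run to run. As a rough cross-check, the sketch below fixes an arbitrary seed (an assumption, chosen only for reproducibility) and compares facSeq() with the ave() approach from the first answer; the two should agree element for element.

    ```r
    # facSeq() as defined in this answer
    facSeq <- function(x) {
        x.l <- length(x)
        x.f.l <- length(levels(x))
        sapply(1:x.f.l, function(y) cumsum(as.integer(x) %in% y))[1:x.l + x.l * (as.integer(x) - 1)]
    }
    set.seed(42)  # arbitrary seed, only for reproducibility
    fac2 <- factor(sample(letters[1:5], 20, replace = TRUE))
    # Reference result: ave() also counts within each level in row order
    ref <- ave(seq_along(fac2), fac2, FUN = seq_along)
    print(identical(facSeq(fac2), ref))
    ```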
    