R - add column that counts sequentially within groups but repeats for duplicates

刺人心 2020-12-01 22:49

I'm looking for a solution to add the column "desired_result", preferably using dplyr and/or ave(). See the data frame here, where the group is "section" and the unique

3 answers
  • 2020-12-01 23:17

    If exact enumeration is necessary and you need the desired result to be consistent (so that the same exhibit always gets the same number, whichever section it appears in), you can try:

    library(dplyr)
    df <- data.frame(section = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                     exhibit = c('a', 'b', 'c', 'c', 'a', 'b', 'b', 'c'))
    if (is.null(saveLevels <- levels(df$exhibit)))
        saveLevels <- sort(unique(df$exhibit)) ## or levels(factor(df$exhibit))
    df %>%
        group_by(section) %>%
        mutate(answer = as.integer(factor(exhibit, levels = saveLevels)))
    ## Source: local data frame [8 x 3]
    ## Groups: section
    ##   section exhibit answer
    ## 1       1       a      1
    ## 2       1       b      2
    ## 3       1       c      3
    ## 4       1       c      3
    ## 5       2       a      1
    ## 6       2       b      2
    ## 7       2       b      2
    ## 8       2       c      3
    

    If a new exhibit appears in a later section, it gets a new number of its own. (Notice that the last exhibit is different.)

    df2 <- data.frame(section = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                      exhibit = c('a', 'b', 'c', 'c', 'a', 'b', 'b', 'd'))
    if (is.null(saveLevels2 <- levels(df2$exhibit)))
        saveLevels2 <- sort(unique(df2$exhibit))
    df2 %>%
        group_by(section) %>%
        mutate(answer = as.integer(factor(exhibit, levels = saveLevels2)))
    ## Source: local data frame [8 x 3]
    ## Groups: section
    ##   section exhibit answer
    ## 1       1       a      1
    ## 2       1       b      2
    ## 3       1       c      3
    ## 4       1       c      3
    ## 5       2       a      1
    ## 6       2       b      2
    ## 7       2       b      2
    ## 8       2       d      4
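
    Because the factor levels are fixed for the whole data frame, the number assigned to an exhibit does not depend on its section. A minimal base R sketch of the same idea, assuming only the df2 data above (the name all_exhibits is made up for illustration), looks each exhibit up in one shared table with match():

    ```r
    # Same data as df2 above; stringsAsFactors = FALSE keeps exhibit as character.
    df2 <- data.frame(section = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                      exhibit = c('a', 'b', 'c', 'c', 'a', 'b', 'b', 'd'),
                      stringsAsFactors = FALSE)

    # One lookup table shared by every section (hypothetical name).
    all_exhibits <- sort(unique(df2$exhibit))

    # match() returns the position of each exhibit in the lookup table,
    # so 'd' gets 4 even though section 2 only holds three distinct exhibits.
    df2$answer <- match(df2$exhibit, all_exhibits)
    df2$answer
    ## [1] 1 2 3 3 1 2 2 4
    ```

    This should give the same numbers as the grouped as.integer(factor(...)) call, since the levels do not change between groups.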
    
  • 2020-12-01 23:26

    dense_rank() from dplyr it is:

    library(dplyr)
    df %>% 
      group_by(section) %>% 
      mutate(desire=dense_rank(exhibit))
    #  section exhibit desired_result desire
    #1       1       a              1      1
    #2       1       b              2      2
    #3       1       c              3      3
    #4       1       c              3      3
    #5       2       a              1      1
    #6       2       b              2      2
    #7       2       b              2      2
    #8       2       c              3      3
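
    Since the question also mentions ave(), here is a rough base R sketch of a per-group dense rank. It is an illustration under assumptions, not part of the answer above: the data frame d and the column dense are made-up names. The ranking is done inside each section, so a label that is missing from a section leaves no gap in the numbering.

    ```r
    # Illustrative data: section 2 has 'd' but no 'c'.
    d <- data.frame(section = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                    exhibit = c('a', 'b', 'c', 'c', 'a', 'b', 'b', 'd'),
                    stringsAsFactors = FALSE)

    # ave() writes the results back into its first argument, so pass integer
    # codes rather than the character column to get integers out.
    codes <- as.integer(factor(d$exhibit))

    # Within each section, rank the distinct codes without gaps (a dense rank).
    d$dense <- ave(codes, d$section,
                   FUN = function(x) match(x, sort(unique(x))))
    d$dense
    ## [1] 1 2 3 3 1 2 2 3
    ```

    Here 'd' is numbered 3 in section 2 because it is the third distinct exhibit in that section.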
    
  • 2020-12-01 23:30

    I've recently pushed a function rleid() to data.table (currently available on the development version, 1.9.5), which does exactly this. If you're interested, you can install it by following this.

    require(data.table) # 1.9.5, for `rleid()`
    require(dplyr)
    DF %>% 
      group_by(section) %>% 
      mutate(desired_results=rleid(exhibit))
    
    #   section exhibit desired_result desired_results
    # 1       1       a              1               1
    # 2       1       b              2               2
    # 3       1       c              3               3
    # 4       1       c              3               3
    # 5       2       a              1               1
    # 6       2       b              2               2
    # 7       2       b              2               2
    # 8       2       c              3               3
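
    Note that a run-length id numbers runs of consecutive equal values, so it only matches a dense rank when equal exhibits sit next to each other within a section, as they do in the example data. A small base R sketch of the idea (run_id is a hypothetical helper written for illustration, not the data.table implementation):

    ```r
    # Hypothetical helper: start a new id whenever the value changes
    # from the previous element.
    run_id <- function(x) {
      if (length(x) == 0L) return(integer(0))
      cumsum(c(TRUE, x[-1L] != x[-length(x)]))
    }

    # 'a' appears again after 'b', so it starts a third run.
    exhibit <- c('a', 'b', 'a', 'a')

    run_id(exhibit)
    ## [1] 1 2 3 3

    # A dense rank of the same vector gives every 'a' the same number.
    match(exhibit, sort(unique(exhibit)))
    ## [1] 1 2 1 1
    ```

    If the rows might not be ordered by exhibit within each section, sorting first or using a rank-based approach would be the safer choice.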
    