For each row return the column name of the largest value

礼貌的吻别 2020-11-21 07:06

I have a roster of employees, and I need to know which department each of them is in most often. It is trivial to tabulate employee ID against department name, but it is trickier

8 answers
  •  谎友^ (original poster)
    2020-11-21 07:45

    A dplyr solution:

    Idea:

    • add rowids as a column
    • reshape to long format
    • filter for max in each group

    Code:

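    library(dplyr)
    library(tidyr)   # gather()
    library(tibble)  # rownames_to_column()

    DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))
    DF %>%
      rownames_to_column() %>%              # add row ids as a column
      gather(column, value, -rowname) %>%   # reshape to long format
      group_by(rowname) %>%
      filter(rank(-value) == 1)             # keep the largest value in each row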
    

    Result:

    # A tibble: 3 x 3
    # Groups:   rowname [3]
      rowname column value
      <chr>   <chr>  <dbl>
    1 2       V1         8
    2 3       V2         5
    3 1       V3         9
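
    One caveat, offered as a hedged sketch rather than part of the original answer: `rank()` averages ties by default, so if a row's maximum appears in two columns, both get rank 1.5 and `rank(-value) == 1` returns nothing for that row. Assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later, `slice_max()` with `with_ties = FALSE` keeps exactly one column per row. The data frame `DF_ties` is made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical data: row 1 has its maximum (9) in both V2 and V3
DF_ties <- data.frame(V1 = c(2, 8, 1), V2 = c(9, 3, 5), V3 = c(9, 6, 4))

top1 <- DF_ties %>%
  rownames_to_column() %>%
  pivot_longer(-rowname, names_to = "column", values_to = "value") %>%
  group_by(rowname) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%  # exactly one row per group
  ungroup()

top1
```

    Setting `with_ties = TRUE` instead would return every column that shares the maximum, which may be preferable when ties are meaningful.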
    

    This approach can be easily extended to get the top n columns. Example for n=2:

    DF %>% 
      rownames_to_column() %>%
      gather(column, value, -rowname) %>%
      group_by(rowname) %>% 
      mutate(rk = rank(-value)) %>%
      filter(rk <= 2) %>% 
      arrange(rowname, rk) 
    

    Result:

    # A tibble: 6 x 4
    # Groups:   rowname [3]
      rowname column value    rk
      <chr>   <chr>  <dbl> <dbl>
    1 1       V3         9     1
    2 1       V2         7     2
    3 2       V1         8     1
    4 2       V3         6     2
    5 3       V2         5     1
    6 3       V3         4     2
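
    When only the single largest column per row is needed, reshaping can be skipped. The following is a minimal base R sketch, not the approach used above: `max.col()` returns, for each row of a numeric matrix, the index of the column holding the maximum. The choice of `ties.method = "first"` is an assumption about how ties should be resolved.

```r
DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))

# Index of the largest value in each row, mapped back to the column names
top_col <- colnames(DF)[max.col(as.matrix(DF), ties.method = "first")]
top_col
# [1] "V3" "V1" "V2"
```

    This does not extend to the top n columns as directly as the long-format approach does.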
    
