Create a variable capturing the most frequent occurrence by group

不思量自难忘° 2020-12-10 07:35

Define:

df1 <- data.frame(
  id = c(rep(1, 3), rep(2, 3)),
  v1 = as.character(c("a", "b", "b", rep("c", 3)))
)

such that:

> df1
  id v1
1  1  a
2  1  b
3  1  b
4  2  c
5  2  c
6  2  c

3 Answers
  • 2020-12-10 07:57
    mode <- function(x) names(table(x))[ which.max(table(x)) ]
    df1$freq <- ave(df1$v1, df1$id, FUN=mode)
    > df1
      id v1 freq
    1  1  a    b
    2  1  b    b
    3  1  b    b
    4  2  c    c
    5  2  c    c
    6  2  c    c
    
  • 2020-12-10 07:57

    Another way consists of using tidyverse functions:

    • grouping first, using group_by(), and counting the occurrence of the second variable using tally()
    • arranging by the number of occurrences with arrange()
    • summarizing and picking out the first row with summarize() and first()

    Therefore:

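    library(dplyr)

    df1 %>%
      group_by(id, v1) %>%
      tally() %>%                  # count rows per (id, v1) pair
      arrange(id, desc(n)) %>%     # most frequent value first within each id
      summarize(freq = first(v1))  # one row per id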
    

    This will give you just the mapping (which I find cleaner):

    # A tibble: 2 x 2
         id   freq
      <dbl> <fctr>
    1     1      b
    2     2      c
    

    You can then left_join your original data frame with that table.
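
    The join step could look roughly like the sketch below. The names freq_by_id and df1_with_freq are made up for illustration, and stringsAsFactors = FALSE is an assumption added so that the result is a character column regardless of R version.

    ```r
    library(dplyr)

    # Example data from the question
    df1 <- data.frame(
      id = c(rep(1, 3), rep(2, 3)),
      v1 = c("a", "b", "b", rep("c", 3)),
      stringsAsFactors = FALSE
    )

    # One row per id holding its most frequent v1 (same pipeline as above)
    freq_by_id <- df1 %>%
      group_by(id, v1) %>%
      tally() %>%
      arrange(id, desc(n)) %>%
      summarize(freq = first(v1))

    # Attach the per-id result to every row of the original data frame
    df1_with_freq <- left_join(df1, freq_by_id, by = "id")
    ```

    Keeping the mapping as its own table means it can be inspected or reused before it is joined back.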

  • 2020-12-10 08:00

    You can do this using ddply and a custom function to pick out the most frequent value:

    library(plyr)

    # For one id group, attach the most frequent value of v1 to every row
    myFun <- function(x){
        tbl <- table(x$v1)
        x$freq <- rep(names(tbl)[which.max(tbl)],nrow(x))
        x
    }

    ddply(df1,.(id),.fun=myFun)
    

    Note that which.max will return the first occurrence of the maximum value, in the case of ties. See ??which.is.max in the nnet package for an option that breaks ties randomly.
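
    To make the tie behaviour concrete, here is a small sketch on a made-up vector where two values are equally frequent. The variable names are illustrative only, and the last step assumes the nnet package is installed, so it is guarded.

    ```r
    # Two values tied for most frequent: "a" and "b" each occur twice
    x <- c("a", "b", "a", "b")
    tbl <- table(x)

    # which.max() returns the position of the first maximum only
    first_mode <- names(tbl)[which.max(tbl)]

    # Comparing against max() keeps every tied value instead
    all_modes <- names(tbl)[tbl == max(tbl)]

    # Random tie-breaking, only if nnet is available
    if (requireNamespace("nnet", quietly = TRUE)) {
      random_mode <- names(tbl)[nnet::which.is.max(tbl)]
    }
    ```

    Here first_mode is "a" because table() orders the values alphabetically, while all_modes holds both "a" and "b".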
