Define:
df1 <- data.frame(
  id = c(rep(1, 3), rep(2, 3)),
  v1 = as.character(c("a", "b", "b", rep("c", 3)))
)
so that:
> df1
  id v1
1  1  a
2  1  b
3  1  b
4  2  c
5  2  c
6  2  c
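The most frequent v1 within each id can be added with ave():
# Most frequent value of x (note: this name masks base::mode() in the session)
mode <- function(x) names(table(x))[which.max(table(x))]
# Apply it within each id and repeat the result on every row of the group
df1$freq <- ave(df1$v1, df1$id, FUN = mode)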
> df1
  id v1 freq
1  1  a    b
2  1  b    b
3  1  b    b
4  2  c    c
5  2  c    c
6  2  c    c
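Another way uses tidyverse functions: group by both variables with group_by() and count the occurrences of the second variable with tally(), sort the counts in descending order within each id with arrange(), then keep the top value per id with summarize() and first().
Therefore: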
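library(dplyr)

df1 %>%
  group_by(id, v1) %>%
  tally() %>%                      # one row per (id, v1) with its count n
  arrange(id, desc(n)) %>%         # most frequent v1 first within each id
  summarize(freq = first(v1))      # keep the top v1 per id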
This will give you just the mapping (which I find cleaner):
# A tibble: 2 x 2
     id   freq
  <dbl> <fctr>
1     1      b
2     2      c
You can then left_join your original data frame with that table.
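The join step could look something like the sketch below. It assumes dplyr is installed; the name freq_map is only illustrative, and stringsAsFactors = FALSE is set here so that v1 stays a character column regardless of R version.

```r
library(dplyr)

df1 <- data.frame(
  id = c(rep(1, 3), rep(2, 3)),
  v1 = c("a", "b", "b", rep("c", 3)),
  stringsAsFactors = FALSE  # assumption: keep v1 as character
)

# Mapping from each id to its most frequent v1 (freq_map is a hypothetical name)
freq_map <- df1 %>%
  group_by(id, v1) %>%
  tally() %>%
  arrange(id, desc(n)) %>%
  summarize(freq = first(v1))

# Attach the per-id mode back onto every original row
result <- left_join(df1, freq_map, by = "id")
result
```

Keeping the mapping as a separate table means the original data frame is left untouched until the join, which can be handy if the mapping is reused elsewhere.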
You can also do this using ddply from the plyr package and a custom function to pick out the most frequent value:
library(plyr)

# For one id group, find the most frequent v1 and repeat it on every row
myFun <- function(x) {
  tbl <- table(x$v1)
  x$freq <- rep(names(tbl)[which.max(tbl)], nrow(x))
  x
}
ddply(df1, .(id), .fun = myFun)
Note that which.max will return the first occurrence of the maximum value in the case of ties. See ??which.is.max in the nnet package for an option that breaks ties randomly.
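To make the tie behaviour concrete, here is a minimal base R sketch. The helper mode_random_tie is a hypothetical name introduced for illustration; it is not the nnet function, just one possible way to break ties at random without extra packages.

```r
x <- c("a", "b", "b", "a", "c")
tbl <- table(x)  # counts: a = 2, b = 2, c = 1 (names sorted alphabetically)

# which.max() picks the first of the tied maxima, so "a" wins over "b" here
first_mode <- names(tbl)[which.max(tbl)]

# Hypothetical helper: choose uniformly at random among all tied top values
mode_random_tie <- function(x) {
  tbl <- table(x)
  top <- names(tbl)[tbl == max(tbl)]
  top[sample.int(length(top), 1)]
}

random_mode <- mode_random_tie(x)  # either "a" or "b"
```

If reproducible results matter, calling set.seed() before the helper fixes which tied value is chosen.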