Question
a <- c(rep(1:2,3))
b <- c("A","A","B","B","B","B")
df <- data.frame(a,b)
> str(b)
chr [1:6] "A" "A" "B" "B" "B" "B"
  a b
1 1 A
2 2 A
3 1 B
4 2 B
5 1 B
6 2 B
I want to group by variable a and return the most frequent value of b.
My desired result would look like this:
  a b
1 1 B
2 2 B
In dplyr it would be something like:
df %>% group_by(a) %>% summarize (b = most.frequent(b))
I mentioned dplyr only to illustrate the problem.
Answer 1:
The key is to start by grouping on both a and b to compute the frequencies, and then keep only the most frequent row per group of a, for example like this:
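df %>%
count(a, b) %>%
# explicit group_by(a): current dplyr returns count() output with the input's grouping (here: none)
group_by(a) %>%
slice(which.max(n))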
Source: local data frame [2 x 3]
Groups: a
  a b n
1 1 B 2
2 2 B 2
Of course there are other approaches, so this is only one possible "key".
Answer 2:
For each value of a (using by()), create a table() of b and extract the names() of the largest entry in that table():
> with(df,by(b,a,function(xx)names(which.max(table(xx)))))
a: 1
[1] "B"
------------------------
a: 2
[1] "B"
You can wrap this in as.table() to get a prettier output, although it still does not exactly match your desired result:
> as.table(with(df,by(b,a,function(xx)names(which.max(table(xx))))))
a
1 2
B B
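If a data frame shaped like the desired result is needed, one base R possibility is aggregate() with the same table()/which.max() idea. This is only a sketch under the question's example data; the name res is illustrative, and ties are resolved in favour of whichever value comes first in the table.

```r
# Example data from the question
df <- data.frame(a = rep(1:2, 3),
                 b = c("A", "A", "B", "B", "B", "B"))

# For each value of a, tabulate b and keep the name of the largest count.
# which.max() returns the first maximum, so ties go to the first table entry.
res <- aggregate(b ~ a, data = df,
                 FUN = function(xx) names(which.max(table(xx))))

print(res)
```

The result has one row per value of a, with columns a and b.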
Answer 3:
What works for me, and is simpler, is one of:
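# which.max(table(b)) is a position in the table, not a row number, so use its name
df %>% group_by(a) %>% summarise(b = names(which.max(table(b))))
df %>% group_by(a) %>% count(b) %>% top_n(1, n)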
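top_n() has been superseded in recent dplyr releases, and it keeps all tied rows. A sketch of the same idea with slice_max(), assuming dplyr 1.0.0 or later, might look like this; the name top is illustrative, and with_ties = FALSE is a choice made here to return exactly one row per group.

```r
library(dplyr)

# Example data from the question
df <- data.frame(a = rep(1:2, 3),
                 b = c("A", "A", "B", "B", "B", "B"))

top <- df %>%
  count(a, b) %>%                           # frequency of each (a, b) pair
  group_by(a) %>%                           # pick the winner within each a
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top)
```

Dropping with_ties = FALSE returns every value of b that shares the top count within a group.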
Source: https://stackoverflow.com/questions/29922195/return-most-frequent-string-value-for-each-group