Select the most common value of a column based on matched pairs from two columns using `ddply`

Submitted by 喜你入骨 on 2019-12-11 07:33:38

Question


I'm trying to use ddply (a plyr function) to identify the most frequent interaction type between each unique pair of users in social media data of the following form:

from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D')
to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C')
interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment', 'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like')

dat <- data.frame(from, to, interaction_type)

If aggregated correctly, this should give the most common type of interaction for each unique pair, regardless of direction (i.e., A-->B and A<--B count as the same pair), like this:

from    to  type
A       B   like
A       C   like
A       D   share
B       C   like
B       D   comment
C       D   like

It's easy to get the total count of interactions between any two users with

count <- ddply(dat, .(from, to), nrow)

but I found it hard to apply a similar method to find the most common type of interaction for a given pair. What is the most efficient way to get my desired output? Also, how should possible tied cases be handled? (I might just use "tied" as the cell value for all tied cases.)


Answer 1:


Similar to Ronak's approach

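library(dplyr)
dat <- data.frame(from, to, interaction_type, stringsAsFactors = FALSE)
dat %>% 
  mutate(
    # Order-independent key: sort the two users so A-B and B-A match
    pair = purrr::pmap_chr(
      .l = list(from = from, to = to),
      .f = function(from, to) paste(sort(c(from, to)), collapse = "")
    )
  ) %>%
  group_by(pair) %>%
  add_count(interaction_type) %>%   # n = occurrences of each type within the pair
  filter(n == max(n)) %>%           # keep rows of the most frequent type(s)
  filter(row_number() == 1) %>%     # on a tie, keep the first remaining row
  ungroup() %>%
  select(-pair, -n)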
# A tibble: 6 x 3
  from  to    interaction_type
  <chr> <chr> <chr>           
1 A     B     like            
2 A     D     share           
3 B     C     like            
4 B     D     comment         
5 C     A     like            
6 C     D     like



Answer 2:


We need to find the most common value (mode) per group, irrespective of the order of the from and to columns.

Taking the Mode function from this answer

Mode <- function(x) {
   ux <- unique(x)
   ux[which.max(tabulate(match(x, ux)))]
}
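
As a rough illustration of how this helper works, the sketch below traces the intermediate steps on a small made-up vector (the vector `x` is an assumption for illustration, not part of the original data). It also shows what happens on a tie: `which.max()` returns the first maximum, so the value that appears first wins.

```r
# Mode() as defined above, repeated so this snippet runs on its own.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical example vector, not taken from the question's data.
x <- c("comment", "like", "like", "share")

ux <- unique(x)           # "comment" "like" "share"
idx <- match(x, ux)       # 1 2 2 3  (position of each element in ux)
counts <- tabulate(idx)   # 1 2 1    (occurrences per unique value)
Mode(x)                   # "like"

# Tie: both values occur once, so the first-appearing one is returned.
Mode(c("share", "like"))  # "share"
```

Because ties are silently resolved in favour of the first value seen, this helper alone cannot flag tied pairs.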

We can use dplyr to get the most frequent value for each group (on a tie, the value that appears first).

library(dplyr)

dat %>%
  mutate(key = paste0(pmin(from, to), pmax(from, to))) %>%
  group_by(key) %>%
  mutate(interaction_type = Mode(interaction_type)) %>%
  slice(1) %>%
  ungroup() %>%
  select(-key)

#  from  to    interaction_type
#  <chr> <chr> <chr>           
#1 A     B     like            
#2 C     A     like            
#3 A     D     share           
#4 B     C     like            
#5 B     D     comment         
#6 C     D     like     

The columns are kept as characters by adding stringsAsFactors = FALSE when creating the data frame (this is the default from R 4.0.0 onward).
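
Neither answer uses ddply or marks ties explicitly, both of which the question asks about. The following is a minimal sketch of one way to do that, assuming plyr is installed. The helper `mode_or_tied` and the columns `user1` and `user2` are hypothetical names introduced here for illustration; they are not from the original thread.

```r
library(plyr)

from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D')
to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C')
interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment',
                      'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like')
dat <- data.frame(from, to, interaction_type, stringsAsFactors = FALSE)

# Hypothetical helper: return the most frequent value,
# or the string "tied" when several values share the top count.
mode_or_tied <- function(x) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  if (length(top) > 1) "tied" else top
}

# Put each pair in alphabetical order so A->B and B->A land in the same group.
dat$user1 <- pmin(dat$from, dat$to)
dat$user2 <- pmax(dat$from, dat$to)

# One row per unordered pair with its most common interaction type.
result <- ddply(dat, .(user1, user2), function(d) {
  data.frame(type = mode_or_tied(d$interaction_type), stringsAsFactors = FALSE)
})
result
```

On the sample data no pair is tied, so every row carries a single interaction type; a call such as `mode_or_tied(c("like", "share"))` returns "tied".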



Source: https://stackoverflow.com/questions/55645739/select-the-most-common-value-of-a-column-based-on-matched-pairs-from-two-columns
