Question
My file looks like this:
Pcol Mcol
P1 M1,M2,M5,M6,M1,M2,M1,M5
P2 M1,M2,M3,M5,M1,M2,M1,M3
P3 M4,M5,M7,M6,M5,M7,M4,M7
I want to find all pairwise combinations of the Mcol elements and count how many rows each combination is present in.
Expected output:
Mcol freq
M1,M2 2
M1,M5 2
M1,M6 1
M2,M5 2
M2,M6 1
M5,M6 2
M1,M3 1
M2,M3 1
M4,M5 1
M4,M7 1
M4,M6 1
M7,M6 1
I have tried this:
x <- read.csv("file.csv" ,header = TRUE, stringsAsFactors = FALSE)
xx <- do.call(rbind.data.frame,
lapply(x$Mcol, function(i){
n <- sort(unlist(strsplit(i, ",")))
t(combn(n, 2))
}))
data.frame(table(paste(xx[, 1], xx[, 2], sep = ",")))
It doesn't give the expected output.
I have also tried this:
library(tidyverse)
df1 %>%
separate_rows(Mcol) %>%
group_by(Pcol) %>%
summarise(Mcol = list(combn(Mcol, 2, FUN= toString, simplify = FALSE))) %>%
unnest %>%
unnest %>%
count(Mcol)
But this does not give the number of rows in which each combination is present. I want the row frequency of each combination: for example, M1,M2 is present in both P1 and P2, so its frequency should be 2.
Answer 1:
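One option in the tidyverse is to split 'Mcol' with separate_rows, keep only the distinct elements within each row, group by 'Pcol', get the combn of 'Mcol', and, after unnesting, take the count of the 'Mcol' column. Dropping the duplicates first means each combination is counted at most once per row, so the count is the number of rows that contain it.
library(tidyverse)
df1 %>%
  separate_rows(Mcol) %>%   # one row per element of 'Mcol'
  distinct() %>%            # drop repeated elements within each 'Pcol'
  group_by(Pcol) %>%
  summarise(Mcol = list(combn(Mcol, 2, FUN = toString, simplify = FALSE))) %>%
  unnest(Mcol) %>%
  unnest(Mcol) %>%
  count(Mcol)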
# A tibble: 14 x 2
# Mcol n
# <chr> <int>
# 1 M1, M2 2
# 2 M1, M3 1
# 3 M1, M5 2
# 4 M1, M6 1
# 5 M2, M3 1
# 6 M2, M5 2
# 7 M2, M6 1
# 8 M3, M5 1
# 9 M4, M5 1
#10 M4, M6 1
#11 M4, M7 1
#12 M5, M6 2
#13 M5, M7 1
#14 M7, M6 1
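For comparison, the same idea can be sketched in base R, close to the first attempt in the question. This is only a sketch under a few assumptions: the data frame is rebuilt by hand as df1, the "M1.M5" in row P1 is treated as a typo for "M1,M5", and the names pairs_per_row and freq are made up for illustration. The only real change from the question's attempt is calling unique() before combn(), so a pair is counted at most once per row.

```r
# Sample data from the question (assumption: "M1.M5" in row P1 is a typo for "M1,M5")
df1 <- data.frame(
  Pcol = c("P1", "P2", "P3"),
  Mcol = c("M1,M2,M5,M6,M1,M2,M1,M5",
           "M1,M2,M3,M5,M1,M2,M1,M3",
           "M4,M5,M7,M6,M5,M7,M4,M7"),
  stringsAsFactors = FALSE
)

# For each row: split on commas, drop duplicates, sort, then build every pair
pairs_per_row <- lapply(strsplit(df1$Mcol, ",", fixed = TRUE), function(items) {
  items <- sort(unique(trimws(items)))
  if (length(items) < 2) return(character(0))  # combn() fails on fewer than 2 items
  combn(items, 2, FUN = paste, collapse = ",")
})

# Each pair occurs at most once per row, so tabulating counts rows, not occurrences
freq <- as.data.frame(table(Mcol = unlist(pairs_per_row)),
                      responseName = "freq", stringsAsFactors = FALSE)
freq
```

Sorting the elements within each row is a design choice: it gives a pair the same label regardless of the order in which its elements appear in a row, so the last pair is reported as M6,M7 rather than M7,M6. The counts otherwise agree with the tibble above, for example M1,M2 has a frequency of 2.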
Source: https://stackoverflow.com/questions/56794136/find-all-the-combinations-of-a-particular-column-and-find-their-frequencies