Build word co-occurrence edge list in R

南方客 2020-12-16 06:35

I have a chunk of sentences and I want to build the undirected edge list of word co-occurrence and see the frequency of every edge. I took a look at the tm pack

3 Answers

失恋的感觉 (OP)

2020-12-16 06:44

Here's a base R way:

d <- read.table(text='sentence_id text
1           "a b c d e"
2           "a b b e"
3           "b c d"
4           "a e"', header=TRUE, as.is=TRUE)

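result.vec <- table(unlist(lapply(d$text, function(text) {
    # Split the sentence into words, drop duplicates, and sort so that an
    # undirected pair always gets the same label (a.b, never b.a).
    words <- sort(unique(scan(text=text, what='', sep=' ', quiet=TRUE)))
    pairs <- combn(words, m=2)  # every unordered pair; needs at least 2 distinct words
    interaction(pairs[1,], pairs[2,])
})))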
# a.b b.b c.b d.b a.c b.c c.c d.c a.d b.d c.d d.d a.e b.e c.e d.e 
#   2   0   0   0   1   2   0   0   1   2   2   0   3   2   1   1 

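# Split the "word1.word2" labels back into two columns and drop zero-count pairs.
# Note: this assumes no word contains a literal ".".
result <- subset(
    data.frame(do.call(rbind, strsplit(names(result.vec), '\\.')),
               freq=as.vector(result.vec)),
    freq > 0)
with(result, result[order(X1, X2),])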

#    X1 X2 freq
# 1   a  b    2
# 5   a  c    1
# 9   a  d    1
# 13  a  e    3
# 6   b  c    2
# 10  b  d    2
# 14  b  e    2
# 11  c  d    2
# 15  c  e    1
# 16  d  e    1
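
For what it's worth, here is a variant sketch that keeps the two words in separate columns from the start, so words containing a "." should not be a problem, and one-word sentences are skipped instead of making combn() fail. It is only a sketch under the assumption that words are separated by spaces; sentence_pairs is a hypothetical helper name, not something from the answer above.

```r
# Toy data copied from the answer above.
sentences <- c("a b c d e", "a b b e", "b c d", "a e")

# Hypothetical helper: one sentence -> two-column matrix of sorted word pairs.
sentence_pairs <- function(text) {
  words <- sort(unique(strsplit(text, " +")[[1]]))
  words <- words[nzchar(words)]          # drop empty strings from leading spaces
  if (length(words) < 2) return(NULL)    # a one-word sentence yields no edge
  t(combn(words, 2))                     # one row per unordered pair
}

# Stack the pairs from all sentences, then count each (from, to) combination.
pairs <- do.call(rbind, lapply(sentences, sentence_pairs))
edges <- as.data.frame(table(from = pairs[, 1], to = pairs[, 2]),
                       responseName = "freq", stringsAsFactors = FALSE)

# Keep only pairs that actually occur and sort for readability.
edges <- edges[edges$freq > 0, ]
edges <- edges[order(edges$from, edges$to), ]
rownames(edges) <- NULL
edges
```

On the toy data this gives the same ten edges and frequencies as the table above, in columns from, to and freq.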
