Build word co-occurrence edge list in R

南方客 2020-12-16 06:35

I have a chunk of sentences and I want to build the undirected edge list of word co-occurrence and see the frequency of every edge. I took a look at the tm pack

3 answers
  •  失恋的感觉
    2020-12-16 06:44

    Here's a base R way:

    d <- read.table(text='sentence_id text
    1           "a b c d e"
    2           "a b b e"
    3           "b c d"
    4           "a e"', header=TRUE, as.is=TRUE)
    
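    # For each sentence, enumerate every unordered pair of distinct words,
    # then tabulate the pairs across all sentences.
    result.vec <- table(unlist(lapply(d$text, function(text) {
        # sort() keeps a pair's orientation independent of word order in the sentence
        pairs <- combn(sort(unique(scan(text=text, what='', sep=' ', quiet=TRUE))), m=2)
        interaction(pairs[1,], pairs[2,])
    })))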
    # a.b b.b c.b d.b a.c b.c c.c d.c a.d b.d c.d d.d a.e b.e c.e d.e 
    #   2   0   0   0   1   2   0   0   1   2   2   0   3   2   1   1 
    
    result <- subset(data.frame(do.call(rbind, strsplit(names(result.vec), '\\.')), freq=as.vector(result.vec)), freq > 0)
    with(result, result[order(X1, X2),])
    
    #    X1 X2 freq
    # 1   a  b    2
    # 5   a  c    1
    # 9   a  d    1
    # 13  a  e    3
    # 6   b  c    2
    # 10  b  d    2
    # 14  b  e    2
    # 11  c  d    2
    # 15  c  e    1
    # 16  d  e    1
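
The same idea can also be sketched so that it produces the edge list as a data frame directly. This is a rough variation, not part of the original answer, and it rests on a few assumptions: words are separated by spaces, an edge is counted at most once per sentence, and sentences with fewer than two distinct words contribute no edges. The helper name `sentence_pairs` and the column names `word1`, `word2`, `freq` are made up for illustration.

```r
# Toy data: the same four sentences used in the answer above
sentences <- c("a b c d e", "a b b e", "b c d", "a e")

# Hypothetical helper: one row per unordered pair of distinct words in a sentence
sentence_pairs <- function(text) {
  words <- strsplit(text, " +")[[1]]
  words <- sort(unique(words[nzchar(words)]))  # sorting fixes the orientation of each pair
  if (length(words) < 2) return(NULL)          # combn() would fail with fewer than 2 words
  pairs <- combn(words, 2)
  data.frame(word1 = pairs[1, ], word2 = pairs[2, ], stringsAsFactors = FALSE)
}

# Stack the per-sentence pairs, dropping sentences that produced none
pair_list <- Filter(Negate(is.null), lapply(sentences, sentence_pairs))
edges <- do.call(rbind, pair_list)
edges$freq <- 1L

# Sum the occurrences of each edge and sort for readability
edge_list <- aggregate(freq ~ word1 + word2, data = edges, FUN = sum)
edge_list <- edge_list[order(edge_list$word1, edge_list$word2), ]
rownames(edge_list) <- NULL
edge_list
```

Because the words are split with `strsplit` and kept in separate columns, this version does not depend on splitting pair names on `.` afterwards, so words that themselves contain a dot should not be a problem.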
    
