How to calculate a table of pairwise counts from a long-form data frame

闹比i 2020-12-06 20:20

I have a 'long-form' data frame with columns id (the primary key) and featureCode (a categorical variable). Each record has between 1 and 9 values

4 Answers
  •  被撕碎了的回忆
    2020-12-06 21:14

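    Another solution, which I think is conceptually easy to follow: you have a bipartite graph here (ids on one side, feature codes on the other) and simply need its projection onto the "featureCode" vertices. In that projection, two codes are joined by an edge whose weight is the number of ids they share. Here is how to do this with the igraph package: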

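    library(igraph)

    dat <- read.table(header = TRUE, stringsAsFactors = FALSE,
                      text = "id  featureCode
                              5         PPLC
                              5         PCLI
                              6         PPLC
                              6         PCLI
                              7          PPL
                              7         PPLC
                              7         PCLI
                              8         PPLC
                              9         PPLC
                             10         PPLC")

    # Vertex table: every id gets type TRUE, every featureCode gets type FALSE
    verts <- unique(data.frame(name = c(dat$id, dat$featureCode),
                               type = rep(c(TRUE, FALSE), each = nrow(dat))))

    # Each row of dat is an edge id -> featureCode, so the graph is bipartite on `type`
    g <- graph_from_data_frame(dat, vertices = verts)

    # Projection [[1]] is onto the FALSE vertices (the feature codes); its
    # "weight" edge attribute counts how many ids two codes have in common
    as_adjacency_matrix(bipartite_projection(g)[[1]], attr = "weight", sparse = FALSE)

    #      PPLC PCLI PPL
    # PPLC    0    3   1
    # PCLI    3    0   1
    # PPL     1    1   0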
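For a cross-check without igraph: the projection weights are the entries of t(B) %*% B, where B is the id-by-featureCode incidence matrix, so base R's crossprod() applied to a contingency table should give the same counts. This is a minimal sketch, assuming each (id, featureCode) pair should be counted only once (hence the unique() call); the object names u, inc and counts are illustrative only.

```r
# Same toy data as in the igraph answer
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id featureCode
5  PPLC
5  PCLI
6  PPLC
6  PCLI
7  PPL
7  PPLC
7  PCLI
8  PPLC
9  PPLC
10 PPLC")

# Assumption: duplicate (id, featureCode) rows should count once
u <- unique(dat)

# Incidence matrix: one row per id, one column per featureCode (1 = present)
inc <- table(u$id, u$featureCode)

# crossprod(inc) computes t(inc) %*% inc: entry [a, b] = number of ids with both a and b
counts <- crossprod(inc)

# The diagonal holds per-code totals; zero it to match the projection output
diag(counts) <- 0
counts
```

The rows and columns come out in alphabetical order (PCLI, PPL, PPLC) rather than in order of first appearance, but the counts should match the igraph output. Keeping the diagonal instead gives the number of ids carrying each code.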
    
