How to calculate a (co-)occurrence matrix from a data frame with several columns using R?

时光取名叫无心 2021-01-12 13:57

I'm a rookie in R and am currently working with collaboration data in the form of an edge list with 32 columns and around 200,000 rows. I want to create a (co-)occurrence matr

3 answers
  •  滥情空心
    2021-01-12 14:39

    There may be better ways to do this, but try:

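    library(tidyverse)

    # Off-diagonal cells: reshape to long form, cross-tabulate ID by Country,
    # then multiply the table by its own transpose to count co-occurrences.
    df1 <- df %>%
      pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
      xtabs(~ID + Country, data = ., sparse = FALSE) %>%
      crossprod(., .)

    # Diagonal cells: total number of times each country occurs.
    df_diag <- df %>%
      pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
      mutate(Country2 = Country) %>%
      xtabs(~Country + Country2, data = ., sparse = FALSE) %>%
      diag()

    # Overwrite the diagonal left by crossprod() with the occurrence counts.
    diag(df1) <- df_diag

    df1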
    
    Country   China England Greece USA
      China       2       2      2   0
      England     2       6      1   1
      Greece      2       1      3   1
      USA         0       1      1   1
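
For anyone who wants to try the same idea without the tidyverse, here is a minimal base-R sketch. It runs on a small made-up data frame: the column names `V1` to `V3` and the values are assumptions for illustration only, not the asker's data, so it does not reproduce the table above. It also shows why the diagonal gets overwritten: `crossprod()` leaves sums of squared counts there, which differ from plain occurrence counts whenever a country appears more than once under the same ID.

```r
# Hypothetical toy data: one row per collaboration (ID), one column per slot.
df <- data.frame(
  ID = 1:3,
  V1 = c("China", "England", "Greece"),
  V2 = c("England", "England", "USA"),
  V3 = c("Greece", NA, NA),
  stringsAsFactors = FALSE
)

# Long form: one (ID, Country) pair per cell, stacked column by column.
long <- data.frame(
  ID = rep(df$ID, times = ncol(df) - 1),
  Country = unlist(df[-1], use.names = FALSE)
)
long <- long[!is.na(long$Country), ]  # drop empty slots

# Incidence matrix: rows are IDs, columns are countries, cells are counts.
inc <- unclass(table(long$ID, long$Country))

# t(inc) %*% inc gives the country-by-country co-occurrence counts.
co <- crossprod(inc)

# Replace the diagonal (sums of squares) with total occurrences per country.
diag(co) <- colSums(inc)

co
```

With 200,000 rows a dense table may get large. `xtabs()` accepts `sparse = TRUE` (it then needs the Matrix package), which may be worth trying, but I have not tested that on data of this size.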
    
