How do I load a graph in neighborhood list format?

Submitted by 南笙酒味 on 2019-12-13 07:43:33

Question


I have a file of neighborhood lists describing a directed graph:

1 2 5
2 4

which is equivalent to the edge list format:

1 2
1 5
2 4

How do I load it into igraph?

I can use readLines and strsplit, but I have a feeling that someone else has already solved this.
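
For reference, the line-by-line route mentioned above could be sketched roughly as follows. This is a minimal base-R illustration rather than a tuned solution for large files. The temporary file and the names `path`, `tokens`, and `edges` are assumptions made for the example, and the separator is assumed to be whitespace.

```r
# Hypothetical two-line sample matching the example in the question.
path <- tempfile(fileext = ".txt")
writeLines(c("1 2 5", "2 4"), path)

# Split each line on whitespace: the first token is the source vertex,
# the remaining tokens are its out-neighbors.
tokens <- strsplit(trimws(readLines(path)), "[[:space:]]+")
tokens <- tokens[lengths(tokens) > 0]  # drop blank lines

# Build one small data frame per line and stack them into an edge list.
edges <- do.call(rbind, lapply(tokens, function(tk) {
  data.frame(from = rep(tk[1], length(tk) - 1L),
             to   = tk[-1],
             stringsAsFactors = FALSE)
}))
edges
```

A line containing only a source vertex contributes zero rows, so isolated vertices would need to be added separately.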


Answer 1:


If you are open to using a package that is still in development, I would suggest exploring the "iotools" package. Its file reader is fast (think along the lines of fread from "data.table") and it includes some splitting features. Use it in conjunction with cSplit from my "splitstackshape" package.

Here's a reproducible example with 1M rows:

First, a function to make some sample data:

data.maker <- function(size) {
  set.seed(1)
  lapply(seq_len(size), function(x) {
    as.character(c(x, sample(100, sample(20), TRUE)))
  })
}

x <- data.maker(1000000)
writeLines(vapply(x, paste, FUN.VALUE = character(1L), collapse = "\t"), "mytest.txt")

Second, load "dplyr" for piping, "iotools" for fast reading, and "splitstackshape" (which also loads "data.table") for splitting and aggregating.

library(dplyr)
library(iotools)
library(splitstackshape)

Here it is, all in one:

system.time({
  out <- input.file("mytest.txt", formatter = mstrsplit, sep = NA, nsep = "\t") %>%
    as.data.table(keep.rownames = TRUE) %>%
    cSplit("V1", "\t", "long") %>%
    .[, .N, by = .(rn, V1)]
})
#    user  system elapsed 
#  26.109   0.096  26.200 

View of the output:

out
#               rn V1 N
#       1:       1 94 1
#       2:       1 22 1
#       3:       1 66 1
#       4:       1 13 1
#       5:       1 27 1
#      ---             
# 9865359: 1000000  1 1
# 9865360: 1000000 85 1
# 9865361: 1000000 91 1
# 9865362: 1000000 44 1
# 9865363: 1000000 20 1
summary(out)
#       rn                  V1              N        
#  Length:9865363     Min.   :  1.0   Min.   :1.000  
#  Class :character   1st Qu.: 25.0   1st Qu.:1.000  
#  Mode  :character   Median : 51.0   Median :1.000  
#                     Mean   : 50.5   Mean   :1.064  
#                     3rd Qu.: 75.0   3rd Qu.:1.000  
#                     Max.   :100.0   Max.   :5.000 

If you prefer more standard packages, you can try the following. It should also be reasonably fast:

library(dplyr)
library(stringi)
library(data.table)


temp <- stri_split_fixed(readLines("mytest.txt"), "\t", n = 2, simplify = TRUE) %>%
  as.data.table %>%
  .[, list(V2 = unlist(strsplit(V2, "\t", TRUE))), by = V1] %>%
  .[, .N, by = .(V1, V2)]
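
Both pipelines above stop at a long source/target table with a count column, so one more step is needed to get an igraph object. The sketch below assumes the "igraph" package is installed and uses its graph_from_data_frame function. The small `edges` table is a hypothetical stand-in for `out` or `temp`, with the same column order (source, target, count).

```r
library(igraph)

# Hypothetical stand-in for the `out` / `temp` tables: source, target, count.
edges <- data.frame(
  from = c("1", "1", "2"),
  to   = c("2", "5", "4"),
  N    = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# The first two columns are read as the edge list; any further columns
# (here N) are stored as edge attributes.
g <- graph_from_data_frame(edges, directed = TRUE)

vcount(g)  # number of distinct vertices
ecount(g)  # number of edges
E(g)$N     # the count column, carried over as an edge attribute
```

Passing `out` or `temp` directly should work the same way, since only the column order matters. Repeated neighbors are collapsed into a single edge whose N holds the multiplicity.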


Source: https://stackoverflow.com/questions/29497605/how-do-i-load-a-graph-in-neighborhood-list-format
