Question
I have a file of neighborhood lists describing a directed graph:
1 2 5
2 4
which is equivalent to the edge list format:
1 2
1 5
2 4
How do I load it into igraph?
I can use readLines and strsplit, but I have a feeling that someone else has already done this.
Answer 1:
If you are open to using a package still in development, I would suggest exploring the "iotools" package. Its file reader is fast (think along the lines of fread from "data.table") and it includes some splitting features. Use it in conjunction with cSplit from my "splitstackshape" package.
Here's a reproducible example with 1M rows:
First, a function to make some sample data:
data.maker <- function(size) {
  set.seed(1)
  lapply(seq_len(size), function(x) {
    as.character(c(x, sample(100, sample(20), TRUE)))
  })
}
x <- data.maker(1000000)
writeLines(vapply(x, paste, FUN.VALUE = character(1L), collapse = "\t"), "mytest.txt")
Second, load "dplyr" for piping, "iotools" for fast reading, and "splitstackshape" (which also loads "data.table") for splitting and aggregating.
library(dplyr)
library(iotools)
library(splitstackshape)
Here it is, all in one:
system.time({
  out <- input.file("mytest.txt", formatter = mstrsplit, sep = NA, nsep = "\t") %>%
    as.data.table(keep.rownames = TRUE) %>%
    cSplit("V1", "\t", "long") %>%
    .[, .N, by = .(rn, V1)]
})
# user system elapsed
# 26.109 0.096 26.200
View of the output:
out
# rn V1 N
# 1: 1 94 1
# 2: 1 22 1
# 3: 1 66 1
# 4: 1 13 1
# 5: 1 27 1
# ---
# 9865359: 1000000 1 1
# 9865360: 1000000 85 1
# 9865361: 1000000 91 1
# 9865362: 1000000 44 1
# 9865363: 1000000 20 1
summary(out)
# rn V1 N
# Length:9865363 Min. : 1.0 Min. :1.000
# Class :character 1st Qu.: 25.0 1st Qu.:1.000
# Mode :character Median : 51.0 Median :1.000
# Mean : 50.5 Mean :1.064
# 3rd Qu.: 75.0 3rd Qu.:1.000
# Max. :100.0 Max. :5.000
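To get from a table like this to igraph, one option is graph_from_data_frame, which takes the first two columns as edge endpoints and keeps any remaining columns as edge attributes. The following is only a minimal sketch, assuming the "igraph" package is installed; the small `edges` data frame is a made-up stand-in with the same columns as `out`, not the real 1M-row result.

```r
library(igraph)

# Stand-in with the same columns as `out`; the values are made up for illustration.
edges <- data.frame(
  rn = c("1", "1", "2"),
  V1 = c(2, 5, 4),
  N  = c(1L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Store both endpoint columns as character so vertex names are consistent.
edges$V1 <- as.character(edges$V1)

# Columns 1 and 2 become the edge endpoints; N is kept as an edge attribute.
g <- graph_from_data_frame(edges, directed = TRUE)

E(g)$N  # per-edge counts carried over from the N column
```

With the real table, passing as.data.frame(out) in place of `edges` should work the same way, although building a graph with roughly 10M edges will take noticeably more memory and time.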
If you prefer more standard packages, you can try the following. It should also be reasonably fast:
library(dplyr)
library(stringi)
library(data.table)
temp <- stri_split_fixed(readLines("mytest.txt"), "\t", n = 2, simplify = TRUE) %>%
  as.data.table %>%
  .[, list(V2 = unlist(strsplit(V2, "\t", TRUE))), by = V1] %>%
  .[, .N, by = .(V1, V2)]
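For a small file like the one in the question, the readLines and strsplit route is probably enough on its own. Here is a minimal sketch in base R plus "igraph" (assumed to be installed); the temporary file and the names `tokens`, `edges`, and `g` are placeholders for illustration, and the file is assumed to be whitespace-separated as in the question.

```r
library(igraph)

# Write the two-line example from the question to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("1 2 5", "2 4"), path)

# Split each line on whitespace: the first token is the source vertex,
# the remaining tokens are its out-neighbours.
tokens <- strsplit(trimws(readLines(path)), "[[:space:]]+")
tokens <- tokens[lengths(tokens) > 0]  # drop blank lines

# Repeat each source vertex once per neighbour to get an edge list.
# Note: a line with a vertex but no neighbours yields no edge, so such
# an isolated vertex would not appear in the graph built below.
edges <- data.frame(
  from = rep(vapply(tokens, `[`, character(1L), 1L), lengths(tokens) - 1L),
  to   = unlist(lapply(tokens, `[`, -1L)),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(edges, directed = TRUE)
```

Here `edges` holds the three rows 1 2, 1 5, and 2 4, matching the edge list format shown in the question.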
Source: https://stackoverflow.com/questions/29497605/how-do-i-load-a-graph-in-neighborhood-list-format