I am trying to create an edgelist out of patent data of the form:
PatentID  InventorIDs  CoinventorIDs
1         A ; B        C,D,E ; F,G,H,C
2         J ; K ; L    M,O ; N ; P, Q
What I would like is the edgelist below, showing the connections between inventors and patents (the semicolons separate the coinventors associated with each primary inventor):
1 A B
1 A C
1 A D
1 A E
1 B F
1 B G
1 B H
1 B C
2 J K
2 J L
2 J M
2 J O
2 K N
2 L P
2 L Q
Is there an easy way to do this with igraph in R?
I'm confused by the edges going between the InventorIDs. But here is a kind of brute-force function that you could just apply by row. There may be a better way with igraph, it being a massive library, but once you have the data in this form it should be simple to convert to an igraph data structure.
Note that this leaves out the edges between primary inventors.
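    ## `dat` is assumed to be a data frame holding the three columns from the question

    ## A function to make the edges for each row
    rowFunc <- function(row) {
        ## Split the inventor and coinventor fields on ';'
        tmp <- lapply(row[2:3], strsplit, '\\s*;\\s*')
        ## Split each coinventor group on ',' (dropping surrounding spaces)
        tmp2 <- lapply(tmp[[2]], strsplit, '\\s*,\\s*')
        ## SIMPLIFY=FALSE keeps a list of matrices even when all groups have equal size
        do.call(rbind, mapply(cbind, row[[1]], unlist(tmp[[1]]),
                              unlist(tmp2, recursive=FALSE), SIMPLIFY=FALSE))
    }

    ## Apply the function by row
    do.call(rbind, apply(dat, 1, rowFunc))
    #       [,1] [,2] [,3]
    #  [1,] "1"  "A"  "C"
    #  [2,] "1"  "A"  "D"
    #  [3,] "1"  "A"  "E"
    #  [4,] "1"  "B"  "F"
    #  [5,] "1"  "B"  "G"
    #  [6,] "1"  "B"  "H"
    #  [7,] "1"  "B"  "C"
    #  [8,] "2"  "J"  "M"
    #  [9,] "2"  "J"  "O"
    # [10,] "2"  "K"  "N"
    # [11,] "2"  "L"  "P"
    # [12,] "2"  "L"  "Q"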
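If the edges between primary inventors are wanted as well, one possible reading of the desired output is that the first listed inventor is linked to each of the other primary inventors. Below is a minimal sketch under that assumption. The helper name `patentEdges` and the column names `patent`, `from` and `to` are made up for illustration, `dat` is rebuilt here only to keep the example self-contained, and each patent is assumed to have exactly one coinventor group per primary inventor. The igraph step is optional and treats the graph as undirected, which is also an assumption.

```r
# Example data, assumed to mirror the table in the question
dat <- data.frame(
  PatentID      = c(1, 2),
  InventorIDs   = c("A ; B", "J ; K ; L"),
  CoinventorIDs = c("C,D,E ; F,G,H,C", "M,O ; N ; P, Q"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all edges for one patent
patentEdges <- function(id, inventors, coinventors) {
  # Primary inventors, split on ';'
  inv <- strsplit(trimws(inventors), "\\s*;\\s*")[[1]]
  # Coinventor groups, split on ';' and then on ','
  co <- strsplit(strsplit(trimws(coinventors), "\\s*;\\s*")[[1]], "\\s*,\\s*")
  # Assumption: one coinventor group per primary inventor
  stopifnot(length(inv) == length(co))
  # First inventor -> other primary inventors, then each inventor -> own group
  from <- c(rep(inv[1], length(inv) - 1), rep(inv, lengths(co)))
  to   <- c(inv[-1], unlist(co))
  data.frame(patent = id, from = from, to = to, stringsAsFactors = FALSE)
}

# Map() always returns a list, so rbind sees one data frame per patent
edges <- do.call(rbind, Map(patentEdges, dat$PatentID,
                            dat$InventorIDs, dat$CoinventorIDs))
print(edges)

# Optional: build a graph, keeping the patent ID as an edge attribute
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges[, c("from", "to", "patent")],
                                     directed = FALSE)
}
```

For the sample data this should give the 15 rows listed in the question. `graph_from_data_frame` reads the first two columns as the edge list and stores any remaining columns as edge attributes, which is why `patent` is moved to the end.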