I have a 'long-form' data frame with columns id (the primary key) and featureCode (categorical variable). Each record has between 1 and 9 values.
If you don't need that exact structure, but just need to get the pairwise counts, you can try this approach:
Here's your data:
dat <- read.table(header = TRUE,
text = "id featureCode
5 PPLC
5 PCLI
6 PPLC
6 PCLI
7 PPL
7 PPLC
7 PCLI
8 PPLC
9 PPLC
10 PPLC")
We're only interested in ids where there is more than one featureCode:
dat2 <- dat[ave(dat$id, dat$id, FUN=length) > 1, ]
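To see why this filter works, it can help to look at what ave returns on its own. The following is only a small sketch on a cut-down copy of the data; the names `toy` and `n_per_id` are made up for illustration and are not part of the answer.

```r
# Cut-down copy of the example data (illustrative names only)
toy <- data.frame(id = c(5, 5, 7, 7, 7, 8),
                  featureCode = c("PPLC", "PCLI", "PPL", "PPLC", "PCLI", "PPLC"))

# ave() applies FUN within each id group and returns a vector as long as
# the input, so every row receives the size of its own group
n_per_id <- ave(toy$id, toy$id, FUN = length)
n_per_id
# group sizes per row: 2 2 3 3 3 1

# Keeping rows whose group size exceeds 1 drops the single row for id 8
toy[n_per_id > 1, ]
```

Because the result lines up row by row with the data frame, it can be used directly as a logical row index.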
Having this data as a list is going to be useful since it will let us use lapply to get the pairwise combinations.
dat2 <- split(dat2$featureCode, dat2$id)
This next step can be broken down into its intermediate steps if you prefer, but the basic idea is to create the pairwise combinations of the vector in each list item and then tabulate the unlisted output.
table(unlist(lapply(dat2, function(x)
  combn(sort(x), 2, FUN = function(y)
    paste(y, collapse = "+")))))
#
# PCLI+PPL PCLI+PPLC PPL+PPLC
# 1 3 1
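For anyone who prefers to see the one-liner unrolled, here is a rough sketch of the same steps with each intermediate result kept in its own object. The list is written out by hand to match what split produces for the example data, and the names `codes_by_id`, `pairs_by_id`, `all_pairs` and `pair_counts` are assumptions for illustration.

```r
# Same content as the list produced by split() for the example data
codes_by_id <- list(`5` = c("PPLC", "PCLI"),
                    `6` = c("PPLC", "PCLI"),
                    `7` = c("PPL", "PPLC", "PCLI"))

# Step 1: build the pairs for each id. Sorting first means the same two
# codes always produce the same label, whatever order they appeared in.
pairs_by_id <- lapply(codes_by_id, function(x) {
  combn(sort(x), 2, FUN = function(y) paste(y, collapse = "+"))
})
pairs_by_id[["7"]]
# "PCLI+PPL", "PCLI+PPLC", "PPL+PPLC"

# Step 2: flatten the per-id results into one character vector
all_pairs <- unlist(pairs_by_id, use.names = FALSE)

# Step 3: count how often each distinct pair occurs
pair_counts <- table(all_pairs)
pair_counts
```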
With a little bit of modification, @flodel's answer to another question is applicable here. It requires the igraph package to be installed (install.packages("igraph")).
dat2 <- dat[ave(dat$id, dat$id, FUN=length) > 1, ]
dat2 <- split(dat2$featureCode, dat2$id)
library(igraph)
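# Build an undirected graph with one edge per co-occurring pair.
# graph_from_edgelist() and as_adjacency_matrix() are the current igraph
# names for graph.edgelist() and get.adjacency().
g <- graph_from_edgelist(
  matrix(unlist(lapply(dat2, function(x)
    combn(as.character(x), 2, simplify = FALSE))), ncol = 2, byrow = TRUE),
  directed = FALSE)
as_adjacency_matrix(g)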
# 3 x 3 sparse Matrix of class "dgCMatrix"
# PPLC PCLI PPL
# PPLC . 3 1
# PCLI 3 . 1
# PPL 1 1 .
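If installing igraph is not an option, a similar co-occurrence matrix can probably be had in base R by taking crossprod of an id-by-featureCode table. This is a different technique from the igraph approach, shown only as a sketch: it assumes each id lists a given featureCode at most once, and the names `incidence` and `cooc` are illustrative.

```r
dat <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC")
)

# Rows are ids, columns are feature codes, cells are 0/1 membership
# (assuming no id repeats the same code)
incidence <- table(dat$id, dat$featureCode)

# t(incidence) %*% incidence: cell [a, b] counts the ids carrying both a and b
cooc <- crossprod(incidence)

# The diagonal holds how many ids carry each code; zero it so the result
# only contains pairwise counts, like the adjacency matrix
diag(cooc) <- 0
cooc
```

Ids with a single featureCode contribute only to the diagonal, so there is no need to filter them out first with this approach.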