I have a data frame that looks like this (this is just a subset; the actual dataset has 2724098 rows):
> head(dat)
chr start end enhancer motif
chr10
...if this isn't what you want, I'm giving up. Obviously it isn't optimized for a large data set; it is just a general algorithm that takes natural advantage of R. Several improvements are possible, e.g. with dplyr or data.table. The latter will greatly speed up the [ and %in% operations here.
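library(plyr)  # provides daply() and .()
# All unordered pairs of distinct motifs, one pair per column
motif_pairs <- combn(unique(dat$motif), 2)
colnames(motif_pairs) <- apply(motif_pairs, 2, paste, collapse = " ")
# For each pair, count the ids whose rows contain both motifs
motif_pair_counts <- apply(motif_pairs, 2, function(motif_pair) {
  sum(daply(dat[dat$motif %in% motif_pair, ], .(id), function(dat_subset) {
    all(motif_pair %in% dat_subset$motif)
  }))
})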
motif_pair_counts <- as.data.frame(unname(cbind(t(motif_pairs), motif_pair_counts)))
names(motif_pair_counts) <- c("motif1", "motif2", "count")
motif_pair_counts
# motif1 motif2 count
# 1 GATA6 GATA4 3
# 2 GATA6 SRF 2
# 3 GATA6 MEF2A 2
# 4 GATA4 SRF 2
# 5 GATA4 MEF2A 2
# 6 SRF MEF2A 3
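The per-pair loop above rescans the data once for every pair, which is likely to be slow at a few million rows. One possible alternative, sketched here in base R rather than dplyr or data.table, is to build an id-by-motif presence matrix and let crossprod() count the co-occurrences in one step. The toy data below is hypothetical (it is not the data from the question), and it assumes the same id and motif columns that the code above uses.

```r
# Hypothetical toy data; only the column names `id` and `motif` follow the answer.
dat <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  motif = c("GATA6", "GATA4", "SRF", "GATA6", "GATA4", "SRF", "MEF2A", "GATA4"),
  stringsAsFactors = FALSE
)

# 0/1 presence matrix: one row per id, one column per motif.
presence <- 1 * (unclass(table(dat$id, dat$motif)) > 0)

# crossprod(presence) is t(presence) %*% presence, so entry [i, j] is the
# number of ids in which motif i and motif j both occur.
cooc <- crossprod(presence)

# Keep each unordered pair once by reading off the upper triangle.
idx <- which(upper.tri(cooc), arr.ind = TRUE)
pair_counts <- data.frame(
  motif1 = rownames(cooc)[idx[, 1]],
  motif2 = colnames(cooc)[idx[, 2]],
  count  = cooc[idx],
  stringsAsFactors = FALSE
)
pair_counts
```

The presence matrix has one cell per id/motif combination, so this sketch only suits cases where the number of distinct motifs is modest; with many ids and motifs a sparse matrix would probably be needed instead.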
Another old version. PLEASE make sure your question is clear!
This is precisely what plyr was designed to accomplish. Try dlply(dat, .(id), function(x) table(x$motif) ).
But please don't just try to copy and paste this solution without at least reading the documentation. plyr is a very powerful package and it will be very helpful for you to understand it.
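To see what that call is getting at without installing plyr, here is a rough base R analogue using split() and lapply(). It is a stand-in, not the dlply() call itself, and the small data frame is made up for illustration; only the id and motif column names are taken from the answer.

```r
# Hypothetical toy data with the `id` and `motif` columns the answer assumes.
dat <- data.frame(
  id    = c(1, 1, 2, 2, 2),
  motif = c("GATA6", "SRF", "SRF", "SRF", "GATA4"),
  stringsAsFactors = FALSE
)

# Split the motif column by id, then tabulate the motifs within each id.
# The result is a list with one table per id, named by the id value.
per_id_counts <- lapply(split(dat$motif, dat$id), table)
per_id_counts
```

Each list element can then be inspected or combined further, e.g. per_id_counts[["2"]] gives the motif counts for id 2.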
Old post answering the wrong question:
Are you looking for disjoint or overlapping pairs?
Here's one solution using the function rollapply from package zoo:
library(zoo)
motif_pairs <- rollapply(dat$motif, 2, c) # get a matrix of pairs
motif_pairs <- apply(motif_pairs, 1, function(row) { # for every row...
  paste0(sort(row), collapse = " ") # sort the row, and concatenate it to a single string
  # (sorting ensures that pairs are not double-counted)
})
table(motif_pairs) # since each pair is now represented by a unique string, just tabulate the string appearances
## if you want disjoint pairs, do `rollapply(dat$motif, 2, c, by = 2)` instead
Take a look at the docs for rollapply if this isn't quite what you need. For grouping by other variables, you can combine this with one of:
aggregate or by (not recommended), or the *ply functions from plyr (better)
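As a rough illustration of the grouping idea, the sketch below counts overlapping adjacent pairs within each id. It deliberately swaps in base R (head(), tail(), split()) for rollapply and plyr so that it runs without extra packages; adjacent_pairs is a hypothetical helper name, and the toy data and the id column are assumptions made for the example.

```r
# Hypothetical toy data; rows are assumed to be in the order that matters.
dat <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  motif = c("GATA6", "GATA4", "SRF", "SRF", "GATA4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: overlapping adjacent pairs of a character vector,
# each pair sorted so that "A B" and "B A" become the same string.
adjacent_pairs <- function(m) {
  if (length(m) < 2) return(character(0))
  first  <- head(m, -1)   # every element except the last
  second <- tail(m, -1)   # every element except the first
  paste(pmin(first, second), pmax(first, second))
}

# Build the pairs separately within each id, so that no pair spans two ids,
# then tabulate all pair strings together.
pairs_by_id <- lapply(split(dat$motif, dat$id), adjacent_pairs)
pair_table  <- table(unlist(pairs_by_id))
pair_table
```

Splitting before pairing is the point of the design: pairing the whole motif column at once would also pair the last motif of one id with the first motif of the next.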