I have a list of roughly 100,000 occurrences of items being ordered together that I have pasted into one column so I can count the number of times each combination occurs.
Your initial approach was pretty close to what I think you want. Combining those into a single factor will definitely work, provided you combine them in the same order, so that you don't end up with "Fries, Burger" and "Burger, Fries" counted as two separate combinations.
There may be an easier way of doing what you want, but I'm failing to brain what that is. Nevertheless, I think this does what you're looking for:
# Let's assume your data looks like this:
> df
                       Var1                     Var2 Var3
1               Onion Rings              Onion Rings    1
2  Pineapple Cheddar Burger              Onion Rings    1
3               Onion Rings Pineapple Cheddar Burger    1
4  Pineapple Cheddar Burger Pineapple Cheddar Burger    1
5               Onion Rings              Onion Rings    1
6  Pineapple Cheddar Burger              Onion Rings    1
7               Onion Rings Pineapple Cheddar Burger    1
8  Pineapple Cheddar Burger Pineapple Cheddar Burger    1
9             Fountain Soda            Fountain Soda    1
10             French Fries            Fountain Soda    1
# Now, for each row
# 1. sort the Var1 and Var2,
# 2. combine the sorted vars, and
# 3. convert them back into a factor
df$sortcomb <- as.factor(apply(df[, 1:2], 1, function(x) paste(sort(x), collapse = ", ")))
table(df$sortcomb) # then use table as per normal
library(plyr) # ddply comes from the plyr package
ddply(df, .(sortcomb), summarize, count = length(sortcomb)) # or ddply
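With around 100,000 rows, the row-wise apply() can be slow. If there are only ever two item columns, a vectorized variant might be worth trying. This is only a sketch under that assumption; the small data frame below is made up for illustration and simply mirrors the column names used above.

```r
# Hypothetical sample data using the same column names as above
df <- data.frame(
  Var1 = c("Onion Rings", "Pineapple Cheddar Burger", "Onion Rings", "French Fries"),
  Var2 = c("Pineapple Cheddar Burger", "Onion Rings", "Onion Rings", "Fountain Soda"),
  stringsAsFactors = FALSE
)

# Convert to character first: unordered factors have no meaningful "<" comparison
a <- as.character(df$Var1)
b <- as.character(df$Var2)

# pmin()/pmax() pick the alphabetically first/last item of each pair,
# so "A, B" and "B, A" end up with the same key
df$sortcomb <- factor(paste(pmin(a, b), pmax(a, b), sep = ", "))

counts <- table(df$sortcomb)
print(counts)
```

This only covers pairs. For three or more item columns, the apply()/sort() version above generalizes more naturally.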
The table() function is helpful here:
with(t1, table(pc)) ## or equivalently table(t1$pc)
This assumes pc is a factor variable that you want to count occurrences of. (If it isn't a factor it will get coerced to one.)
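If you'd rather have the counts as a data frame with the most frequent combinations first, something along these lines should work. The pc vector here is a made-up stand-in for t1$pc, and the column names combination and count are my own choice.

```r
# Hypothetical stand-in for t1$pc (a plain character vector, not a factor)
pc <- c("Burger, Fries", "Burger, Soda", "Burger, Fries",
        "Fries, Soda", "Burger, Fries")

tab <- table(pc) # character input is coerced to a factor internally

# One row per combination, with its count
counts_df <- as.data.frame(tab, stringsAsFactors = FALSE)
names(counts_df) <- c("combination", "count")

# Most frequent combination first
counts_df <- counts_df[order(-counts_df$count), ]
print(counts_df)
```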