I have a 'long-form' data frame with columns id (the primary key) and featureCode (a categorical variable). Each record has between 1 and 9 values.
I would use SQL; in R it is available through the sqldf package.
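Extract all possible combinations with something like the following (the where clause keeps each unordered pair once and drops pairs of a code with itself):
sqldf("select distinct df1.featureCode as code1, df2.featureCode as code2
from df df1, df df2
where df1.featureCode < df2.featureCode
")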
Then you can count the ids that carry both codes of each combination
(perhaps with a for loop over all combinations):
PCLI - PPLC
sqldf("select count(df1.id)
from df df1, df df2
where df1.id = df2.id
and df1.featureCode = 'PCLI' and df2.featureCode = 'PPLC'
")
PPLC - PPL
sqldf("select count(df1.id)
from df df1, df df2
where df1.id = df2.id
and df1.featureCode = 'PPLC' and df2.featureCode = 'PPL'
")
PCLI - PPL
sqldf("select count(df1.id)
from df df1, df df2
where df1.id = df2.id
and df1.featureCode = 'PCLI' and df2.featureCode = 'PPL'
")
There is surely an easier solution out there, especially if you have more combinations to consider. Searching for "contingency table" may help.
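One possible contingency-table route, sketched in base R without sqldf: build an id-by-featureCode incidence table and take its cross product. The sample data and the names incidence and co_occurrence are assumptions made for illustration only.

```r
# Hypothetical example data in the long form described in the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  featureCode = c("PCLI", "PPLC", "PPL", "PPLC", "PPL", "PCLI"),
  stringsAsFactors = FALSE
)

# Rows are ids, columns are feature codes; a cell is 1 if the id has that code
incidence <- (table(df$id, df$featureCode) > 0) * 1

# crossprod(x) is t(x) %*% x: cell [a, b] is the number of ids having both a and b
co_occurrence <- crossprod(incidence)
print(co_occurrence)
```

The diagonal holds the number of ids per code, and each off-diagonal cell holds the count for one pair, for example co_occurrence["PCLI", "PPLC"].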