How to calculate a table of pairwise counts from a long-form data frame

闹比i 2020-12-06 20:20

I have a 'long-form' data frame with columns id (the primary key) and featureCode (a categorical variable). Each record has between 1 and 9 values.

4 answers
  •  臣服心动
    2020-12-06 21:20

    I would use SQL; in R it is available through the sqldf package.

    Extract all possible combinations with something like:

    sqldf("select distinct df1.featureCode as code1, df2.featureCode as code2
           from df df1, df df2
           ")
    

    Then you can count the ids for each combination individually
    (perhaps with a for loop over all combinations):

    PCLI - PPLC

    sqldf("select count(df1.id)
           from df df1, df df2
           where df1.id = df2.id
           and df1.featureCode = 'PCLI' and df2.featureCode = 'PPLC'
           ")
    

    PPLC - PPL

    sqldf("select count(df1.id)
           from df df1, df df2
           where df1.id = df2.id
           and df1.featureCode = 'PPLC' and df2.featureCode = 'PPL'
           ")
    

    PCLI - PPL

    sqldf("select count(df1.id)
           from df df1, df df2
           where df1.id = df2.id
           and df1.featureCode = 'PCLI' and df2.featureCode = 'PPL'
           ")
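
The three queries above can probably be folded into a single self-join with a GROUP BY, so that no loop is needed. This is only a sketch: the sample data frame below is made up for illustration (it is not the asker's data), and the column aliases code1, code2 and n are my own choice. The condition df1.featureCode < df2.featureCode keeps each unordered pair once and drops self-pairs.

```r
library(sqldf)

# Hypothetical sample data, invented for this example
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  featureCode = c("PCLI", "PPLC", "PPL", "PPLC", "PPL", "PPL"),
  stringsAsFactors = FALSE
)

# Self-join on id, then count how many ids carry each pair of codes
pair_counts <- sqldf("
  select df1.featureCode as code1,
         df2.featureCode as code2,
         count(*) as n
  from df df1
  join df df2 on df1.id = df2.id
  where df1.featureCode < df2.featureCode
  group by df1.featureCode, df2.featureCode
  order by code1, code2
")

print(pair_counts)
```

Note that this assumes each (id, featureCode) combination appears at most once; repeated rows would inflate the counts.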
    

    There is surely an easier solution, especially if you have more combinations to consider. Searching for "contingency table" may help you out.
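
One way the contingency-table idea might look in base R, without sqldf: build an id-by-featureCode incidence matrix with table() and multiply it by its own transpose with crossprod(). This is a minimal sketch on made-up sample data, not the asker's data; the names incidence and co_occurrence are hypothetical.

```r
# Hypothetical sample data, invented for this example
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  featureCode = c("PCLI", "PPLC", "PPL", "PPLC", "PPL", "PPL"),
  stringsAsFactors = FALSE
)

# Rows are ids, columns are feature codes, cells are occurrence counts
incidence <- unclass(table(df$id, df$featureCode))

# Clamp to 0/1 so repeated (id, featureCode) rows are not counted twice
incidence[incidence > 0] <- 1

# t(incidence) %*% incidence: cell [a, b] is the number of ids that have
# both code a and code b; the diagonal is the number of ids per code
co_occurrence <- crossprod(incidence)

print(co_occurrence)
```

The result is a symmetric matrix, so a single pair can be looked up by name, for example co_occurrence["PCLI", "PPLC"].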
