Split a column of concatenated comma-delimited data and recode output as factors

-上瘾入骨i 2020-11-27 08:09

I am trying to clean up some data that has been incorrectly entered. The question for the variable allows for multiple responses out of five choices, numbered as 1 to 5. The

2 Answers
  •  离开以前
    2020-11-27 08:36

    A long time later, I finally got around to creating a package ("splitstackshape") that deals with this kind of data in an efficient manner. So, for the convenience of others (and some self-promotion, of course) here's a compact solution.

    The relevant function for this problem is cSplit_e.

    First, the default settings, which retain the original column and use NA as the fill:

    library(splitstackshape)
    cSplit_e(data, "V1")
    #           V1 V1_1 V1_2 V1_3 V1_4 V1_5
    # 1    1, 2, 3    1    1    1   NA   NA
    # 2    1, 2, 4    1    1   NA    1   NA
    # 3 2, 3, 4, 5   NA    1    1    1    1
    # 4    1, 3, 4    1   NA    1    1   NA
    # 5    1, 3, 5    1   NA    1   NA    1
    # 6 2, 3, 4, 5   NA    1    1    1    1
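
The call above assumes a data frame named data with a single column V1, which the thread never shows being built. A minimal sketch of that input, reconstructed from the printed output (the values are copied from it; storing V1 as character is an assumption):

```r
# Hypothetical reconstruction of the input: one character column, V1,
# holding the comma-separated choices shown in the printed output.
data <- data.frame(
  V1 = c("1, 2, 3", "1, 2, 4", "2, 3, 4, 5",
         "1, 3, 4", "1, 3, 5", "2, 3, 4, 5"),
  stringsAsFactors = FALSE
)

# Inspect the structure before passing it to cSplit_e().
str(data)
```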
    

    Second, dropping the original column and using 0 as the fill:

    cSplit_e(data, "V1", drop = TRUE, fill = 0)
    #   V1_1 V1_2 V1_3 V1_4 V1_5
    # 1    1    1    1    0    0
    # 2    1    1    0    1    0
    # 3    0    1    1    1    1
    # 4    1    0    1    1    0
    # 5    1    0    1    0    1
    # 6    0    1    1    1    1
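
The question title also asks for the output to be recoded as factors, which the answer does not cover. One possible way to do that in base R is sketched here. The indicator values are copied from the 0/1 output above so the snippet runs without the package. The name ind and the "No"/"Yes" labels are illustrative assumptions, not part of the original answer.

```r
# Indicator columns copied from the printed 0/1 output above, so this
# sketch does not depend on splitstackshape being installed.
ind <- data.frame(
  V1_1 = c(1, 1, 0, 1, 1, 0),
  V1_2 = c(1, 1, 1, 0, 0, 1),
  V1_3 = c(1, 0, 1, 1, 1, 1),
  V1_4 = c(0, 1, 1, 1, 0, 1),
  V1_5 = c(0, 0, 1, 0, 1, 1)
)

# Recode every column as a two-level factor. Assigning into ind[] keeps
# the data frame structure; the labels are hypothetical.
ind[] <- lapply(ind, factor, levels = c(0, 1), labels = c("No", "Yes"))

str(ind)
```

In practice the same lapply() line could be applied directly to the result of cSplit_e(data, "V1", drop = TRUE, fill = 0) instead of the hand-copied ind.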
    
