R: Split unbalanced list in data.frame column


Suppose you have a data frame with the following structure:

df <- data.frame(a=c(1,2,3,4), b=c("job1;job2", "job1a", "job4;job5;job6", "job9;job10;


        
2 Answers
  •  悲哀的现实
    2020-11-30 15:18

    cSplit from my "splitstackshape" package is designed to handle this sort of data manipulation.

    Here it is in action on this question:

    df <- data.frame(a=c(1,2,3,4), 
                   b=c("job1;job2", "job1a", "job4;job5;job6", "job9;job10;job11"))
    
    # install.packages("splitstackshape")
    library(splitstackshape)
    cSplit(df, "b", ";", "long", makeEqual = FALSE)
    #    a b_new
    # 1: 1  job1
    # 2: 1  job2
    # 3: 2 job1a
    # 4: 3  job4
    # 5: 3  job5
    # 6: 3  job6
    # 7: 4  job9
    # 8: 4 job10
    # 9: 4 job11
    

    You can also use strsplit within "dplyr", and then follow up with unnest from "tidyr", like this:

    library(dplyr)
    library(tidyr)
    df %>% 
      mutate(b = strsplit(as.character(b), ";", fixed = TRUE)) %>% 
      unnest(b)
    #   a     b
    # 1 1  job1
    # 2 1  job2
    # 3 2 job1a
    # 4 3  job4
    # 5 3  job5
    # 6 3  job6
    # 7 4  job9
    # 8 4 job10
    # 9 4 job11
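
For readers who would rather avoid extra packages, the same split-then-expand idea can be sketched in base R. This is not part of the answer above, only a minimal illustration under a few assumptions: column b holds character strings (hence stringsAsFactors = FALSE), lengths() is available (R 3.2.0 or later), and the names parts and long are made up for the example.

```r
# Same example data as in the answer; keep b as character rather than factor.
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c("job1;job2", "job1a", "job4;job5;job6", "job9;job10;job11"),
                 stringsAsFactors = FALSE)

# Split each string on ";" to get a list with one character vector per row.
parts <- strsplit(df$b, ";", fixed = TRUE)

# Repeat each value of a once per piece, then flatten the list of pieces.
long <- data.frame(a = rep(df$a, lengths(parts)),
                   b = unlist(parts),
                   stringsAsFactors = FALSE)

print(long)
```

With this data, long should have nine rows, one per job, with each value of a repeated as many times as its row had pieces. This is roughly what the strsplit and unnest pipeline does, spelled out step by step.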
    
