R: Split unbalanced list in data.frame column


Suppose you have a data frame with the following structure:

df <- data.frame(a=c(1,2,3,4), b=c("job1;job2", "job1a", "job4;job5;job6", "job9;job10;

2 Answers
  • 2020-11-30 15:09
    #Split by ; as before
    # as.character() is needed when b is a factor (the data.frame default before R 4.0)
    allJobs <- strsplit(as.character(df$b), ";", fixed=TRUE)
    
    #Replicate a by the number of jobs in each case
    n <- sapply(allJobs, length)
    id <- rep(df$a, times = n)
    
    #Turn allJobs into a vector
    job <- unlist(allJobs)
    
    #Retrieve position of each job
    jobNum <- unlist(lapply(n, seq_len))
    
    #Combine into a data frame
    df2 <- data.frame(id = id, job = job, jobNum = jobNum)
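
    The steps above can also be wrapped into a small reusable function. This is only a sketch: `split_long` is a hypothetical name that does not appear in the original answer, and it uses the base R shortcuts `lengths()` and `sequence()` in place of the `sapply()` and `lapply()` calls.

    ```r
    # Hypothetical helper (not part of the original answer) wrapping the steps above.
    split_long <- function(id, x, sep = ";") {
      # as.character() guards against factor columns, which strsplit() rejects
      parts <- strsplit(as.character(x), sep, fixed = TRUE)
      # Number of pieces in each row
      n <- lengths(parts)
      data.frame(
        id = rep(id, times = n),   # repeat each id once per piece
        job = unlist(parts),       # flatten the pieces into one vector
        jobNum = sequence(n),      # position of each piece within its row
        stringsAsFactors = FALSE
      )
    }

    df <- data.frame(a = c(1, 2, 3, 4),
                     b = c("job1;job2", "job1a", "job4;job5;job6", "job9;job10;job11"))
    df2 <- split_long(df$a, df$b)
    ```

    Keeping `jobNum` makes it possible to reshape the result back to wide form later if needed.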
    
  • 2020-11-30 15:18

    cSplit from my "splitstackshape" package is designed to handle this sort of data manipulation.

    Here it is in action on this question:

    df <- data.frame(a=c(1,2,3,4), 
                   b=c("job1;job2", "job1a", "job4;job5;job6", "job9;job10;job11"))
    
    # install.packages("splitstackshape")
    library(splitstackshape)
    cSplit(df, "b", ";", "long", makeEqual = FALSE)
    #    a b_new
    # 1: 1  job1
    # 2: 1  job2
    # 3: 2 job1a
    # 4: 3  job4
    # 5: 3  job5
    # 6: 3  job6
    # 7: 4  job9
    # 8: 4 job10
    # 9: 4 job11
    

    You can also use strsplit within "dplyr", and then follow up with unnest from "tidyr", like this:

    library(dplyr)
    library(tidyr)
    df %>% 
      mutate(b = strsplit(as.character(b), ";", fixed = TRUE)) %>% 
      unnest(b)
    #   a     b
    # 1 1  job1
    # 2 1  job2
    # 3 2 job1a
    # 4 3  job4
    # 5 3  job5
    # 6 3  job6
    # 7 4  job9
    # 8 4 job10
    # 9 4 job11
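
    As a variation on the strsplit-plus-unnest approach, "tidyr" also provides separate_rows, which does the split and the expansion in one call. The sketch below is an alternative that is not taken from the original answer. It assumes a tidyr version that includes separate_rows; recent releases mark it as superseded by separate_longer_delim, but it is still available.

    ```r
    library(tidyr)

    # stringsAsFactors = FALSE keeps b as character on R versions before 4.0
    df <- data.frame(a = c(1, 2, 3, 4),
                     b = c("job1;job2", "job1a", "job4;job5;job6", "job9;job10;job11"),
                     stringsAsFactors = FALSE)

    # separate_rows() splits b on the delimiter and returns one row per piece
    long <- separate_rows(df, b, sep = ";")
    ```

    The sep argument is treated as a regular expression, so a delimiter that is a regex metacharacter (such as "|" or ".") would need escaping.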
    