Split a column to multiple columns

前端 未结 2 1563
遇见更好的自我
遇见更好的自我 2020-12-12 00:50

I have table that the first column is:

chr10:100002872-100002872
chr10:100003981-100003981
chr10:100004774-100004774
chr10:100005285-100005285
chr10:10000712         


        
相关标签:
2条回答
  • 2020-12-12 01:29

    Here's one way:

    library(data.table)
    DF[, paste0("V1.",1:3) ] <- tstrsplit(DF$V1, ":|-")
    
    #                          V1  V1.1      V1.2      V1.3
    # 1 chr10:100002872-100002872 chr10 100002872 100002872
    # 2 chr10:100003981-100003981 chr10 100003981 100003981
    # 3 chr10:100004774-100004774 chr10 100004774 100004774
    # 4 chr10:100005285-100005285 chr10 100005285 100005285
    # 5 chr10:100007123-100007123 chr10 100007123 100007123
    

    strsplit accepts regular expressions involving the "or" operator, |, as @AnandaMahto said. tstrsplit is just a convenience function added by the data.table package.

    If you convert your data.frame to a data.table (which has many advantages and no disadvantages except a slight learning curve), you would do:

    setDT(DF)[, paste0("V1.",1:3) := tstrsplit(V1, ":|-")]
    
    #                           V1  V1.1      V1.2      V1.3
    # 1: chr10:100002872-100002872 chr10 100002872 100002872
    # 2: chr10:100003981-100003981 chr10 100003981 100003981
    # 3: chr10:100004774-100004774 chr10 100004774 100004774
    # 4: chr10:100005285-100005285 chr10 100005285 100005285
    # 5: chr10:100007123-100007123 chr10 100007123 100007123
    

    Alternatives. There are (cumbersome) ways to get the same thing in base R, like

    DF[, paste0("V1.",1:3) ] <- do.call(rbind, strsplit(DF$V1, ":|-"))
    

    And @AnandaMahto's package also has a convenience function for this:

    library(splitstackshape)
    cSplit(DF, "V1", ":|-")
    #     V1.1      V1.2      V1.3                      V1_1
    # 1: chr10 100002872 100002872 chr10:100002872-100002872
    # 2: chr10 100003981 100003981 chr10:100003981-100003981
    # 3: chr10 100004774 100004774 chr10:100004774-100004774
    # 4: chr10 100005285 100005285 chr10:100005285-100005285
    # 5: chr10 100007123 100007123 chr10:100007123-100007123
    
    0 讨论(0)
  • 2020-12-12 01:40

    Also similarly with tidyr. If you want to keep the original column you can add , remove = FALSE and convert = TRUE if you want to set corresponding classes to the new columns. separate has a default regex to split on non character/numeric values, thus you don't need to specify your condition. If some of the rows have missing components add , extra = "merge"

    library(tidyr)
    separate(DF, "V1", paste0("V1.",1:3))
    #    V1.1      V1.2      V1.3
    # 1 chr10 100002872 100002872
    # 2 chr10 100003981 100003981
    # 3 chr10 100004774 100004774
    # 4 chr10 100005285 100005285
    # 5 chr10 100007123 100007123
    
    0 讨论(0)
提交回复
热议问题