Separate a column into multiple columns using tidyr::separate with sep=""

甜味超标 2020-12-11 06:10
df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"), stringsAsFactors = FALSE)

df
 category sequence
1        X     AAT.G
2        Y     CCG-T


        
2 Answers
  • 2020-12-11 07:02

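    sep can also be an integer vector, in which case its values are interpreted as character positions to split at. sep = 1:4 would be sufficient here, but sep = 1:5 works too and looks a bit better next to the 1:5 used for into.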

    library(tidyr)  # also provides the %>% pipe
    df %>% separate(sequence, into = paste0("V", 1:5), sep = 1:5)
    

    giving:

      category V1 V2 V3 V4 V5
    1        X  A  A  T  .  G
    2        Y  C  C  G  -  T
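
    To make the idea of splitting at character positions concrete, here is a minimal base R sketch of the same result without tidyr. It is an illustration only, not how separate is implemented; the names width, pieces and result are made up for this example, and it assumes every sequence has exactly five characters.

    ```r
    # Base R sketch of position-based splitting; no packages required.
    df <- data.frame(category = c("X", "Y"),
                     sequence = c("AAT.G", "CCG-T"),
                     stringsAsFactors = FALSE)

    width <- 5  # assumption: every sequence has exactly this many characters

    # substring() is vectorised over start/stop, so each string yields `width`
    # one-character pieces; vapply() returns them as columns, t() turns them into rows.
    pieces <- t(vapply(df$sequence,
                       function(s) substring(s, 1:width, 1:width),
                       character(width),
                       USE.NAMES = FALSE))
    colnames(pieces) <- paste0("V", 1:width)

    # Bind the new one-character columns next to the untouched category column.
    result <- cbind(df["category"],
                    as.data.frame(pieces, stringsAsFactors = FALSE))
    print(result)
    ```

    Here the positions 1:width play the role that sep plays in separate: piece i runs from character i to character i.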
    
  • 2020-12-11 07:11

    You could do this with extract from tidyr, using one capture group per character:

    library(tidyr)
    extract(df, sequence, into=paste0('V', 1:5), '(.)(.)(.)(.)(.)')
    #  category V1 V2 V3 V4 V5
    #1        X  A  A  T  .  G
    #2        Y  C  C  G  -  T
    

    Or insert a delimiter between every pair of characters with gsub and use that delimiter as sep in separate:

    library(dplyr)
    library(tidyr)
    df %>% 
       mutate(sequence=gsub('(?<=.)(?=.)', ',', sequence, perl=TRUE)) %>% 
       separate(sequence, into=paste0('V', 1:5), sep=",")
    #  category V1 V2 V3 V4 V5
    #1        X  A  A  T  .  G
    #2        Y  C  C  G  -  T
    

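    Or you can use cSplit from splitstackshape:

    library(splitstackshape)
    library(data.table)  # for setnames(), in case splitstackshape does not attach it
    setnames(cSplit(df, 'sequence', '', stripWhite=FALSE),
                 2:6, paste0('V', 1:5))[]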
    #   category V1 V2 V3 V4 V5
    #1:        X  A  A  T  .  G
    #2:        Y  C  C  G  -  T
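
    The extract pattern above hard-codes five capture groups. As a hedged sketch, the pattern can instead be built with strrep so the same call adapts to another fixed width. The variable names n, pattern and result are made up for this example, and it assumes all sequences have the same length.

    ```r
    library(tidyr)

    df <- data.frame(category = c("X", "Y"),
                     sequence = c("AAT.G", "CCG-T"),
                     stringsAsFactors = FALSE)

    # Assumption: all sequences share one length, so the longest gives the width.
    n <- max(nchar(df$sequence))

    # Repeat the single-character capture group n times: "(.)(.)(.)(.)(.)" for n = 5.
    pattern <- strrep("(.)", n)

    # One output column per capture group.
    result <- extract(df, sequence,
                      into = paste0("V", seq_len(n)),
                      regex = pattern)
    print(result)
    ```

    If the sequences had different lengths, rows that do not match the full pattern would come back as NA, so check the lengths before relying on this.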
    