Separate a column into multiple columns using tidyr::separate with sep=""

甜味超标 2020-12-11 06:10
df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"), stringsAsFactors = FALSE)

df
  category sequence
1        X    AAT.G
2        Y    CCG-T


        
2 Answers
  •  无人及你
    2020-12-11 07:11

    You could do this with extract from tidyr

    library(tidyr)
    extract(df, sequence, into=paste0('V', 1:5), '(.)(.)(.)(.)(.)')
    #  category V1 V2 V3 V4 V5
    #1        X  A  A  T  .  G
    #2        Y  C  C  G  -  T
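
    To see what that pattern does, here is a minimal base R sketch of the same capture-group idea. It is not how extract is implemented; the vector x and the count n are illustrative assumptions, and strrep only builds the pattern so it does not have to be typed out by hand.

    ```r
    # Hypothetical input mirroring the sequence column above
    x <- c("AAT.G", "CCG-T")
    n <- 5                                   # assumed fixed length of every string

    # One capture group per character: "(.)(.)(.)(.)(.)"
    pattern <- strrep("(.)", n)

    # regexec/regmatches return the full match followed by each captured group
    m <- regmatches(x, regexec(pattern, x))

    # Drop the full match and stack the groups into a character matrix
    parts <- do.call(rbind, lapply(m, function(v) v[-1]))
    colnames(parts) <- paste0("V", seq_len(n))
    parts
    ```

    The same generated pattern could be passed to extract in place of the literal string, which may help when the strings are longer than five characters.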
    

    Or insert a delimiter between the characters with gsub, then pass that delimiter as sep to separate

    library(dplyr)
    library(tidyr)
    df %>% 
       mutate(sequence=gsub('(?<=.)(?=.)', ',', sequence, perl=TRUE)) %>% 
       separate(sequence, into=paste0('V', 1:5), sep=",")
    #  category V1 V2 V3 V4 V5
    #1        X  A  A  T  .  G
    #2        Y  C  C  G  -  T
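
    The lookaround pattern matches the empty position between any two adjacent characters, so gsub inserts a comma there without consuming anything. A small base R check of that intermediate step, using a hypothetical vector x in place of the column:

    ```r
    x <- c("AAT.G", "CCG-T")

    # (?<=.) requires a character before the position, (?=.) requires one after it,
    # so only the gaps between characters match (not the start or end of the string)
    delimited <- gsub("(?<=.)(?=.)", ",", x, perl = TRUE)
    print(delimited)
    ```

    One caveat: this assumes the chosen delimiter (a comma here) never occurs in the data itself. If it might, a character that cannot appear in the sequences would be a safer choice.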
    

    Or you can use cSplit

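    library(splitstackshape)
    library(data.table)  # setnames comes from data.table
    # Split on the empty string, then rename the five new columns to V1..V5
    setnames(cSplit(df, 'sequence', '', stripWhite=FALSE),
                 2:6, paste0('V', 1:5))[]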
    #   category V1 V2 V3 V4 V5
    #1:        X  A  A  T  .  G
    #2:        Y  C  C  G  -  T
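
    Since every sequence here has the same length, another option that may be worth trying is a numeric sep: the tidyr documentation describes a numeric sep as character positions to split at. A sketch, assuming all strings are exactly five characters long:

    ```r
    library(tidyr)

    df <- data.frame(category = c("X", "Y"),
                     sequence = c("AAT.G", "CCG-T"),
                     stringsAsFactors = FALSE)

    # Split after positions 1, 2, 3 and 4; sep needs one fewer entry than into
    result <- separate(df, sequence, into = paste0("V", 1:5), sep = 1:4)
    result
    ```

    This avoids the intermediate gsub step, but it only fits fixed-width data; strings of varying length would need one of the regex-based approaches above.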
    
