R: split a semicolon-delimited column into rows

独厮守ぢ 2020-11-29 13:43

I am using RStudio 2.15.0 and have created an object from Excel using XLConnect with 3000+ rows and 12 columns. I am trying to delimit/split a column into rows but don't

3 Answers
  •  Happy的楠姐
    2020-11-29 14:13

    You could try unnest from tidyr after splitting the "PolId" column and get the unique rows

    library(dplyr)
    library(tidyr)
    # Current tidyr expects a data frame, so split PolId into a list column first
    df %>%
      mutate(PolId = strsplit(as.character(PolId), ';')) %>%
      unnest(PolId) %>%
      unique()
    

    Or use base R with stack/strsplit/duplicated. Split the "PolId" column on the delimiter (;) with strsplit, name the resulting list elements with the "Description" column, stack the list to get a 'data.frame', and use duplicated to remove the duplicate rows.

    df1 <- stack(setNames(strsplit(df$PolId, ';'), df$Description))
    setNames(df1[!duplicated(df1),], names(df))
    #     PolId Description
    #1  ABC123       TEST1
    #2  ABC456       TEST1
    #3  ABC789       TEST1
    #10 AAA123       TEST1
    #11 AAA123       TEST2
    #12 ABB123       TEST3
    #13 ABC123       TEST3
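
    To make the base R steps reproducible, here is a minimal sketch with sample input. The data frame below is hypothetical: the question's real data is not shown, so these rows were chosen only so that the result lines up with the output printed above.

    ```r
    # Hypothetical sample data; the original question's data frame is not shown.
    df <- data.frame(
      PolId = c("ABC123;ABC456;ABC789", "ABC123;ABC456;ABC789",
                "ABC123;ABC456;ABC789", "AAA123", "AAA123", "ABB123;ABC123"),
      Description = c("TEST1", "TEST1", "TEST1", "TEST1", "TEST2", "TEST3"),
      stringsAsFactors = FALSE
    )

    # Split each PolId on ";" and name each list element by its Description.
    pieces <- setNames(strsplit(df$PolId, ";", fixed = TRUE), df$Description)

    # stack() turns the named list into a two-column data frame (values, ind).
    long <- stack(pieces)

    # Drop repeated (PolId, Description) pairs and restore the column names.
    result <- setNames(long[!duplicated(long), ], names(df))
    print(result)
    ```

    Note that stack() returns the "Description" column as a factor; wrap it in as.character() if plain strings are needed downstream.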
    

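    Or another option without using strsplit: paste the IDs for each "Description" together, strip repeated IDs with a regular expression, then rebuild the two columns.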

    library(stringr)
    v1 <- with(df, tapply(PolId, Description, FUN = function(x) {
      x1 <- paste(x, collapse = ";")
      gsub('(\\b\\S+\\b)(?=.*\\b\\1\\b.*);', '', x1, perl = TRUE)
    }))
    Description <- rep(names(v1), str_count(v1, '\\w+'))
    PolId <- scan(text = gsub(';+', ' ', v1), what = '', quiet = TRUE)
    data.frame(PolId, Description)
    #   PolId Description
    #1 ABC123       TEST1
    #2 ABC456       TEST1
    #3 ABC789       TEST1
    #4 AAA123       TEST1
    #5 AAA123       TEST2
    #6 ABB123       TEST3
    #7 ABC123       TEST3
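
    If a recent tidyr is available, the same split-then-deduplicate idea can probably be written more briefly with separate_rows(). This is an alternative technique, not one of the approaches above, and the sample data frame is hypothetical.

    ```r
    library(dplyr)
    library(tidyr)

    # Hypothetical sample data; the original question's data frame is not shown.
    df <- data.frame(
      PolId = c("ABC123;ABC456", "ABC123;ABC456", "AAA123"),
      Description = c("TEST1", "TEST1", "TEST2"),
      stringsAsFactors = FALSE
    )

    # separate_rows() splits PolId on ";" and emits one row per piece;
    # distinct() then removes repeated (PolId, Description) pairs.
    result <- df %>%
      separate_rows(PolId, sep = ";") %>%
      distinct()
    print(result)
    ```

    Unlike the stack() version, this keeps "Description" as a character column and leaves any other columns of the data frame in place.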
    
