split character data into numbers and letters

没有蜡笔的小新 2020-12-02 15:45

I have a vector of character data. Most of the elements in the vector consist of one or more letters followed by one or more numbers. I wish to split each element in the vector into its letter part and its number part.

  • 2020-12-02 16:25

    Late answer, but another option is to use strsplit with a regex pattern which uses lookarounds to find the boundary between numbers and letters:

    var <- "ABC123"
    strsplit(var, "(?=[A-Za-z])(?<=[0-9])|(?=[0-9])(?<=[A-Za-z])", perl=TRUE)
    [[1]]
    [1] "ABC" "123"
    

    The pattern matches an empty position (it does not consume any characters) wherever the previous character is a letter and the next character is a digit, or vice versa. Note that strsplit must run in Perl mode (perl = TRUE), because R's default regex engine does not support lookarounds.
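    A minimal sketch of applying the same strsplit call to a whole vector and collecting the two parts, assuming every element is letters optionally followed by digits. The vector my.data and the names letters_part and numbers_part are illustrative, not part of the answer:

```r
my.data <- c("aaa", "b11", "b21", "b101", "b111", "ccc1", "ddd1", "ccc20", "ddd13")

# Split at every letter/digit boundary (zero-width match; lookarounds need perl = TRUE)
parts <- strsplit(my.data, "(?=[A-Za-z])(?<=[0-9])|(?=[0-9])(?<=[A-Za-z])", perl = TRUE)

# First piece is the letter part
letters_part <- vapply(parts, function(p) p[1], character(1))

# Second piece, if present, is the number part; otherwise NA
numbers_part <- vapply(parts, function(p) if (length(p) >= 2) p[2] else NA_character_,
                       character(1))
numbers_part <- as.numeric(numbers_part)

letters_part
numbers_part
```

    An element with no digits (such as "aaa") comes back as a single piece, which is why the number part falls back to NA. An element like "a1b2" would split into four pieces, and this sketch keeps only the first two.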

  • 2020-12-02 16:29

    For your regex you have to use:

    gsub("[[:digit:]]","",my.data)
    

    The [:digit:] character class is only valid inside a bracket expression, so it has to be written as [[:digit:]]. This call removes the digits and leaves the letters.
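    Since deleting the digits leaves the letters, the complementary step is to delete everything that is not a digit. A base-R sketch of both halves (my.letters and my.numbers are illustrative names):

```r
my.data <- c("aaa", "b11", "b21", "b101", "b111", "ccc1", "ddd1", "ccc20", "ddd13")

# Delete every digit to keep the letters
my.letters <- gsub("[[:digit:]]", "", my.data)

# Delete every non-digit to keep the digits; an empty result ("aaa") becomes NA
my.numbers <- as.numeric(gsub("[^[:digit:]]", "", my.data))

my.letters
my.numbers
```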

  • 2020-12-02 16:37

    You can also use colsplit from reshape2 to split your vector into character and digit columns in one step:

    library(reshape2)
    
    colsplit(my.data, "(?<=\\p{L})(?=\\d+$)", c("char", "digit"))
    

    Result:

      char digit
    1  aaa    NA
    2    b    11
    3    b    21
    4    b   101
    5    b   111
    6  ccc     1
    7  ddd     1
    8  ccc    20
    9  ddd    13
    

    Data:

    my.data <- c("aaa", "b11", "b21", "b101", "b111", "ccc1", "ddd1", "ccc20", "ddd13")
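    If reshape2 is not installed, base R's utils::strcapture() (available in recent versions of R) gives a similar two-column data frame in one step. This is a swapped-in technique rather than what the answer uses, and the pattern assumes each element is letters followed by optional digits:

```r
my.data <- c("aaa", "b11", "b21", "b101", "b111", "ccc1", "ddd1", "ccc20", "ddd13")

# Each capture group becomes one column; proto supplies the column names and types
res <- utils::strcapture(
  "^([[:alpha:]]*)([[:digit:]]*)$",
  my.data,
  proto = data.frame(char = character(), digit = integer(),
                     stringsAsFactors = FALSE)
)
res
```

    The empty digit capture for "aaa" ends up as NA in the integer column, matching the NA in the colsplit result.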
    
  • 2020-12-02 16:40

    With stringr, if you like (and slightly different from the answer to the other question):

    # load library
    library(stringr)
    #
    # load data
    my.data <- c("aaa", "b11", "b21", "b101", "b111", "ccc1", "ddd1", "ccc20", "ddd13")
    #
    # extract numbers only
    my.data.num <- as.numeric(str_extract(my.data, "[0-9]+"))
    #
    # check output
    my.data.num
    [1]  NA  11  21 101 111   1   1  20  13
    #
    # extract letters only
    my.data.cha <- str_extract(my.data, "[A-Za-z]+")
    # 
    # check output
    my.data.cha
    [1] "aaa" "b"   "b"   "b"   "b"   "ccc" "ddd" "ccc" "ddd"
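    The same extraction can be sketched without stringr using base R's regexpr() and regmatches(). Because regmatches() returns only the matches that exist, the hypothetical helper extract_first() below writes them back into a vector padded with NA, mimicking str_extract():

```r
my.data <- c("aaa", "b11", "b21", "b101", "b111", "ccc1", "ddd1", "ccc20", "ddd13")

# Hypothetical helper: first match of `pattern` in each element, NA when there is none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)           # -1 marks elements with no match
  out <- rep(NA_character_, length(x))
  hit <- m != -1L
  out[hit] <- regmatches(x, m)       # one matched substring per hit, in order
  out
}

my.data.num <- as.numeric(extract_first(my.data, "[0-9]+"))
my.data.cha <- extract_first(my.data, "[A-Za-z]+")

my.data.num
my.data.cha
```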
    
  • 2020-12-02 16:44

    # remove every non-digit character, keeping the numbers
    mydata.num <- gsub("\\D", "", my.data)
    
    # remove every digit, keeping the text
    mydata.text <- gsub("\\d", "", my.data)
    

    This works well, and it also separates numbers from text when the digits sit in the middle of the text (for example, "ab12cd" gives "12" and "abcd").
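    A quick check on two hypothetical inputs, one with digits in the middle and one with two separate digit runs. Note that gsub() keeps all digits together, so separate runs are joined into a single number string:

```r
x <- c("ab12cd", "a1b2")

nums <- gsub("\\D", "", x)  # drop every non-digit
text <- gsub("\\d", "", x)  # drop every digit

nums  # "12" "12"
text  # "abcd" "ab"
```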

  • 2020-12-02 16:45

    Since none of the previous answers uses tidyr::separate, here it is:

    library(tidyr)
    
    df <- data.frame(mycol = c("APPLE348744", "BANANA77845", "OATS2647892", "EGG98586456"))
    
    df %>%
      separate(mycol, 
               into = c("text", "num"), 
               sep = "(?<=[A-Za-z])(?=[0-9])"
               )
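    For comparison, a base-R sketch of the same split on the same data frame, assuming each value is a run of letters followed by digits. The sub() patterns are an assumption about the data shape, not part of tidyr:

```r
df <- data.frame(mycol = c("APPLE348744", "BANANA77845", "OATS2647892", "EGG98586456"),
                 stringsAsFactors = FALSE)

# Keep only the leading letters
df$text <- sub("^([A-Za-z]+).*$", "\\1", df$mycol)
# Strip the leading letters, leaving the digits as character (like separate()'s default)
df$num <- sub("^[A-Za-z]+", "", df$mycol)
# separate() drops the input column by default (remove = TRUE); do the same here
df$mycol <- NULL
df
```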
    