Extract numeric part of strings of mixed numbers and characters in R

傲寒 2020-11-28 07:22

I have a lot of strings, each of which has the following format: Ab_Cd-001234.txt. I want to replace each one with 001234. How can I achieve this?

4 Answers
  • 2020-11-28 08:05

    Using gsub or sub, you can do this:

    gsub('.*-([0-9]+).*','\\1','Ab_Cd-001234.txt')
    [1] "001234"
    

    You can also use gregexpr with regmatches, which returns a list:

    m <- gregexpr('[0-9]+','Ab_Cd-001234.txt')
    regmatches('Ab_Cd-001234.txt',m)
    [[1]]
    [1] "001234"
    

    EDIT: Both methods are vectorized and work on a vector of strings.

    x <- c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')
    sub('.*-([0-9]+).*','\\1',x)
    [1] "001234" "001234"
    
    m <- gregexpr('[0-9]+',x)
    regmatches(x,m)
    [[1]]
    [1] "001234"
    
    [[2]]
    [1] "001234"
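
    The two approaches above behave differently once the input is less regular. A minimal sketch, using made-up strings (the second and third inputs are assumptions for illustration, not from the question):

```r
x <- c("Ab_Cd-001234.txt", "A2_Cd-001234.txt", "no_digits.txt")

# sub() keeps only the digits that follow a hyphen;
# a string that does not match the pattern is returned unchanged.
after_hyphen <- sub(".*-([0-9]+).*", "\\1", x)

# gregexpr() + regmatches() return every run of digits,
# as one list element per input string (possibly empty).
all_runs <- regmatches(x, gregexpr("[0-9]+", x))
```

    So sub is the safer choice when other digits may appear before the hyphen, but its result should be checked for inputs that did not match at all.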
    
  • 2020-11-28 08:07

    The stringr package has lots of handy shortcuts for this kind of work:

    # input data following @agstudy
    data <-  c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')
    
    # load library
    library(stringr)
    
    # prepare regular expression
    regexp <- "[[:digit:]]+"
    
    # process string
    str_extract(data, regexp)
    
    Which gives the desired result:
    
      [1] "001234" "001234"
    

    To explain the regexp a little:

    [[:digit:]] matches any single digit from 0 to 9

    + means the preceding item (in this case, a digit) will be matched one or more times
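
    The effect of the + quantifier can be seen with a small base R sketch (regexpr and regmatches are used here instead of stringr only to keep the example dependency-free):

```r
x <- "Ab_Cd-001234.txt"

# Without "+", the pattern matches a single digit: the first one found.
one_digit <- regmatches(x, regexpr("[[:digit:]]", x))

# With "+", the match extends over the whole run of consecutive digits.
digit_run <- regmatches(x, regexpr("[[:digit:]]+", x))
```

    Here one_digit is "0" and digit_run is "001234".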

    This page is also very useful for this kind of string processing: http://en.wikibooks.org/wiki/R_Programming/Text_Processing

  • 2020-11-28 08:11

    You could use genXtract from the qdap package. This takes a left character string and a right character string and extracts the elements between.

    library(qdap)
    genXtract("Ab_Cd-001234.txt", "-", ".txt")
    

    Though I much prefer agstudy's answer.

    EDIT: Extending the answer to match agstudy's:

    x <- c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')
    genXtract(x, "-", ".txt")
    
    # $`-  :  .txt1`
    # [1] "001234"
    # 
    # $`-  :  .txt2`
    # [1] "001234"
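
    If installing qdap is not an option, the same "text between a left and a right delimiter" idea can be approximated in base R with lookarounds. This is only a sketch of the idea with the delimiters hard-coded; it is not how genXtract is implemented:

```r
x <- c("Ab_Cd-001234.txt", "Ab_Cd-001234.txt")

# (?<=-)     : the match must be preceded by "-"
# (?=\\.txt) : the match must be followed by ".txt"
# perl = TRUE is required for lookbehind/lookahead support.
between <- regmatches(x, regexpr("(?<=-).*(?=\\.txt)", x, perl = TRUE))
```

    Unlike genXtract, this returns a plain character vector rather than a named list.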
    
  • 2020-11-28 08:14

    gsub: Remove the prefix and suffix (x is the vector of file names used in the other answers):

    gsub(".*-|\\.txt$", "", x)
    

    tools package: Use file_path_sans_ext from tools to remove the extension, and then use sub to remove the prefix:

    library(tools)
    sub(".*-", "", file_path_sans_ext(x))
    

    strapplyc: Extract the digits after the - and before the dot. See the gsubfn home page for more info:

    library(gsubfn)
    strapplyc(x, "-(\\d+)\\.", simplify = TRUE)
    

    Note that if it were desired to return a numeric we could use strapply rather than strapplyc like this:

    strapply(x, "-(\\d+)\\.", as.numeric, simplify = TRUE)
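
    The same character-versus-numeric distinction can be sketched without gsubfn, reusing the tools approach from above. Note that converting to numeric drops the leading zeros, which may or may not be what is wanted:

```r
x <- c("Ab_Cd-001234.txt", "Ab_Cd-001234.txt")

# Character result: leading zeros are preserved.
ids_chr <- sub(".*-", "", tools::file_path_sans_ext(x))

# Numeric result: leading zeros are dropped on conversion.
ids_num <- as.numeric(ids_chr)
```

    Here ids_chr is "001234" "001234", while ids_num is 1234 1234.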
    