R: removing numbers at begin and end of a string

孤街浪徒 提交于 2019-12-18 08:48:12

问题


I've got the following vector:

words <- c("5lang","kasverschil2","b2b")

I want to remove "5" in "5lang" and "2" in "kasverschil2". But I do NOT want to remove "2" in "b2b".


回答1:


 gsub("^\\d+|\\d+$", "", words)    
 #[1] "lang"        "kasverschil" "b2b"

Another option would be to use stringi

 library(stringi)
 stri_replace_all_regex(words, "^\\d+|\\d+$", "")
  #[1] "lang"        "kasverschil" "b2b"        

Using a variant of the data set provided by the OP here are benchmarks for 3 three main solutions (note that these strings are very short and contrived; results may differ on a larger, real data set):

words <- rep(c("5lang","kasverschil2","b2b"), 100000)

library(stringi)
library(microbenchmark)

GSUB <- function() gsub("^\\d+|\\d+$", "", words)
STRINGI <- function() stri_replace_all_regex(words, "^\\d+|\\d+$", "")
GREGEXPR <- function() {
    gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
    sapply(regmatches(words, mm, invert=TRUE), paste, collapse="") 
}

microbenchmark( 
    GSUB(),
    STRINGI(),
    GREGEXPR(),
    times=100L
)

## Unit: milliseconds
##        expr       min        lq    median        uq       max neval
##      GSUB()  301.0988  349.9952  396.3647  431.6493  632.7568   100
##   STRINGI()  465.9099  513.1570  569.1972  629.4176  738.4414   100
##  GREGEXPR() 5073.1960 5706.8160 6194.1070 6742.1552 7647.8904   100



回答2:


Get instances where numbers appear at the beginning or end of a word and match everything else. You need to collapse results because of possible multiple matches:

gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
sapply(regmatches(words, mm, invert=TRUE), paste, collapse="") 



回答3:


You can use gsub which uses regular expressions:

gsub("^[0-9]|[0-9]$", "", words)
# [1] "lang"        "kasverschil" "b2b"

Explanation:

The pattern ^[0-9] matches any number at the beginning of a string, while the pattern [0-9]$ matches any number at the end of the string. by separating these two patterns by | you want to match either the first or the second pattern. Then, you replace the matched pattern with an empty string.



来源:https://stackoverflow.com/questions/26359316/r-removing-numbers-at-begin-and-end-of-a-string

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!