Question
I have a column containing values of 3 strings separated by semicolons. I need to just extract the first part of the string.
Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000")
What I want is: Get the first part of the string (till the first semicolon).
Output : SNSR_RMIN_PSX150Y_CSH
I tried gsub but could not understand how to use it. Please let me know how to do this efficiently in R.
Answer 1:
You could try sub
sub(';.*$','', Type)
#[1] "SNSR_RMIN_PSX150Y_CSH"
The pattern matches from the first occurrence of ; to the end of the string, and sub replaces that match with ''.
Or use
library(stringi)
stri_extract(Type, regex='[^;]*')
#[1] "SNSR_RMIN_PSX150Y_CSH"
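To see why the sub pattern above needs only a single replacement, here is a minimal base R sketch. The second and third input values are made up for illustration and are not from the original post; they show what happens when a string has no semicolon or starts with one.

```r
# Sample input: the first value is from the question,
# the other two are hypothetical edge cases.
x <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000",
       "NO_SEMICOLON",
       ";LEADING")

# ';.*$' starts matching at the first ';' and runs to the end of the string,
# so everything after the first field is removed in one replacement.
first_part <- sub(";.*$", "", x)

# gsub gives the same result here, because the first match already
# consumes the rest of the string and leaves nothing else to replace.
same_with_gsub <- gsub(";.*$", "", x)

print(first_part)
print(identical(first_part, same_with_gsub))
```

A string without a semicolon is returned unchanged, and a string that starts with a semicolon yields an empty string.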
Answer 2:
The stringi package is very fast here:
stri_extract_first_regex(Type, "^[^;]+")
## [1] "SNSR_RMIN_PSX150Y_CSH"
I benchmarked the three main approaches here:
Unit: milliseconds
      expr       min        lq      mean   median        uq      max neval
  SAPPLY() 254.88442 267.79469 294.12715 277.4518 325.91576 419.6435   100
     SUB() 182.64996 186.26583 192.99277 188.6128 197.17154 237.9886   100
 STRINGI()  89.45826  91.05954  94.11195  91.9424  94.58421 124.4689   100

library(stringi)
SAPPLY <- function() sapply(strsplit(Type, ";"), "[[", 1)
SUB <- function() sub(';.*$','', Type)
STRINGI <- function() stri_extract_first_regex(Type, "^[^;]+")
Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000")
Type <- rep(Type, 100000)
library(microbenchmark)
microbenchmark(
SAPPLY(),
SUB(),
STRINGI(),
times=100L)
Answer 3:
You can also use strsplit:
strsplit(Type, ";")[[1]][1]
#[1] "SNSR_RMIN_PSX150Y_CSH"
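The strsplit call above indexes only the first element of the result, so it handles one string at a time. Since the question mentions a whole column, here is a hedged sketch of one way to apply the same idea to every element of a vector. The second input value is a made-up example, and vapply is just one reasonable choice for the loop.

```r
# First value is from the question; the second is hypothetical.
Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000",
          "ABC;DEF;GHI")

# strsplit returns a list with one character vector per input string.
# fixed = TRUE treats ";" as a literal string rather than a regex.
parts <- strsplit(Type, ";", fixed = TRUE)

# Take the first piece of each split; vapply checks that every result
# is a single character value.
first <- vapply(parts, function(p) p[1], character(1))

print(first)
```

This is the same approach as the SAPPLY() function in the benchmark, with vapply used in place of sapply so that the return type is checked.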
Source: https://stackoverflow.com/questions/29752250/extract-part-of-string-till-the-first-semicolon-in-r