Extract part of string (till the first semicolon) in R

Submitted by 跟風遠走 on 2019-11-27 03:30:30

Question


I have a column whose values each contain three strings separated by semicolons. I need to extract just the first part of each string.

Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000")

What I want is to get the first part of the string (up to the first semicolon).

Output: SNSR_RMIN_PSX150Y_CSH

I tried gsub but could not work out how to use it. Please let me know how to do this efficiently in R.


Answer 1:


You could try sub

sub(';.*$','', Type)
#[1] "SNSR_RMIN_PSX150Y_CSH"

The pattern matches everything from the first occurrence of ; to the end of the string, and sub replaces that match with the empty string ''.
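
Since the question mentions a whole column, here is a minimal sketch of the same sub call applied to a data frame column. The data frame df, its First column, and the second and third rows are made up for illustration; only the first value comes from the question.

```r
# Hypothetical data frame; only the first value is from the question.
df <- data.frame(
  Type = c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000",
           "A;B;C",
           "no_semicolon"),
  stringsAsFactors = FALSE
)

# sub() is vectorised, so it processes every row in one call.
# ';.*$' matches from the first ';' to the end of the string.
df$First <- sub(";.*$", "", df$Type)

# A value without any ';' has nothing to match and is returned unchanged.
print(df$First)
```

gsub would give the same result here, because once the pattern has matched through to the end of the string there is nothing left for a second match.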

Or use

library(stringi)
stri_extract(Type, regex='[^;]*')
#[1] "SNSR_RMIN_PSX150Y_CSH"
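
If stringi is not installed, the same "extract the match rather than delete the rest" idea can be sketched in base R with regexpr and regmatches. This is a swapped-in alternative, not part of the original answer, and the variable names m and first are only illustrative.

```r
Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000")

# regexpr() locates the first match of the pattern in each string:
# '^[^;]+' is one or more non-semicolon characters at the start.
m <- regexpr("^[^;]+", Type)

# regmatches() pulls out the matched text.
first <- regmatches(Type, m)
print(first)
```

Note that regmatches drops elements that have no match, so the result can be shorter than the input if some strings start with a semicolon or are empty.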



Answer 2:


The stringi package works very fast here:

stri_extract_first_regex(Type, "^[^;]+")
## [1] "SNSR_RMIN_PSX150Y_CSH"

I benchmarked the three main approaches on 100,000 copies of the string:

Unit: milliseconds
      expr       min        lq      mean   median        uq      max neval
  SAPPLY() 254.88442 267.79469 294.12715 277.4518 325.91576 419.6435   100
     SUB() 182.64996 186.26583 192.99277 188.6128 197.17154 237.9886   100
 STRINGI()  89.45826  91.05954  94.11195  91.9424  94.58421 124.4689   100

Here's the code for the benchmarks:

library(stringi)
SAPPLY <- function() sapply(strsplit(Type, ";"), "[[", 1)
SUB <- function() sub(';.*$','', Type)
STRINGI <- function() stri_extract_first_regex(Type, "^[^;]+")

Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000")
Type <- rep(Type, 100000)

library(microbenchmark)
microbenchmark(
    SAPPLY(),
    SUB(),
    STRINGI(),
    times = 100L
)
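
Before comparing timings, it can be worth confirming that the approaches return the same result. The following is a rough sketch of such a check; the names by_split, by_sub and by_stringi are made up here, the input is shortened to 1,000 copies, and the stringi comparison only runs if that package happens to be installed.

```r
Type <- rep("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000", 1000)

# The two base R approaches from the benchmark.
by_split <- sapply(strsplit(Type, ";"), "[[", 1)
by_sub   <- sub(";.*$", "", Type)
stopifnot(identical(by_split, by_sub))

# Compare against stringi only when the package is available.
if (requireNamespace("stringi", quietly = TRUE)) {
  by_stringi <- stringi::stri_extract_first_regex(Type, "^[^;]+")
  stopifnot(identical(by_split, by_stringi))
}
```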



Answer 3:


You can also use strsplit. Note that [[1]][1] returns the first piece of the first string only:

strsplit(Type, ";")[[1]][1]
#[1] "SNSR_RMIN_PSX150Y_CSH"
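
For a whole column, the [[1]][1] indexing would need to be applied to every element. A minimal sketch using vapply, where the second input value and the names parts and first are assumptions added for illustration:

```r
# Hypothetical input vector; only the first value is from the question.
Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000", "A;B;C")

# strsplit() returns a list with one character vector per input string.
# fixed = TRUE treats ";" as a literal string rather than a regex.
parts <- strsplit(Type, ";", fixed = TRUE)

# Take the first piece of every string, not just of the first one.
first <- vapply(parts, function(p) p[1], character(1))
print(first)
```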


Source: https://stackoverflow.com/questions/29752250/extract-part-of-string-till-the-first-semicolon-in-r
