Extract string before “|” [duplicate]

Submitted by 痞子三分冷 on 2019-11-27 06:00:53

Question


I have a data set wherein a column looks like this:

ABC|DEF|GHI,  
ABCD|EFG|HIJK,  
ABCDE|FGHI|JKL,  
DEF|GHIJ|KLM,  
GHI|JKLM|NO|PQRS,  
BCDE|FGHI|JKL  

... and so on

I need to extract the characters that appear before the first | symbol.

In Excel, we would use a combination of MID and SEARCH, or LEFT and SEARCH. R has substr().

The syntax is substr(x, start, stop).

In my case, start will always be 1. For stop, we need the position of the first |. How can we achieve this? Are there alternative ways to do this?
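The substr() route described above can be completed by letting regexpr() find the position of the first |. This is a minimal sketch, assuming the values are already in a character vector; the name x is illustrative. fixed = TRUE makes | a literal character rather than regex alternation.

```r
# Illustrative sample values (taken from the question)
x <- c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS")

# Position of the first "|" in each string (-1 if there is none)
pos <- regexpr("|", x, fixed = TRUE)

# Keep everything from character 1 up to just before the "|"
before <- substr(x, 1, pos - 1)
before
#[1] "ABC"  "ABCD" "GHI"
```

A string with no | gets position -1 from regexpr(), so substr() returns an empty string for it.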


Answer 1:


We can use sub():

sub("\\|.*", "", str1)
#[1] "ABC"

Or with strsplit():

strsplit(str1, "[|]")[[1]][1]
#[1] "ABC"
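Note that [[1]][1] only looks at the first element of the list that strsplit() returns, which is enough for the single string str1. For a character vector with one value per element, a sketch that takes the first piece of every element could use vapply(); the vector x here is illustrative.

```r
x <- c("ABC|DEF|GHI", "ABCDE|FGHI|JKL", "BCDE|FGHI|JKL")

# strsplit() returns a list with one character vector per input string;
# fixed = TRUE treats "|" literally, so no escaping or brackets are needed
parts <- strsplit(x, "|", fixed = TRUE)

# Take the first piece of each element; vapply() guarantees a character result
first <- vapply(parts, `[`, character(1), 1)
first
#[1] "ABC"   "ABCDE" "BCDE"
```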

Update

If we use the data from @hrbrmstr's answer:

sub("\\|.*", "", df$V1)
#[1] "ABC"   "ABCD"  "ABCDE" "DEF"   "GHI"   "BCDE" 

These are all base R methods; no external packages are used.
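One behaviour of the sub() approach worth knowing: a string that contains no | is returned unchanged, because the pattern simply does not match. A small sketch with an illustrative vector that includes a value without a delimiter and one with a leading |:

```r
x <- c("GHI|JKLM|NO|PQRS", "NODELIM", "|LEADING")

# "\\|.*" matches from the leftmost "|" to the end of the string, so
# deleting it leaves only the text before the first "|"
res <- sub("\\|.*", "", x)
res
#[1] "GHI"     "NODELIM" ""
```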

Data

str1 <- "ABC|DEF|GHI ABCD|EFG|HIJK ABCDE|FGHI|JKL DEF|GHIJ|KLM GHI|JKLM|NO|PQRS BCDE|FGHI|JKL"



Answer 2:


Another option is the word() function from the stringr package:

library(stringr)
# First "|"-separated field of each value
word(df1$V1, 1, sep = "\\|")

Data

df1 <- read.table(text = "ABC|DEF|GHI,  
ABCD|EFG|HIJK,  
ABCDE|FGHI|JKL,  
DEF|GHIJ|KLM,  
GHI|JKLM|NO|PQRS,  
BCDE|FGHI|JKL")



Answer 3:


With stringi:

library(stringi)

df <- read.table(text="ABC|DEF|GHI,1
ABCD|EFG|HIJK,2
ABCDE|FGHI|JKL,3  
DEF|GHIJ|KLM,4
GHI|JKLM|NO|PQRS,5
BCDE|FGHI|JKL,6", sep=",", header=FALSE, stringsAsFactors=FALSE)

stri_match_first_regex(df$V1, "(.*?)\\|")[,2]
## [1] "ABC"   "ABCD"  "ABCDE" "DEF"   "GHI"   "BCDE" 
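For comparison, a base R counterpart of this capture-group approach can be sketched with regexec() and regmatches(). This is an illustrative alternative, not part of the original answer; the vector x and the pattern "^([^|]*)\\|" are assumptions chosen so that no lazy quantifier is needed.

```r
x <- c("ABC|DEF|GHI", "DEF|GHIJ|KLM", "NODELIM")

# Capture group 1 = everything before the first "|"; [^|]* cannot cross a "|"
m <- regmatches(x, regexec("^([^|]*)\\|", x))

# Each element is c(full match, group 1), or character(0) when there is no match;
# return NA in that case, as stri_match_first_regex() does for non-matches
before <- vapply(m, function(v) if (length(v) > 1) v[2] else NA_character_, character(1))
before
#[1] "ABC" "DEF" NA
```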


Source: https://stackoverflow.com/questions/38291794/extract-string-before
