gsub with “|” character in R

谁说胖子不能爱 submitted on 2021-02-17 05:45:31

Question


I have a data frame in which one variable holds strings containing the | character. I want to remove everything from the | character onward.

For example, considering the string

heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654 | sequence_SO=chromosome | SO=protein_coding

I wish to have only:

heat-shock protein hsp70, putative

Do I need any escape character for the | character?

If I do:

a <- c("foo_5", "bar_7")
gsub("*_.", "", a)

I get:

[1] "foo" "bar"

i.e. I am removing anything downstream of the _ character.

However, if I repeat the same task with a | instead of the _:

b <- c("foo|5", "bar|7")
gsub("*|.", "", b)

I get:

[1] "" ""

Answer 1:


You have to escape | by writing it as \\|. Try this:

> gsub("\\|.*$", "", string)
[1] "heat-shock protein hsp70, putative "

where string is

string <- "heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654 | sequence_SO=chromosome | SO=protein_coding"

This alternative also removes the trailing space from the output:

 gsub("\\s+\\|.*$", "", string)
[1] "heat-shock protein hsp70, putative"
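
To connect this back to the foo|5 example from the question: an unescaped | is the regex alternation operator ("or"), so a pattern like "*|." ends up matching every single character, which is why everything was deleted. Below is a minimal sketch of two ways to make the pipe literal. The vector b comes from the question; the character-class variant is my own addition, not something the asker tried.

```r
b <- c("foo|5", "bar|7")

# Escape the pipe with a double backslash so it is matched literally,
# then drop it and everything after it.
escaped <- sub("\\|.*$", "", b)
print(escaped)
# [1] "foo" "bar"

# Alternative: inside a character class the pipe is already literal,
# so no backslashes are needed.
bracketed <- sub("[|].*$", "", b)
print(bracketed)
# [1] "foo" "bar"
```

Since the pattern is anchored to the end of the string with .*$, it can match at most once per string, so sub and gsub give the same result here.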



Answer 2:


This may be a better job for strsplit than for gsub.

And yes, it looks like the pipe does need to be escaped.

string <- "heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654 | sequence_SO=chromosome | SO=protein_coding"
strsplit(string, ' \\| ')[[1]][1]

That outputs

"heat-shock protein hsp70, putative"

Note that I'm assuming you only want the text from before the first pipe, and that you want to drop the space that separates the pipe from the piece of the string you care about.
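
Since the question mentions a data frame, here is a rough sketch of how the same strsplit idea might be applied to a whole column at once. The data frame df, the column names description and short, and the second row are hypothetical, made up for illustration. I also use fixed = TRUE as one way to avoid escaping, and trimws to drop the leftover space, rather than splitting on ' \\| ' as above.

```r
# Hypothetical data frame; column names and the second row are assumptions.
df <- data.frame(
  description = c(
    "heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654",
    "no pipe here"
  ),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" as a plain character, so no escaping is needed.
pieces <- strsplit(df$description, "|", fixed = TRUE)

# Keep the first piece of each split and trim surrounding whitespace.
df$short <- trimws(vapply(pieces, `[`, character(1), 1))
print(df$short)
# [1] "heat-shock protein hsp70, putative" "no pipe here"
```

Strings without a pipe come back unchanged, because strsplit returns the whole string as the only piece.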



Source: https://stackoverflow.com/questions/49303897/gsub-with-character-in-r
