Remove special characters from data frame

Submitted by 穿精又带淫゛_ on 2019-12-18 12:34:56

Question


I have a matrix that contains the string "Energy per �m". Before the 'm' there is a diamond-shaped symbol with a question mark in it; I don't know what it is.

I have tried to get rid of it by using this on the column of the matrix:

a=gsub('Energy per �m','',a) 

(I copied and pasted the symbol into the first argument of gsub), but it does not work and I get the error: unexpected symbol in "a=rep(5,Energy per". When I try to extract something from the original matrix with grepl, I get:

46: In grepl("ref. value", raw$parameter) :
input string 15318 is invalid in this locale

How can I get rid of all these kinds of characters? I would like to keep only 0-9, A-Z, a-z, / and '. Everything else can be removed.


Answer 1:


There is probably a better way to do this than with regex (e.g. by changing the Encoding).
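As a rough sketch of the encoding-based route, base R's iconv() can convert strings to ASCII and drop anything that has no ASCII equivalent via its sub argument. This assumes the strings are valid UTF-8; if the data was read from a Latin-1 file, from = "latin1" would be the appropriate source encoding instead. The example vector is made up for illustration.

```r
# Hypothetical example strings: one with a micro sign (U+00B5), one with the
# Unicode replacement character (U+FFFD) that shows up as a diamond with "?"
x <- c("Energy per \u00b5m", "Energy per \ufffdm")

# Assumption: the input is UTF-8. Every byte that cannot be represented in
# ASCII is replaced by sub = "", i.e. removed.
cleaned <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
print(cleaned)
```

Unlike the regex below, this keeps every ASCII character (including punctuation such as '.'), so it only removes the non-ASCII symbols.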

But here is your regex solution:

gsub("[^0-9A-Za-z/' ]", "", a)
[1] "Energy per m"
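Since the question mentions a data frame (raw$parameter), here is a minimal sketch of applying the same whitelist to every character column at once. The data frame and its columns are hypothetical, and the helper name keep_allowed is made up for illustration.

```r
# Hypothetical data frame; column names and values are illustrative only
raw <- data.frame(
  parameter = c("Energy per \ufffdm", "ref. value"),
  value = c(1.5, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only digits, ASCII letters, '/', apostrophe and space
keep_allowed <- function(s) gsub("[^0-9A-Za-z/' ]", "", s)

# Clean character columns only; other columns are returned unchanged.
# Assigning into raw[] keeps the data.frame structure and column names.
raw[] <- lapply(raw, function(col) if (is.character(col)) keep_allowed(col) else col)
print(raw)
```

Note that the whitelist also strips the period, so "ref. value" becomes "ref value".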

But, as @JoshuaUlrich pointed out, you're better off using:

gsub("[^[:alnum:]/' ]", "", a)
[1] "Energy per m"
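The grepl warning in the question ("input string 15318 is invalid in this locale") typically means some elements contain bytes that are not valid in the session's encoding. A sketch for locating and re-encoding them with validUTF8() (available since R 3.3.0) and iconv() follows; it assumes the stray bytes are Latin-1, which is a guess, and the vector is made up for illustration.

```r
# Hypothetical vector: the second element holds a raw byte 0xB5 (the Latin-1
# micro sign), which is not valid UTF-8
x <- c("ref. value", "Energy per \xb5m")

# Indices of elements whose bytes are not valid UTF-8
bad <- which(!validUTF8(x))
print(bad)

# Assumption: the source data was Latin-1, so re-encode only those elements
x[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")

# With every element now valid UTF-8, grepl() can process the whole vector
print(grepl("ref. value", x))
```

After re-encoding, either of the gsub() calls above can be applied to remove the symbol.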


Source: https://stackoverflow.com/questions/11970891/remove-special-characters-from-data-frame
