How to replace many special characters with “something plus special characters” in R

无人共我 2020-12-21 11:27

I have this sentence that contains "& / ?".

c = "Do Sam&Lilly like yes/no questions?"

I want to add a whitespace before and af

3 answers
  • 2020-12-21 12:05

    Seems like you mean this,

    > c <- "Do Sam&Lilly like yes/no questions?"
    > gsub("([^[:alnum:][:blank:]])", " \\1 ", c)
    [1] "Do Sam & Lilly like yes / no questions ? "
    

    [^[:alnum:][:blank:]] is a negated bracket expression built from POSIX character classes; it matches any single character that is neither alphanumeric nor horizontal whitespace (a space or a tab). Putting the pattern inside a capturing group captures each special character. Replacing each match with a space, then \\1 (a back-reference to whatever the first group captured), then another space gives you the desired output. You could also use [:space:] instead of [:blank:].
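
    As a rough illustration of the [:blank:] versus [:space:] choice, the sketch below uses a hypothetical input containing a line break (the variable names and the newline are assumptions, not part of the question). It also shows one possible way to drop the trailing space that the replacement leaves behind.

    ```r
    # Hypothetical input with a line break, to show how the two classes differ.
    sentence <- "Do Sam&Lilly\nlike yes/no questions?"

    # [:blank:] covers only space and tab, so the newline counts as a
    # "special" character and gets padded with spaces too.
    with_blank <- gsub("([^[:alnum:][:blank:]])", " \\1 ", sentence)

    # [:space:] also covers newlines, so the line break is left untouched.
    with_space <- gsub("([^[:alnum:][:space:]])", " \\1 ", sentence)

    # Every match is padded on both sides, so a trailing space remains
    # after the final "?"; trimws() removes it.
    trimmed <- trimws(with_space)

    print(with_blank)
    print(with_space)
    print(trimmed)
    ```

    Whether the trailing space matters depends on what you do with the string afterwards, so the trimws() step is optional.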

  • 2020-12-21 12:10

    You can use a capture group reference:

    gsub("([&/])", " \\1 ", c)
    

    Here we replace "&" or "/" with themselves ("\\1") padded with spaces. The "\\1" means "use the first matched group from the pattern". A matched group is a portion of a regular expression in parentheses, in our case "([&/])".

    You can expand this to cover more symbols / special characters by adding them to the character set, or by using an appropriate regex character class such as [[:punct:]].

    Note: you probably shouldn't use c as a variable name, since it is also the name of a very commonly used function.
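
    A minimal sketch of both points, assuming you also want "?" padded as in the question; the names sentence, padded and tidy are hypothetical and chosen only so that c is not reused:

    ```r
    # "sentence" is a hypothetical name, used instead of c so the variable
    # does not share a name with base::c().
    sentence <- "Do Sam&Lilly like yes/no questions?"

    # Inside [...] the "?" is a literal character, so it can simply be
    # added to the set alongside "&" and "/".
    padded <- gsub("([&/?])", " \\1 ", sentence)

    # Optional clean-up: collapse runs of spaces and trim both ends.
    tidy <- trimws(gsub(" +", " ", padded))

    print(padded)
    print(tidy)
    ```

    The clean-up step is only one possible choice; skip it if the extra spaces are harmless for your use.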

  • 2020-12-21 12:27

    You can build your regex patterns outside of gsub and then pass them in. I see that BrodieG referred to the pattern enclosed in "(...)" as a "capture group". The material inside square brackets, "[...]", is called a "character class" in the R help page ?regex. The "\1" is a "back-reference", and since the regex help page seems to be silent on what to call strings enclosed in parentheses, I've probably just been pushed a bit further along in my understanding of regex terminology:

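    ct <- "Do Sam&Lilly like yes/no questions?"
    your_chars <- c("!@#$%^&*", "()_+", "?/")
    # Collapse the pieces into one bracket expression inside a capture group
    patt <- paste0("([", paste0(your_chars, collapse = ""), "])")
    gsub(patt, " \\1 ", ct)
    #[1] "Do Sam & Lilly like yes / no questions ? "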
    

    You would need to use gsub rather than sub if you want to replace more than one instance in a character value.
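
    To make the difference concrete, here is a small sketch that runs both functions on the sentence from the question; the variable names and the shortened pattern are assumptions made for the example:

    ```r
    sentence <- "Do Sam&Lilly like yes/no questions?"
    patt <- "([&/?])"

    # sub() stops after the first match, so only "&" is padded.
    first_only <- sub(patt, " \\1 ", sentence)

    # gsub() replaces every match in the string.
    all_matches <- gsub(patt, " \\1 ", sentence)

    print(first_only)
    print(all_matches)
    ```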
