str_replace_all replacing named vector elements iteratively not all at once

误落风尘 2021-01-18 09:15

Let's say I have a long character string: pneumonoultramicroscopicsilicovolcanoconiosis. I'd like to use stringr::str_replace_all to replace certain letters w

4 Answers
  • 2021-01-18 09:30

    I'm working on a package to deal with this type of problem. It is safer than the qdap::mgsub function because it does not rely on placeholders. It fully supports regular expressions in both the match patterns and the replacements. You provide a named list in which the names are the strings to match and the values are their replacements.

    devtools::install_github("bmewing/mgsub")
    library(mgsub)
    mgsub("developer",list("e" ="p", "p" = "e"))
    #> [1] "dpvploepr"
    
    qdap::mgsub(c("e","p"),c("p","e"),"developer")
    #> [1] "dpvploppr"
    
  • 2021-01-18 09:34

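    The replacements are probably applied in order: after every c is replaced by s, every s (including the new ones) is replaced by c, so only c remains. One workaround is to go through placeholder characters that do not occur in the string: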

    long_string %>% str_replace_all(c(c ="X", s = "U"))  %>% str_replace_all(c(X ="s", U = "c"))
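
    To make the difference concrete, here is a minimal sketch on a short string. The example string "success" and the variable names are assumptions chosen for illustration, and the approach only works if the placeholders X and U never occur in the input.

    ```r
    library(stringr)

    x <- "success"

    # Sequential replacement: every c becomes s, then every s becomes c
    naive <- str_replace_all(x, c(c = "s", s = "c"))

    # Two passes through placeholders; assumes X and U never occur in the input
    step1 <- str_replace_all(x, c(c = "X", s = "U"))
    swapped <- str_replace_all(step1, c(X = "s", U = "c"))

    naive
    #> [1] "cuccecc"
    swapped
    #> [1] "cussecc"
    ```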
    
  • 2021-01-18 09:35

    My workaround would be to take advantage of the fact that str_replace_all can take functions as an input for the replacement.

    library(stringr)
    text_string <- "developer"
    pattern <- "p|e"
    fun <- function(query) {
        # ifelse() is vectorised, so this works whether the function
        # receives a single match or a vector of matches
        ifelse(query == "e", "p", "e")
    }
    
    str_replace_all(text_string, pattern, fun)
    #> [1] "dpvploepr"
    

    Of course, if you need to scale up, I would suggest using a more sophisticated function.
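
    One way such a function might look is sketched below. The helper name replace_simultaneously is hypothetical (it is not part of stringr), and the sketch assumes the rule names are plain literals that contain no regex metacharacters.

    ```r
    library(stringr)

    # Hypothetical helper (not part of stringr): apply all rules in a single pass.
    # Assumes the names of `rules` are plain literals without regex metacharacters.
    replace_simultaneously <- function(string, rules) {
      pattern <- paste(names(rules), collapse = "|")
      # Named-vector lookup is vectorised, so it handles one match or many
      str_replace_all(string, pattern, function(m) unname(rules[m]))
    }

    replace_simultaneously("developer", c(e = "p", p = "e"))
    #> [1] "dpvploepr"
    ```

    Because each match is looked up in the original rules exactly once, the result of one rule is never fed into another.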

  • 2021-01-18 09:41

    The iterative behavior is intended. That said, we can write our own workaround. I am going to use character subsetting for the replacement.

    In a named vector, we can look up elements by name and get a replacement value for each name. This is like doing all the replacements simultaneously.

    rules <- c(a = "X", b = "Y", X = "a")
    chars <- c("a", "a", "b", "X", "X")
    rules[chars]
    #>   a   a   b   X   X 
    #> "X" "X" "Y" "a" "a"
    

    So here, looking up "a" in the rules vector gets us "X", effectively replacing "a" with "X". The same goes for the other characters.

    One problem is that names without a match yield NA.

    rules <- c(a = "X", b = "Y", X = "a")
    chars <- c("a", "Y", "Z")
    rules[chars]
    #>    a <NA> <NA> 
    #>  "X"   NA   NA
    

    To prevent the NAs from appearing, we can expand the rules to include any new characters so that a character is replaced by itself.

    rules <- c(a = "X", b = "Y", X = "a")
    chars <- c("a", "Y", "Z")
    no_rule <- chars[! chars %in% names(rules)]
    rules2 <- c(rules, setNames(no_rule, no_rule))
    rules2[chars]
    #>   a   Y   Z 
    #> "X" "Y" "Z"
    

    And that's the logic behind the following function.

    • Break strings to characters
    • Create a full list of replacement rules
    • Look up replacement values
    • Glue strings back together

    library(stringr)
    
    str_replace_chars <- function(string, rules) {
      # Expand rules to replace characters with themselves 
      # if those characters do not have a replacement rule
      chars <- unique(unlist(strsplit(string, "")))
      complete_rules <- setNames(chars, chars)
      complete_rules[names(rules)] <- rules
    
      # Split each string into characters, replace and unsplit
      for (string_i in seq_along(string)) {
        chars_i <- unlist(strsplit(string[string_i], ""))
        string[string_i] <- paste0(complete_rules[chars_i], collapse = "")
      }
      string
    }
    
    rules <- c(a = "X", p = "e", e = "p")
    string <- c("application", "developer")
    str_replace_chars(string, rules)
    #> [1] "XeelicXtion" "dpvploepr"
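
    For rules that map single characters to single characters, as in this example, base R's chartr() also translates all characters in one pass. This is a different technique from the lookup function above, shown only as a sketch for comparison; it cannot delete characters or replace one character with several.

    ```r
    rules <- c(a = "X", p = "e", e = "p")
    string <- c("application", "developer")

    # chartr(old, new, x): the i-th character of `old` maps to the i-th of `new`
    old <- paste(names(rules), collapse = "")
    new <- paste(rules, collapse = "")
    result <- chartr(old, new, string)
    result
    #> [1] "XeelicXtion" "dpvploepr"
    ```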
    