Using pmap to apply different regular expressions to different variables in a tibble?

Submitted by 天涯浪子 on 2019-12-25 04:13:51

Question


I'm trying to apply different regular expressions to different variables in a tibble. For example, I've made a tibble listing 1) the variable name I want to modify, 2) the regex I want to match, and 3) the replacement string. I'd like to apply the regex/replacement to the variable in a different data frame.

So my "configuration" tibble looks like this:

test_config <-  dplyr::tibble(
  string_col = c("col1", "col2", "col3", "col4"),
  pattern = c("^\\.$", "^NA$", "^NULL$", "^$"),
  replacement = c("","","", "")
)

I'd like to apply this to a target tibble:

test_target <- dplyr::tibble(
  col1 = c("Foo", "bar", ".", "NA", "NULL"),
  col2 = c("Foo", "bar", ".", "NA", "NULL"),
  col3 = c("Foo", "bar", ".", "NA", "NULL"),
  col4 = c("NULL", "NA", "Foo", ".", "bar")
)

So the goal is to replace a different string with an empty string in each column/variable of the test_target.

The result should be like this:

result <- dplyr::tibble(
  col1 = c("Foo", "bar", "", "NA", "NULL"),
  col2 = c("Foo", "bar", ".", "", "NULL"),
  col3 = c("Foo", "bar", ".", "NA", ""),
  col4 = c("NULL", "NA", "Foo", ".", "bar")
)

I can do what I want with a for loop, like this:

for (i in seq(nrow(test_config))) {
  test_target <- dplyr::mutate_at(test_target,
                   .vars = dplyr::vars(
                     tidyselect::matches(test_config$string_col[[i]])),
                   .funs = dplyr::funs(
                     stringr::str_replace_all(
                       ., test_config$pattern[[i]], 
                       test_config$replacement[[i]]))
  )
}

Instead, is there a more tidy way to do what I want? So far, thinking that purrr::pmap was the tool for the job, I've made a function that takes a data frame, variable name, regular expression, and replacement value and returns the data frame with a single variable modified. It behaves as expected:

testFun <- function(df, colName, regex, repVal){
  colName <- dplyr::enquo(colName)
  df <- dplyr::mutate_at(df,
                         .vars = dplyr::vars(
                           tidyselect::matches(!!colName)),
                         .funs = dplyr::funs(
                           stringr::str_replace_all(., regex, repVal))
  )
}

# try with example
out <- testFun(test_target, 
               test_config$string_col[[1]], 
               test_config$pattern[[1]], 
               "")

However, when I try to use that function with pmap, I run into a couple problems: 1) is there a better way to build the list for the pmap call than this?

purrr::pmap(
    list(test_target, 
         test_config$string_col, 
         test_config$pattern, 
         test_config$replacement),
    testFun
)

2) When I call pmap, I get an error:

Error in UseMethod("tbl_vars") : 
  no applicable method for 'tbl_vars' applied to an object of class "character"
Called from: tbl_vars(tbl)

Can any of you suggest a way to use pmap to do what I want, or is there a different or better tidyverse approach to the problem?

Thanks!


Answer 1:


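You don't need a custom function. The error comes from how pmap iterates: a tibble is a list of columns, so pmap passes testFun one character column at a time as df rather than the whole data frame, and mutate_at fails on it. Since each call already receives a single column, you can pass str_replace_all directly: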

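library(purrr)
library(stringr)

# Each call receives one column of test_target plus the pattern and
# replacement at the same position in test_config.
pmap_dfr(
  list(test_target,
       test_config$pattern,
       test_config$replacement),
  str_replace_all
)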

# A tibble: 5 x 4
  col1  col2  col3  col4 
  <chr> <chr> <chr> <chr>
1 Foo   Foo   Foo   NULL 
2 bar   bar   bar   NA   
3 ""    .     .     Foo  
4 NA    ""    NA    .    
5 NULL  NULL  ""    bar  
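
To make the cause of the original error concrete, here is a minimal sketch. It first records the class of the first argument that pmap passes on each call, then shows one way to keep a "data frame in, data frame out" step by threading the tibble through purrr::reduce. The helper name apply_rule is hypothetical and not part of the original question or answers.

```r
library(dplyr)
library(purrr)
library(stringr)

test_config <- tibble(
  string_col  = c("col1", "col2", "col3", "col4"),
  pattern     = c("^\\.$", "^NA$", "^NULL$", "^$"),
  replacement = c("", "", "", "")
)

test_target <- tibble(
  col1 = c("Foo", "bar", ".", "NA", "NULL"),
  col2 = c("Foo", "bar", ".", "NA", "NULL"),
  col3 = c("Foo", "bar", ".", "NA", "NULL"),
  col4 = c("NULL", "NA", "Foo", ".", "bar")
)

# A tibble is a list of columns, so pmap() hands each call a single column,
# never the whole data frame.
arg_classes <- pmap_chr(
  list(test_target, test_config$pattern),
  function(x, pattern) class(x)
)

# Hypothetical helper: apply config row i to the whole data frame.
apply_rule <- function(df, i) {
  col <- test_config$string_col[[i]]
  df[[col]] <- str_replace_all(
    df[[col]],
    test_config$pattern[[i]],
    test_config$replacement[[i]]
  )
  df
}

# reduce() feeds the data frame returned by one rule into the next rule.
cleaned <- reduce(seq_len(nrow(test_config)), apply_rule, .init = test_target)
```

Unlike the positional pmap call, this version looks each column up by the name stored in string_col, so it behaves like the original for loop.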



Answer 2:


Another method using map2_dfc (the _dfc suffix is also available for pmap):

library(dplyr)
library(purrr)

map2_dfc(test_target, seq_along(test_target), 
         ~sub(test_config$pattern[.y], 
              test_config$replacement[.y], .x))

or imap_dfc. The columns have to be unnamed so that .y is the positional index rather than the column name, which means the original column names are lost:

imap_dfc(unname(test_target), 
         ~sub(test_config$pattern[.y], 
              test_config$replacement[.y], .x))

Output of the map2_dfc call:

# A tibble: 5 x 4
  col1  col2  col3  col4 
  <chr> <chr> <chr> <chr>
1 Foo   Foo   Foo   NULL 
2 bar   bar   bar   NA   
3 ""    .     .     Foo  
4 NA    ""    NA    .    
5 NULL  NULL  ""    bar 
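
Both answers pair columns with config rows by position, so they rely on test_config listing every column of test_target in the same order. If that cannot be guaranteed, a lookup by name may be safer. The following is a sketch only, assuming dplyr 1.0.0 or later (which postdates the question) for across() and cur_column(); the config below is deliberately reordered and omits col4 to illustrate the point.

```r
library(dplyr)

# Config rows in a different order than the target columns, without col4.
test_config <- tibble(
  string_col  = c("col3", "col1", "col2"),
  pattern     = c("^NULL$", "^\\.$", "^NA$"),
  replacement = c("", "", "")
)

test_target <- tibble(
  col1 = c("Foo", "bar", ".", "NA", "NULL"),
  col2 = c("Foo", "bar", ".", "NA", "NULL"),
  col3 = c("Foo", "bar", ".", "NA", "NULL"),
  col4 = c("NULL", "NA", "Foo", ".", "bar")
)

by_name <- mutate(
  test_target,
  across(
    all_of(test_config$string_col),
    function(x) {
      # Find the config row whose string_col matches the current column.
      i <- match(cur_column(), test_config$string_col)
      sub(test_config$pattern[i], test_config$replacement[i], x)
    }
  )
)
```

Columns that do not appear in string_col (here col4) are left untouched, and all_of() raises an error if the config names a column that does not exist in the target.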


Source: https://stackoverflow.com/questions/53070606/using-pmap-to-apply-different-regular-expressions-to-different-variables-in-a-ti
