Using str_detect (or some other function) and some way to loop through a list to essentially perform a vlookup

Submitted by 醉酒当歌 on 2020-01-15 10:14:34

Question


I have been searching for a way to do this, and although some results on here seem similar, nothing seems to work, nor can I find a method that will loop through a list like a VLOOKUP in Excel. I apologize if I have missed it.

I am trying to add a new column to a data set with mutate(). It should look at one column using str_replace (or some other function if necessary) and loop through another list, replacing whatever it finds with the corresponding value in another column. Essentially, a VLOOKUP in Excel. It cannot be done in Excel, however, because the file is simply too large.

I can do a simple str_replace one at a time, but there are 502 possible options to choose from, so writing the code for each one would take a very long time. Here is what I have so far:

library(dplyr)
library(stringr)
testVendor <- vendorData %>%
  select(Addr1) %>%
  mutate(Addr1 = toupper(Addr1),
         NewAdd = str_replace(Addr1, 'STREET', 'ST'))

However, rather than specifying STREET and then ST myself, I want it to loop through a list of common postal abbreviations and return the standard abbreviation.

An example would be:

addr1 <- c('123 MAIN STREET', '123 GARDEN ROAD', '123 CHARLESTON BOULEVARD')
state_abbrv <- c('FL', 'CA', 'NY')
vendor <- data.frame(addr1, state_abbrv)
usps_name <- c('STREET', 'LANE', 'BOULEVARD', 'ROAD', 'TURNPIKE')
usps_abbrv <- c('ST', 'LN', 'BLVD', 'RD', 'TPKE')
usps <- data.frame(usps_name, usps_abbrv)

The ideal output would be a new column on the vendor data frame holding the abbreviated address (for example, 123 MAIN STREET would become 123 MAIN ST).

Any assistance would be wonderful, and please allow me to expand on the question if it is unclear what I am looking for.

Thank you in advance.


Answer 1:


I would use a for loop:

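library(stringr)

# data.frame() stores strings as factors in R < 4.0, so convert to character first
usps[] = lapply(usps, as.character)
vendor$new_addr1 = as.character(vendor$addr1)

# Apply each name -> abbreviation replacement in turn
for(i in seq_len(nrow(usps))) {
  vendor$new_addr1 = str_replace_all(
    vendor$new_addr1, 
    pattern = usps$usps_name[i], 
    replacement = usps$usps_abbrv[i])
}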

vendor
#                      addr1 state_abbrv           new_addr1
# 1          123 MAIN STREET          FL         123 MAIN ST
# 2          123 GARDEN ROAD          CA       123 GARDEN RD
# 3 123 CHARLESTON BOULEVARD          NY 123 CHARLESTON BLVD
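The loop simply applies one pattern after another. stringr's `str_replace_all()` also accepts a named character vector (names are the patterns, values are the replacements) and applies each pair in turn, so the explicit loop can be dropped. A minimal sketch, assuming the question's example vectors as plain character vectors rather than data-frame columns:

```r
library(stringr)

addr1 <- c("123 MAIN STREET", "123 GARDEN ROAD", "123 CHARLESTON BOULEVARD")
usps_name <- c("STREET", "LANE", "BOULEVARD", "ROAD", "TURNPIKE")
usps_abbrv <- c("ST", "LN", "BLVD", "RD", "TPKE")

# Named vector: each name is a regex pattern, each value is its replacement
replacements <- setNames(usps_abbrv, usps_name)

# All pairs are applied in one call, in the order they appear in the vector
new_addr1 <- str_replace_all(addr1, replacements)
print(new_addr1)
```

With the full 502-row table, `setNames(usps$usps_abbrv, usps$usps_name)` would build the same kind of vector from the data frame's columns (once they are character). As with the loop, pairs are applied sequentially, so a later pattern can match text produced by an earlier replacement.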

To be extra safe, I'd add regex word boundaries to your patterns, as below, so that only whole words are replaced. (I assume you want AIRPLANE ROAD changed to AIRPLANE RD, not AIRPLN RD.)

# "\\b" matches a word boundary, so only whole words are replaced
for(i in seq_len(nrow(usps))) {
  vendor$new_addr1 = str_replace_all(
    vendor$new_addr1, 
    pattern = paste0("\\b", usps$usps_name[i], "\\b"), 
    replacement = usps$usps_abbrv[i])
}
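To see why the boundaries matter, here is a small check on a hypothetical address, `123 AIRPLANE ROAD`. It builds the same `paste0("\\b", ..., "\\b")` patterns but, as a swapped-in shortcut, passes them to `str_replace_all()` as a named vector (names are patterns, values are replacements) instead of looping:

```r
library(stringr)

usps_name <- c("STREET", "LANE", "BOULEVARD", "ROAD", "TURNPIKE")
usps_abbrv <- c("ST", "LN", "BLVD", "RD", "TPKE")
addr <- "123 AIRPLANE ROAD"  # hypothetical test address

# Plain patterns: "LANE" also matches inside "AIRPLANE"
plain <- str_replace_all(addr, setNames(usps_abbrv, usps_name))

# Bounded patterns: \\b requires a word boundary on both sides of the match
bounded <- str_replace_all(addr, setNames(usps_abbrv, paste0("\\b", usps_name, "\\b")))

print(plain)    # "123 AIRPLN RD"
print(bounded)  # "123 AIRPLANE RD"
```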



Answer 2:


This might be some of the most confusing R code I have ever written, but it kind of solves the problem:

library(tidyverse)

# Phrases to rewrite
df_phrases <- tribble(~phrases,
                      "testing this street for pests",
                      "this street better be lit")

# Lookup table: each word is replaced by the text in the replacement column
df_lookup <- tribble(~word,    ~replacement,
                     "street", "st",
                     "pests",  "rats",
                     "lit",    "well illuminated")

lookup_function <- function(phrase, df_lookup) {
  # One row per word of the phrase
  table_to_join <- tibble(word = str_split(phrase, " ")[[1]])

  table_to_join %>%
    # Attach the replacement (NA when the word is not in the lookup table)
    left_join(df_lookup, by = "word") %>%
    # Keep the original word when there is no replacement
    mutate(new_vector = if_else(is.na(replacement), word, replacement)) %>%
    pull(new_vector) %>%
    # Glue the words back into a single string
    str_flatten(collapse = " ")
}

df_phrases %>%
  mutate(test = map_chr(phrases, lookup_function, df_lookup))
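The `left_join()` in `lookup_function` is effectively an exact-match VLOOKUP applied to each word. The same per-word lookup can be sketched in base R with `match()`, which returns each word's position in the lookup vector (or `NA` when it is absent). This is an alternative sketch, not the answerer's code; `from`, `to` and `replace_words()` are hypothetical names mirroring `df_lookup`:

```r
# Hypothetical lookup vectors mirroring df_lookup
from <- c("street", "pests", "lit")
to <- c("st", "rats", "well illuminated")

replace_words <- function(phrase, from, to) {
  words <- strsplit(phrase, " ", fixed = TRUE)[[1]]
  idx <- match(words, from)      # position in the lookup, NA when absent
  hit <- !is.na(idx)
  words[hit] <- to[idx[hit]]     # swap only the words that were found
  paste(words, collapse = " ")
}

phrases <- c("testing this street for pests", "this street better be lit")
result <- vapply(phrases, replace_words, character(1),
                 from = from, to = to, USE.NAMES = FALSE)
print(result)
```

Like the join version, this splits on single spaces only, so a word with attached punctuation (for example `street,`) would not be matched.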


Source: https://stackoverflow.com/questions/59397445/using-str-detect-or-some-other-function-and-some-way-to-loop-through-a-list-to
