Replace multiple words in R easily; str_replace_all gives error that two objects are not equal lengths

Posted by 瘦欲@ on 2020-07-19 06:20:21

Question


I'm trying to use str_replace_all to replace many different values (e.g. "Mod", "M2", "M3", "Interviewer") with one consistent string (e.g. "Moderator:"). I'm doing this for several different categories, and I want to avoid writing out each unique value, as there are a lot of them.

So I made a tibble of all the unique values that I want to standardize, read it in, and then pulled out each column (there are 5, but only 2 are shown for simplicity) as a vector:

speak_names <- read_csv("speak_names.csv")
speak_namesMisc <- dplyr::pull(speak_names, Misc)
speak_namesMod <- dplyr::pull(speak_names, Moderator)

For the replacement value, I made a character vector of equal length to those above vectors because I know that the replacement and pattern must be equal lengths:

Misc <- rep("Misc:", 2)
Mod <- rep("Moderator:", 28)

When I run Misc through with this code, it works just fine:

atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMisc, Misc)

But when I try the identical Moderator version (even if I run it before Misc), I get a warning message:

atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMod, 
Mod)

Warning message:
In stri_replace_all_regex(string, pattern, fix_replacement(replacement),  :
longer object length is not a multiple of shorter object length

I don't know why I'm getting this warning, because this call to identical() yields TRUE:

identical(length(speak_namesMod), length(Mod))

The dataframe that I'm working with is 16,244 lines long if that makes any difference to the pattern or replacement. I'm stuck and trying to find out why this isn't working and/or another solution that does not involve typing out each character element in the vectors.

Thank you!


Answer 1:


library('dplyr') # load the dplyr package
library('stringr') # load the stringr package

Here is a sample of my own dataset to illustrate the answer.

Running dput() on my data gives:

abc<-as.data.frame(
structure(list(Name = c("ME-9_ 005", "ME-9_ 004", "ME-9_ 003", 
                        "ME-9_ 002", "ME-9_ 001", "ME-9_ 000", "ME-8_ 005", "ME-8_ 004", 
                        "ME-8_ 003", "ME-8_ 002", "ME-8_ 001", "ME-8_ 000", "ME-7_ 005", 
                        "ME-7_ 004", "ME-7_ 003", "ME-7_ 002", "ME-7_ 001", "ME-7_ 000"
), Mg = c(0.411058647473409, 0.361611969040526, 0.435757145931429, 
          0.36656632349025, 0.312782034685408, 0.357913661160629, 0.414639893651842, 
          0.460992875568015, 0.554803107534663, 0.418743792959099, 0.499114614445091, 
          0.475374442706501, 0.564660334010035, 0.502678818989733, 0.417617035801997, 
          0.488463005872639, 0.484776757286094, 0.424850010858818),
Al = c(0.575667101719941,  0.586351493923602, 0.574053324307634, 0.628497798862674, 0.552234153060378, 
       0.580547408629286, 1.05746950789483, 1.07094531357244, 1.11340157804305, 
       1.03043684466386, 1.02899468191215, 1.07222457991059, 1.5276908007952, 
       1.66549994904359, 1.43287302441973, 1.37434198093964, 1.55835986529032, 
       1.66902429579112), 
Si = c(0.495188340689301, 0.513374456164654, 
       0.51809643007659, 0.569128515813393, 0.542590350648068, 0.516673370168739, 
       1.72437228079744, 1.59076392020817, 1.77327433861292, 1.76671780355934, 
       1.60625706442694, 1.92449284567535, 3.27248599245035, 3.23739024834759, 
       2.84115179036218, 2.51112086010829, 2.98829002803169, 2.93347114563903
), 
P = c(0.222881184902066, 0.258237982165306, 0.230235867213535, 
      0.262379290809071, 0.230438623604524, 0.238615393939999, 0.260241811918024, 
      0.238785817517132, 0.248589968755681, 0.248270048794532, 0.272489046130942, 
      0.266707140244041, 0.25935282543278, 0.258801008935983, 0.250692297246152, 
      0.246890941447243, 0.277698144829677, 0.274197618349091)), 
row.names = c(NA, 
              -18L), class = c("tbl_df", "tbl", "data.frame")))

Here is how my data looked before cleaning:

head(abc,10)

For your specific question, pass str_replace_all a named vector in which each name is a pattern and each value is its replacement:

abc$Name <- str_replace_all(
  abc$Name, # column we want to search
  c("001" = "", "002" = "", "003" = "", "004" = "", "005" = "", "000" = "",
    "-" = " ", "_" = "") # each pattern (name) is paired with its replacement (value)
)
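
Applied to the situation in the question, the same named-vector idea might look like the sketch below. The vectors `speak_namesMod` and `speaker` here are small hypothetical stand-ins for the asker's CSV column and `atas_clean$speaker`, and the `^`/`$` anchors assume each cell holds nothing but the speaker label.

```r
library(stringr)

# Hypothetical stand-ins for the asker's data (not the real CSV contents)
speak_namesMod <- c("Interviewer", "Mod", "M2", "M3")
speaker <- c("Mod", "M2", "Interviewer", "Participant 1", "M3")

# Build the named vector programmatically instead of typing every pair:
# names are the patterns, values are the replacement.
# Assumption: each cell holds only the label, so anchor with ^ and $.
mod_map <- setNames(
  rep("Moderator:", length(speak_namesMod)),
  paste0("^", speak_namesMod, "$")
)

cleaned <- str_replace_all(speaker, mod_map)
print(cleaned)
# "Moderator:" "Moderator:" "Moderator:" "Participant 1" "Moderator:"
```

The names are treated as regular expressions and applied one after another, so an unanchored pattern such as `Mod` could also match inside an earlier replacement like `Moderator:`. If the cells really contain only the label, base R subsetting with `%in%` would avoid regular expressions altogether.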

Here is how my data looked after cleaning:

head(abc,10)
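
As for the warning in the question: when `pattern` and `replacement` are plain (unnamed) vectors, str_replace_all pairs them with `string` element by element rather than trying every pattern on every element. The toy example below is a sketch of that behaviour with made-up values; the last two lines only check the arithmetic for the 16,244-row data frame against the two pattern lengths mentioned in the question.

```r
library(stringr)

x <- c("b", "a")

# Unnamed vectors are matched up position by position:
# pattern[1] is tried only on x[1], pattern[2] only on x[2].
pairwise <- str_replace_all(x, c("a", "b"), c("A", "B"))
print(pairwise)      # "b" "a"  (nothing replaced)

# Named vector: every pattern is tried on every element.
all_patterns <- str_replace_all(x, c("a" = "A", "b" = "B"))
print(all_patterns)  # "B" "A"

# Row count from the question versus the two pattern lengths
16244 %% 2   # 0: recycles evenly, so no warning for Misc
16244 %% 28  # 4: not a multiple, hence the warning for Moderator
```

This suggests the Misc call only appeared to work: it recycled silently, but each row was still checked against just one of the two patterns. I believe newer stringr releases (1.5.0 onward) raise an error instead of recycling when the lengths differ, so the exact message may vary by version.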

I hope this helps



Source: https://stackoverflow.com/questions/50842140/replace-multiple-words-in-r-easily-str-replace-all-gives-error-that-two-objects
