How to remove '\' from a string in sparklyr

Submitted by 前提是你 on 2019-12-11 03:47:46

Question


I am using sparklyr and have a Spark DataFrame with a column word that contains words, some of which contain special characters that I want to remove. I was successful in using regexp_replace with \\\\ before each special character, like this:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\(', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\)', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\+', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\?', '')) %>%
  mutate(word = regexp_replace(word, '\\\\:', '')) %>%
  mutate(word = regexp_replace(word, '\\\\;', '')) %>%
  mutate(word = regexp_replace(word, '\\\\!', ''))

Now I want to remove \. I have tried both:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\\', ''))

and:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\', ''))

But neither works.


Answer 1:


You have to account for both R-side and Java-side escaping, so what you actually need is "\\\\\\\\" (eight backslashes):

df <- copy_to(sc, tibble(word = "(abc\\zyx: 1)"))

df %>% mutate(regexp_replace(word, "\\\\\\\\", ""))
# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "\\\\\\\\\\\\\\\\", "")`
  <chr>           <chr>                                         
1 "(abc\\zyx: 1)" (abczyx: 1)
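The layers of escaping can be traced locally in base R, without a Spark connection. This is only a rough sketch: it assumes that Spark SQL's string-literal parser collapses each pair of backslashes into one (its default behaviour, as far as I know), and it uses base R's PCRE engine as a stand-in for Java's regex engine. The variable names are made up for illustration.

```r
# The pattern as typed in R source: 8 backslashes.
pattern_r <- "\\\\\\\\"

# Layer 1: the R parser halves them, so the string holds 4 characters.
nchar(pattern_r)        # 4
writeLines(pattern_r)   # \\\\

# Layer 2 (assumption): the SQL string-literal parser halves them again.
# Emulated here by replacing each pair of backslashes with a single one.
pattern_sql <- gsub("\\\\", "\\", pattern_r, fixed = TRUE)
nchar(pattern_sql)      # 2

# Layer 3: the regex engine reads the remaining pair as ONE literal backslash.
# PCRE stands in for Java's regex engine here.
result <- gsub(pattern_sql, "", "(abc\\zyx: 1)", perl = TRUE)
print(result)           # "(abczyx: 1)"
```

So 8 typed backslashes become 4, then 2, and finally match 1 literal backslash in the data.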

Depending on your exact requirements, it might be easier to match all characters at once. You could, for example, preserve only word characters (\w) and whitespace (\s):

# No + inside the class: there it would be a literal plus sign and would be kept
df %>% mutate(regexp_replace(word, "[^\\\\w\\\\s]", ""))
# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w\\\\\\\\s]", "")`
  <chr>           <chr>                                              
1 "(abc\\zyx: 1)" abczyx 1

or word characters only:

# No + inside the class: there it would be a literal plus sign and would be kept
df %>% mutate(regexp_replace(word, "[^\\\\w]", ""))
# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w]", "")`
  <chr>           <chr>                                     
1 "(abc\\zyx: 1)" abczyx1
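If you only want to drop the specific characters from the question (including the backslash) rather than everything that is not a word character, a single character class can probably replace the whole chain of mutate calls. The sketch below checks such a pattern locally in base R. It assumes, as above, that Spark SQL collapses each pair of backslashes into one before the regex engine sees the pattern, and it uses PCRE in place of Java's regex engine. The sample words and variable names are invented for illustration.

```r
# Pattern as it would be typed inside mutate(): 8 backslashes in the class.
pattern_typed <- "[()+?:;!\\\\\\\\]"

# Emulate the assumed SQL unescaping: each pair of backslashes becomes one.
pattern_regex <- gsub("\\\\", "\\", pattern_typed, fixed = TRUE)

# Hypothetical sample data; the first element contains one real backslash.
words <- c("(abc\\zyx: 1)", "what?!", "a+b;c")

# The class now lists ( ) + ? : ; ! and an escaped literal backslash.
cleaned <- gsub(pattern_regex, "", words, perl = TRUE)
print(cleaned)   # "abczyx 1" "what" "abc"

# Untested sparklyr equivalent, under the same assumption:
# words.sdf %>% mutate(word = regexp_replace(word, "[()+?:;!\\\\\\\\]", ""))
```

Inside a character class the other metacharacters need no escaping, so only the backslash has to go through the double layer of escapes.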


Source: https://stackoverflow.com/questions/52149872/how-to-remove-from-a-string-in-sparklyr
