Search and replace only specific lines in R

Submitted by ≡放荡痞女 on 2019-12-24 01:36:16

Question


I would like to search and replace some characters in my database, but not everywhere they occur in each line.

Here's my database:

 1. 41 R JEAN JAURES 93170
 2. 42 AV DE STALINGRAD 93170
 3. 51 57 R JULES FERRY 93170
 4. 1 R DU HAVRE 93170

I would like to end up with:

 1. 41 RUE JEAN JAURES 93170
 2. 42 AVENUE DE STALINGRAD 93170
 3. 51 57 RUE JULES FERRY 93170
 4. 1 RUE DU HAVRE 93170

So I tried the sub() function, but in line 2 it replaces the first R it finds, giving STALINGRUEAD instead of STALINGRAD.

I also tried substr(), but as line 3 shows, there can be a variable number of characters before the letter to replace. With ~600k addresses, there will be many exceptions like this.
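Both pitfalls described above can be reproduced in a few lines of base R. This is a minimal illustration using the four sample addresses; `addresses` is just a local name for that data.

```r
addresses <- c("41 R JEAN JAURES 93170",
               "42 AV DE STALINGRAD 93170",
               "51 57 R JULES FERRY 93170",
               "1 R DU HAVRE 93170")

# sub() replaces the first "R" anywhere in the string,
# which for line 2 is the R inside STALINGRAD
sub("R", "RUE", addresses[2])
#> [1] "42 AV DE STALINGRUEAD 93170"

# substr() needs a fixed position, but the street type starts at a
# different index depending on how many numbers precede it:
# character 4 is the R only in line 1
substr(addresses, 4, 4)
#> [1] "R" "A" "5" " "
```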

Is there a way to add some restrictions to these functions to achieve my goal?


Answer 1:


You can use \\s+ to match one or more whitespace characters and \\s* to match zero or more.

 vec <- c("41 R JEAN JAURES 93170",
          "42 AV DE STALINGRAD 93170",
          "51 57 R JULES FERRY 93170",
          "1 R DU HAVRE 93170")


 library(magrittr)
 vec %>% 
   gsub("\\s*R\\s+", " RUE ", .) %>%
   gsub("\\s*AV\\s+", " AVENUE ", .)

[1] "41 RUE JEAN JAURES 93170"      "42 AVENUE DE STALINGRAD 93170"
[3] "51 57 RUE JULES FERRY 93170"   "1 RUE DU HAVRE 93170" 

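Furthermore, you might consider \\b, which matches a word boundary: a zero-width position between a word character and a non-word character such as a space (or the start or end of the string). Unlike \\s*R\\s+, the pattern \\bR\\s+ will not match an R that ends a longer word, such as the R of VICTOR in "VICTOR HUGO":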

 vec %>% 
   gsub("\\bR\\s+", "RUE ", .) %>%
   gsub("\\bAV\\s+", "AVENUE ", .)
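The word-boundary idea generalizes to a lookup table, so each new abbreviation does not need its own hand-written `gsub()` call. The sketch below is one possible way to do it, not part of the answer above: `street_types` and `expand_street_types()` are hypothetical names, the table contains only the two abbreviations from the question, and "47 AV VICTOR HUGO 93170" is a made-up address added to show the VICTOR pitfall. It passes `perl = TRUE` so that `\\b` is handled by PCRE.

```r
# Hypothetical lookup table: abbreviation -> full street type (extend as needed)
street_types <- c(R = "RUE", AV = "AVENUE")

# Hypothetical helper: expand each abbreviation only where it is a whole word
expand_street_types <- function(x, lookup) {
  for (abbr in names(lookup)) {
    # \\b on both sides: the R inside STALINGRAD or at the end of VICTOR
    # is not a whole word, so it is left alone
    x <- gsub(paste0("\\b", abbr, "\\b"), lookup[[abbr]], x, perl = TRUE)
  }
  x
}

vec <- c("41 R JEAN JAURES 93170",
         "42 AV DE STALINGRAD 93170",
         "51 57 R JULES FERRY 93170",
         "1 R DU HAVRE 93170",
         "47 AV VICTOR HUGO 93170")  # hypothetical extra address

expand_street_types(vec, street_types)

# For comparison, "\\s*R\\s+" also matches the R that ends VICTOR
gsub("\\s*R\\s+", " RUE ", vec[5])
#> [1] "47 AV VICTO RUE HUGO 93170"
```

Because `gsub()` is vectorized, the loop only runs once per abbreviation, not once per address, which should matter for ~600k rows.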



Answer 2:


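You can try some regular expressions with stringr. If the 'R' that stands for 'RUE' is consistently the first standalone 'R' in each line, you can use stringr::str_replace, which replaces only the first match in each string. The lookarounds (?<!\\w) and (?!\\w) require that the R is neither preceded nor followed by a word character, so the R in STALINGRAD is skipped. This handles only R; AV would need a second call: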

library(tidyverse)
data <- c(
  "1. 41 R JEAN JAURES 93170",
  "2. 42 AV DE STALINGRAD 93170",
  "3. 51 57 R JULES FERRY 93170",
  "4. 1 R DU HAVRE 93170")
data %>% 
  str_replace("(?<!\\w)R(?!\\w)", "RUE")
#> [1] "1. 41 RUE JEAN JAURES 93170"    "2. 42 AV DE STALINGRAD 93170"
#> [3] "3. 51 57 RUE JULES FERRY 93170" "4. 1 RUE DU HAVRE 93170"
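If the street type always comes directly after the leading house number(s), a stricter option is to anchor the match at the start of the string. This addresses the variable-length prefix that defeated substr() in the question. The sketch below assumes each address begins with one or more space-separated numbers (as in the question's data, without the "1. " list prefixes used above); `expand_after_numbers()` is a hypothetical helper, and `perl = TRUE` is needed for the `(?=\\s)` lookahead.

```r
addresses <- c("41 R JEAN JAURES 93170",
               "42 AV DE STALINGRAD 93170",
               "51 57 R JULES FERRY 93170",
               "1 R DU HAVRE 93170")

# Hypothetical helper: expand `abbr` only when it directly follows the
# leading house number(s), e.g. "41 " or "51 57 "
expand_after_numbers <- function(x, abbr, full) {
  # ^((?:\\d+\\s+)+)  one or more numbers at the start, captured as \\1
  # (?=\\s)           the abbreviation must be followed by whitespace
  pattern <- paste0("^((?:\\d+\\s+)+)", abbr, "(?=\\s)")
  sub(pattern, paste0("\\1", full), x, perl = TRUE)
}

out <- expand_after_numbers(addresses, "R", "RUE")
out <- expand_after_numbers(out, "AV", "AVENUE")
out
```

Addresses that do not start with a number are left unchanged, which makes them easy to find afterwards for manual review.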

Edit: added a second reprex after the "R" per the comments



Source: https://stackoverflow.com/questions/51540961/search-and-replace-only-specific-lines-in-r
