I'm trying to use gsub to remove words / text in column y that are in column x.
x = c("a","b","c")
y = c("asometext", "some, a b text", "c a text")
df = cbind(x,y)
df = data.frame(df)
df$y = gsub(df$x, "", df$y)
If I run the code above, only the value in row 1 of column x ("a") is removed; the values in the other rows of x are ignored:
> df
x y
1 a sometext
2 b some, b text
3 c c text
I want the end result to be:
> df
x y
1 a sometext
2 b some, text
3 c text
So all the words / letters in column x should be removed from column y. Is this possible with gsub?
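Judging by the desired output above (row 2 loses both "a" and "b", and row 3 loses both "c" and "a"), the goal seems to be removing every value of x from every string in y, not only the value on the same row. A minimal sketch of that reading, assuming x holds plain letters or words with no regex metacharacters, joins x into one alternation pattern with paste(collapse = "|") and then tidies the leftover spaces:

```r
x <- c("a", "b", "c")
y <- c("asometext", "some, a b text", "c a text")

# One regex that matches any value of x: "a|b|c"
# (assumes the values of x contain no regex metacharacters)
pattern <- paste(x, collapse = "|")

# Remove every match in every string of y
cleaned <- gsub(pattern, "", y)

# Collapse the runs of spaces left behind and trim both ends
cleaned <- trimws(gsub(" +", " ", cleaned))
cleaned
```

Because the pattern matches substrings, a value such as "c" would also be stripped from inside a longer word, which is what the question's first row ("asometext" becoming "sometext") seems to want.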
Normally gsub takes three main arguments: 1) pattern, 2) replacement and 3) the vector in which to replace values (its x argument).
The pattern must be a single string, and so must the replacement; if you pass a longer vector, only its first element is used and R issues a warning. The only argument that accepts multiple values is the vector, which is why gsub is described as vectorized (over x only).
gsub(df$x, "", df$y) # only df$x[1] is used (with a warning), because 'df$x' isn't one string
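A small check of the point above: gsub accepts many strings in its third argument, but a pattern of length greater than one is not recycled. R keeps only the first element and raises a warning, so the call gives the same result as using "a" alone. The withCallingHandlers wrapper is only there to print the warning and carry on.

```r
y <- c("asometext", "some, a b text", "c a text")

# gsub() is vectorized over its third argument: one pattern, many strings
one_pattern <- gsub("a", "", y)

# A pattern of length > 1 is NOT recycled: only "a" is used and R warns
many_patterns <- withCallingHandlers(
  gsub(c("a", "b", "c"), "", y),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")  # continue without re-raising
  }
)

# Same result: "b" and "c" were silently ignored
identical(one_pattern, many_patterns)
```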
The pattern argument is not vectorized, but we can use mapply to call gsub once per row.
mapply and gsub (bffs)
x = c("a","b","c")
y = c("asometext", "some, a b text", "c a text")
repl = ""
# What we write
mapply(gsub, x, repl, y)
# What mapply evaluates internally
gsub(x[[1]], repl[[1]], y[[1]])
gsub(x[[2]], repl[[2]], y[[2]])
gsub(x[[3]], repl[[3]], y[[3]])
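Applied back to the data frame from the question, the same call can be written as below. This is a sketch; USE.NAMES = FALSE and MoreArgs = list(fixed = TRUE) are additions not in the original answer. The first stops mapply from naming the result after x, and the second makes gsub treat each x value as literal text rather than a regular expression.

```r
x <- c("a", "b", "c")
y <- c("asometext", "some, a b text", "c a text")
df <- data.frame(x, y, stringsAsFactors = FALSE)

# Row-wise: gsub(df$x[i], "", df$y[i], fixed = TRUE) for each row i
df$y <- mapply(gsub, df$x, "", df$y,
               MoreArgs = list(fixed = TRUE),
               USE.NAMES = FALSE)
df
```

Note that this replacement is strictly row by row: row 2 only loses its own "b" and keeps its "a", and row 3 only loses "c", so y becomes "sometext", "some, a  text" and " a text".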
You may be asking: "But I only have one repl, so how do repl[[2]] and repl[[3]] work?" mapply notices that repl is shorter than the other arguments and recycles (repeats) it until it matches their length.
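The recycling can be checked directly: passing "" once gives exactly the same result as passing a replacement vector of full length. A visible replacement such as "_" (an arbitrary choice for illustration) shows that the single value is applied on every row.

```r
x <- c("a", "b", "c")
y <- c("asometext", "some, a b text", "c a text")

# A length-1 replacement is recycled by mapply ...
short_repl <- mapply(gsub, x, "", y, USE.NAMES = FALSE)
# ... so it behaves like a replacement vector of full length
full_repl <- mapply(gsub, x, rep("", length(x)), y, USE.NAMES = FALSE)
identical(short_repl, full_repl)

# Recycling is easier to see with a visible replacement
underscored <- mapply(gsub, x, "_", y, USE.NAMES = FALSE)
underscored  # "_sometext" "some, a _ text" "_ a text"
```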
Source: https://stackoverflow.com/questions/41049013/r-gsub-remove-words-in-column-y-from-words-in-column-x