Some words are used sometimes as a verb and sometimes as another part of speech.
Example
A sentence with the meaning of the w
You can install TreeTagger and then use the koRpus package to call it from R. Install TreeTagger in a location such as C:\Treetagger.
I will first show how TreeTagger works, so you understand what's going on in the actual solution further down in this answer:
library(koRpus)
your_sentences <- c("I blame myself for what happened",
                    "For what happened the blame is yours")
# Tag a single sentence; format="obj" passes the text itself instead of a file path:
text.tagged <- treetag(file="I blame myself for what happened",
                       format="obj", treetagger="manual", lang="en",
                       TT.options = list(path="C:\\Treetagger", preset="en") )
# Show each token together with its part-of-speech tag:
text.tagged@TT.res[, 1:2]
#      token tag
# 1        I  PP
# 2    blame VVP
# 3   myself  PP
# 4      for  IN
# 5     what  WP
# 6 happened VVD
The sentence has now been analysed, and the "only thing left" is to remove those occurrences of "blame"
that are tagged as a verb (a tag starting with "V").
I'll do this sentence by sentence, by creating a function that first tags the sentence, then checks for "bad words" like "blame"
that are also a verb, and finally removes them from the sentence:
remove_words <- function(sentence, badword="blame"){
  # Tag the sentence with TreeTagger:
  tagged.text <- treetag(file=sentence, format="obj", treetagger="manual", lang="en",
                         TT.options=list(path="C:\\Treetagger", preset="en"))
  # Check for bad words AND verb (verb tags start with "V"):
  cond1 <- (tagged.text@TT.res$token == badword)
  cond2 <- (substring(tagged.text@TT.res$tag, 1, 1) == "V")
  redflag <- which(cond1 & cond2)
  # If no such case, return sentence as is. If so, then remove that word.
  # Note: this assumes the tagger's tokens line up with the space-separated words.
  if(length(redflag) == 0) return(sentence)
  else{
    splitsent <- strsplit(sentence, " ")[[1]]
    splitsent <- splitsent[-redflag]
    return(paste0(splitsent, collapse=" "))
  }
}
lapply(your_sentences, remove_words)
# [[1]]
# [1] "I myself for what happened"
# [[2]]
# [1] "For what happened the blame is yours"
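If you want to try the filtering step without TreeTagger installed, here is a minimal sketch of the same two conditions applied to a hand-built data frame. The `tagged` data frame is a stand-in for `text.tagged@TT.res`, with tokens and tags copied from the output above. `drop_verb_uses()` is a hypothetical helper, not part of koRpus, and the "NN" and "DT" tags in the second call are assumed for illustration rather than actual tagger output.

```r
# Stand-in for text.tagged@TT.res: tokens and tags copied from the tagger
# output shown earlier, so no TreeTagger installation is needed here.
tagged <- data.frame(
  token = c("I", "blame", "myself", "for", "what", "happened"),
  tag   = c("PP", "VVP", "PP", "IN", "WP", "VVD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of koRpus): drop every token that equals
# `badword` AND carries a tag starting with "V", then rebuild the sentence
# from the remaining tokens.
drop_verb_uses <- function(tagged, badword = "blame") {
  is_badword <- tagged$token == badword
  is_verb    <- substring(tagged$tag, 1, 1) == "V"
  kept       <- tagged$token[!(is_badword & is_verb)]
  paste(kept, collapse = " ")
}

drop_verb_uses(tagged)
# [1] "I myself for what happened"

# Assumed noun tagging, for illustration only (not real tagger output):
noun_use <- data.frame(token = c("the", "blame"), tag = c("DT", "NN"),
                       stringsAsFactors = FALSE)
drop_verb_uses(noun_use)
# [1] "the blame"
```

Rebuilding the sentence from the tagger's own tokens, rather than from `strsplit(sentence, " ")`, avoids index mismatches if the tagger splits punctuation into separate tokens. The trade-off is that pasting tokens back together with spaces would also put a space before such punctuation.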