Say I have a line in a file:
string <- \"thanks so much for your help all along. i\'ll let you know when....\"
I want to return a value
This is essentially a very crude implementation of Crayon's answer as a basic function:
withinRange <- function(string, term1, term2, threshold = 6) {
x <- strsplit(string, " ")[[1]]
abs(grep(term1, x) - grep(term2, x)) <= threshold
}
withinRange(string, "help", "know")
# [1] TRUE
withinRange(string, "thanks", "know")
# [1] FALSE
I would suggest getting a basic idea of the text tools available to you, and using them to write such a function. Note Tyler's comment: As implemented, this can match multiple terms ("you" would match "you" and "your") leading to funny results. You'll need to determine how you want to deal with these cases to have a more useful function.
Split your string:
> words <- strsplit(string, '\\s')[[1]]
Build a indices vector:
> indices <- 1:length(words)
Name indices:
> names(indices) <- words
Compute distance between words:
> abs(indices["help"] - indices["know"]) < 6
FALSE
EDIT In a function
distance <- function(string, term1, term2) {
words <- strsplit(string, "\\s")[[1]]
indices <- 1:length(words)
names(indices) <- words
abs(indices[term1] - indices[term2])
}
distance(string, "help", "know") < 6
EDIT Plus
There is a great advantage in indexing words, once its done you can work on a lot of statistics on a text.
you won't be able to get this from regex alone. I suggest splitting using space as delimiter, then loop or use a built-in function to do array search of your two terms and subtract the difference of the indexes (array positions).
edit: Okay I thought about it a second and perhaps this will work for you as a regex pattern:
\bhelp(\s+[^\s]+){1,5}+\s+know\b
This takes the same "space is the delimiter" concept. First matches for help then greedily up to 5 " word" then looks for " know" (since "know" would be the 6th).