Getting distance between two words in R

后端未结


Say I have a line in a file:

string <- "thanks so much for your help all along. i'll let you know when...."

I want to return a value


3 Answers

死守一世寂寞

2020-12-22 01:18
This is essentially a very crude implementation of Crayon's answer as a basic function:
```
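withinRange <- function(string, term1, term2, threshold = 6) {
  # Split the string into tokens on single spaces
  x <- strsplit(string, " ")[[1]]
  # grep() does partial (regex) matching and returns the positions of all
  # matching tokens, so the result can have more than one element
  abs(grep(term1, x) - grep(term2, x)) <= threshold
}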

withinRange(string, "help", "know")
# [1] TRUE

withinRange(string, "thanks", "know")
# [1] FALSE
```
I would suggest getting a basic idea of the text tools available to you, and using them to write such a function. Note Tyler's comment: As implemented, this can match multiple terms ("you" would match "you" and "your") leading to funny results. You'll need to determine how you want to deal with these cases to have a more useful function.
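
One way to handle the partial-match problem is to compare whole tokens instead of using `grep()`. The following is only a sketch under a few assumptions: the helper name `withinRangeExact` is made up for illustration, matching is case-insensitive, punctuation other than apostrophes is treated as a delimiter, and when a term occurs several times the closest pair of occurrences decides the result.

```r
# Hypothetical helper (not part of the original answer): exact-token matching,
# so "you" no longer matches "your".
withinRangeExact <- function(string, term1, term2, threshold = 6) {
  # Lower-case, then split on runs of anything that is not a letter,
  # digit or apostrophe
  x <- strsplit(tolower(string), "[^[:alnum:]']+")[[1]]
  x <- x[nzchar(x)]  # drop an empty leading token, if any
  pos1 <- which(x == tolower(term1))
  pos2 <- which(x == tolower(term2))
  # NA when either term is missing from the string
  if (length(pos1) == 0 || length(pos2) == 0) return(NA)
  # Distance between every pair of occurrences; the closest pair decides
  min(abs(outer(pos1, pos2, "-"))) <= threshold
}

string <- "thanks so much for your help all along. i'll let you know when...."
withinRangeExact(string, "help", "know")
withinRangeExact(string, "you", "know")
```

Unlike the `grep()` version, this always returns a single logical value, at the cost of committing to one particular tokenization rule.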

感情败类

2020-12-22 01:38

Split your string:

> words <- strsplit(string, '\\s')[[1]]

Build an index vector:

> indices <- seq_along(words)

Name indices:

> names(indices) <- words

Compute distance between words:

> abs(indices["help"] - indices["know"]) < 6
FALSE

EDIT In a function

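 distance <- function(string, term1, term2) {
    # Split on whitespace and number the words 1..n
    words <- strsplit(string, "\\s")[[1]]
    indices <- seq_along(words)
    names(indices) <- words
    # Name lookup returns the first occurrence of each term
    abs(indices[term1] - indices[term2])
 }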

 distance(string, "help", "know") < 6

EDIT Plus

There is a great advantage in indexing the words: once it's done, you can compute many statistics on the text.
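
As a rough illustration of what such an index allows, the sketch below collects every position of each word and counts word frequencies. The variable names `positions` and `freq` are assumptions made for this example, and words are split on runs of whitespace only, so punctuation stays attached to its word.

```r
string <- "thanks so much for your help all along. i'll let you know when...."
words <- strsplit(string, "\\s+")[[1]]

# All positions of each distinct word. This also covers repeated words,
# whereas lookup by name returns only the first occurrence.
positions <- split(seq_along(words), words)

# Word frequencies, most frequent first
freq <- sort(table(words), decreasing = TRUE)

positions[["help"]]   # 6
positions[["know"]]   # 12
```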


栀梦

2020-12-22 01:40

You won't be able to get this from regex alone. I suggest splitting the string using space as the delimiter, then looping (or using a built-in function) to search the resulting array for your two terms, and subtracting their indexes (array positions).

edit: Okay I thought about it a second and perhaps this will work for you as a regex pattern:

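\bhelp(\s+[^\s]+){1,5}\s+know\b

This uses the same "space is the delimiter" concept. It first matches help, then greedily matches up to 5 occurrences of " word", then looks for " know" (since "know" would be the 6th word after "help"). Note that the quantifier has to be a plain {1,5}: a possessive {1,5}+ cannot backtrack, so when fewer than 5 words separate the two terms it can swallow "know" itself and fail to match. Use {0,5} if the two terms may also be adjacent.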
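
A pattern of this kind might be applied in R roughly as follows. This is a sketch, not a definitive implementation: the helper name `nearPattern` is hypothetical, it uses a plain backtracking `{0,n}` quantifier (so adjacent words also count), it assumes the terms contain no regex metacharacters, and it passes `perl = TRUE` to `grepl()` so the pattern is interpreted by the PCRE engine.

```r
# Hypothetical helper: builds a proximity pattern for two literal terms.
# Assumes term1 and term2 contain no regex metacharacters.
nearPattern <- function(term1, term2, maxBetween = 5) {
  # term1, then 0..maxBetween " word" groups, then whitespace and term2
  sprintf("\\b%s(\\s+\\S+){0,%d}\\s+%s\\b", term1, maxBetween, term2)
}

string <- "thanks so much for your help all along. i'll let you know when...."

grepl(nearPattern("help", "know"), string, perl = TRUE)    # TRUE
grepl(nearPattern("thanks", "know"), string, perl = TRUE)  # FALSE
```

The pattern is directional: it only finds the first term followed by the second, so checking both orders requires a second call with the terms swapped.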
