Getting distance between two words in R

我寻月下人不归 2020-12-22 00:59

Say I have a line in a file:

string <- "thanks so much for your help all along. i'll let you know when...."

I want to return a value

3 Answers
  • 2020-12-22 01:18

    This is essentially a very crude implementation of Crayon's answer as a basic function:

    withinRange <- function(string, term1, term2, threshold = 6) {
      x <- strsplit(string, " ")[[1]]
      abs(grep(term1, x) - grep(term2, x)) <= threshold
    }
    
    withinRange(string, "help", "know")
    # [1] TRUE
    
    withinRange(string, "thanks", "know")
    # [1] FALSE
    

    I would suggest getting a basic idea of the text tools available to you and using them to write such a function. Note Tyler's comment: as implemented, grep matches substrings, so a single term can match several words ("you" matches both "you" and "your"), and the function can then return several values instead of one. You'll need to decide how to handle these cases to make the function more useful.
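One way to handle the partial-match problem Tyler points out is to compare whole words instead of calling grep. This is a minimal sketch under two assumptions: leading and trailing punctuation is stripped from each token, and when a term occurs more than once, the closest pair of occurrences is the one that counts. The name `withinRangeExact` is hypothetical and not part of the answer above.

```r
# Hypothetical variant of withinRange() that matches whole words only.
withinRangeExact <- function(string, term1, term2, threshold = 6) {
  # Split on whitespace, then drop leading/trailing punctuation from each token.
  x <- strsplit(string, "\\s+")[[1]]
  x <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", x)

  pos1 <- which(x == term1)
  pos2 <- which(x == term2)

  # If either term is absent there is no distance to measure.
  if (length(pos1) == 0 || length(pos2) == 0) return(NA)

  # Assumption: use the smallest distance over all pairs of occurrences.
  min(abs(outer(pos1, pos2, "-"))) <= threshold
}

string <- "thanks so much for your help all along. i'll let you know when...."

withinRangeExact(string, "help", "know")
# [1] TRUE

withinRangeExact(string, "you", "know")
# [1] TRUE
```

Here "you" no longer matches "your", so the call returns a single value. Whether the closest pair is the right rule for repeated words depends on what you are measuring.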

  • 2020-12-22 01:38

    Split your string:

    > words <- strsplit(string, '\\s')[[1]]
    

    Build an index vector:

    > indices <- 1:length(words)
    

    Name the indices after the words:

    > names(indices) <- words
    

    Compute distance between words:

    > abs(indices["help"] - indices["know"]) < 6
    FALSE
    

    EDIT: In a function

    distance <- function(string, term1, term2) {
      words <- strsplit(string, "\\s")[[1]]
      indices <- seq_along(words)
      names(indices) <- words
      # Lookup by name returns the first occurrence of each term.
      abs(indices[term1] - indices[term2])
    }
    
     distance(string, "help", "know") < 6
    

    EDIT: Plus

    There is a great advantage in indexing words: once it's done, you can compute many statistics on a text.
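As an illustration of what a word index makes possible, the sketch below records every position of each word and derives word counts from the same structure. It is only one possible direction. The names `positions` and `counts` are illustrative, and lower-casing and punctuation handling are left out. Unlike lookup in a named vector, which returns only the first match for a repeated word, this keeps all occurrences.

```r
string <- "thanks so much for your help all along. i'll let you know when...."
words <- strsplit(string, "\\s+")[[1]]

# Map each distinct word to all of the positions where it occurs.
positions <- split(seq_along(words), words)

# Word frequencies fall out of the same index.
counts <- lengths(positions)

positions[["help"]]
# [1] 6
counts[["know"]]
# [1] 1
```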

  • 2020-12-22 01:40

    You won't be able to get this from regex alone. I suggest splitting on spaces, then looping or using a built-in function to find the positions of your two terms in the resulting array and taking the difference of the indexes (array positions).

    Edit: Okay, I thought about it for a second, and perhaps this will work for you as a regex pattern:

    \bhelp(\s+[^\s]+){0,5}\s+know\b

    This uses the same "space is the delimiter" concept. It first matches help, then greedily matches up to 5 occurrences of " word", then looks for " know" (so "know" is at most the 6th word after "help").
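In R, a pattern like this could be tried with grepl(). This is a minimal sketch, assuming Perl-compatible matching (perl = TRUE) and a plain, non-possessive {0,5} quantifier so the engine can backtrack when "know" appears sooner than five words after "help". The variable names are illustrative.

```r
string <- "thanks so much for your help all along. i'll let you know when...."

# "help", then at most 5 intervening whitespace-delimited words, then "know".
# Backslashes are doubled because this is an R string literal.
pattern <- "\\bhelp(\\s+\\S+){0,5}\\s+know\\b"

grepl(pattern, string, perl = TRUE)
# [1] TRUE

# "thanks" is 11 words before "know", so the same idea fails for that pair.
grepl("\\bthanks(\\s+\\S+){0,5}\\s+know\\b", string, perl = TRUE)
# [1] FALSE
```

Note that the pattern is directional: it only finds "know" after "help", not the reverse order.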
