Count common words in two strings

春和景丽 2020-12-10 19:00

I have two strings:

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"

I am looking to get the number of words common to both strings.

3 Answers
  • 2020-12-10 19:03

    Perhaps use intersect together with str_extract_all from stringr. For multiple strings, you can pass them either as a list or as a vector:

     library(stringr)
     vec1 <- c(a, b)
     Reduce(`intersect`, str_extract_all(vec1, "\\w+"))
     #[1] "Roy"     "travels" "Africa" 
    

    For a faster option, consider stringi:

     library(stringi)
     Reduce(`intersect`,stri_extract_all_regex(vec1,"\\w+"))
     #[1] "Roy"     "travels" "Africa" 
    

    For counting:

     length(Reduce(`intersect`,stri_extract_all_regex(vec1,"\\w+")))
     #[1] 3
    

    Or using base R

      Reduce(`intersect`,regmatches(vec1,gregexpr("\\w+", vec1)))
      #[1] "Roy"     "travels" "Africa" 
    
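The base R variant above could be wrapped in a small helper. This is only a sketch: `common_words` and its `ignore_case` argument are hypothetical names introduced for illustration, not part of the answer or of any package.

```r
# Hypothetical helper: words shared by every string in a character vector.
common_words <- function(strings, ignore_case = FALSE) {
  # Optionally fold case so that "Roy" and "roy" count as the same word
  if (ignore_case) strings <- tolower(strings)
  # gregexpr + regmatches yields one character vector of words per string
  words <- regmatches(strings, gregexpr("\\w+", strings))
  # Intersect the word vectors pairwise across all strings
  Reduce(intersect, words)
}

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"

common_words(c(a, b))
#[1] "Roy"     "travels" "Africa"
length(common_words(c(a, b)))
#[1] 3
```

Because `intersect` compares strings exactly, matching is case-sensitive unless `ignore_case = TRUE` is passed.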
  • 2020-12-10 19:05

    This approach generalizes to n strings:

    a <- "Roy lives in Japan and travels to Africa"
    b <- "Roy travels Africa with this wife"
    c <- "Bob also travels Africa for trips but lives in the US unlike Roy."
    
    library(stringi);library(qdapTools)
    X <- stri_extract_all_words(list(a, b, c))
    X <- mtabulate(X) > 0
    Y <- colSums(X) == nrow(X); names(Y)[Y]
    
    [1] "Africa"  "Roy"     "travels"
    
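If qdapTools is not available, the same "present in every string" idea can probably be reproduced in base R by counting, for each distinct word, how many strings contain it. A minimal sketch under that assumption; the variable names `strings`, `words`, `counts`, and `common` are illustrative only.

```r
strings <- c(
  "Roy lives in Japan and travels to Africa",
  "Roy travels Africa with this wife",
  "Bob also travels Africa for trips but lives in the US unlike Roy."
)

# One character vector of words per string; "\\w+" leaves out the final period
words <- regmatches(strings, gregexpr("\\w+", strings))

# De-duplicate within each string, then count how many strings hold each word
counts <- table(unlist(lapply(words, unique)))

# Keep the words whose count equals the number of strings
common <- names(counts)[as.vector(counts) == length(words)]
common
```

Here `common` holds "Africa", "Roy", and "travels", the same words the mtabulate version reports.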
  • 2020-12-10 19:13

    You can use strsplit and intersect from the base library:

    > a <- "Roy lives in Japan and travels to Africa"
    > b <- "Roy travels Africa with this wife"
    > a_split <- unlist(strsplit(a, split = " "))
    > b_split <- unlist(strsplit(b, split = " "))
    > length(intersect(a_split, b_split))
    [1] 3
    
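One caveat worth illustrating: splitting on a single space keeps punctuation attached to the neighbouring word, so "Roy." and "Roy" are treated as different words. The sketch below borrows the third example string from another answer in this thread (named `other` here, a hypothetical name) and compares splitting on a space with splitting on runs of non-word characters.

```r
a <- "Roy lives in Japan and travels to Africa"
other <- "Bob also travels Africa for trips but lives in the US unlike Roy."

# Splitting on a space leaves the last token as "Roy." (with the period)
by_space <- intersect(strsplit(a, " ")[[1]], strsplit(other, " ")[[1]])

# Splitting on runs of non-word characters drops the punctuation
by_word <- intersect(strsplit(a, "\\W+")[[1]], strsplit(other, "\\W+")[[1]])

length(by_space)
#[1] 4
length(by_word)
#[1] 5
```

For the two strings in the question the two splits give the same count, since neither string contains punctuation.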