Count common words in two strings

后端未结

I have two strings:

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"

I am looking to get the number of words common to both strings.

3 answers

南旧

2020-12-10 19:03

Perhaps use intersect together with str_extract_all from stringr. For multiple strings, you can put them in either a list or a vector:

 library(stringr)
 vec1 <- c(a, b)
 # Extract the words of each string, then intersect the resulting word vectors
 Reduce(`intersect`, str_extract_all(vec1, "\\w+"))
 #[1] "Roy"     "travels" "Africa"

For a faster option, consider stringi:

 library(stringi)
 Reduce(`intersect`,stri_extract_all_regex(vec1,"\\w+"))
 #[1] "Roy"     "travels" "Africa"

For counting:

 length(Reduce(`intersect`,stri_extract_all_regex(vec1,"\\w+")))
 #[1] 3

Or using base R

  Reduce(`intersect`,regmatches(vec1,gregexpr("\\w+", vec1)))
  #[1] "Roy"     "travels" "Africa"
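
The base R variant above can be wrapped in a small helper that returns the count directly. This is only a sketch: the function name count_common_words and its ignore_case argument are hypothetical and not part of any package.

```r
# Hypothetical helper: count the words shared by every string in a character vector.
count_common_words <- function(strings, ignore_case = FALSE) {
  # Optionally fold case so that "Roy" and "roy" count as the same word
  if (ignore_case) strings <- tolower(strings)
  # One character vector of words per input string
  words <- regmatches(strings, gregexpr("\\w+", strings))
  # Intersect all word vectors and count what remains
  length(Reduce(intersect, words))
}

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"
count_common_words(c(a, b))
#[1] 3
```

Because intersect discards duplicates, a word that is repeated within one string is still counted only once.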

终归单人心

2020-12-10 19:05

This approach generalizes to n strings:

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"
c <- "Bob also travels Africa for trips but lives in the US unlike Roy."

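library(stringi)
library(qdapTools)
# One character vector of words per string
X <- stri_extract_all_words(list(a, b, c))
# Logical matrix: one row per string, one column per word
X <- mtabulate(X) > 0
# Keep the words that occur in every string
Y <- colSums(X) == nrow(X)
names(Y)[Y]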

[1] "Africa"  "Roy"     "travels"
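
The same "present in every string" logic can also be sketched in base R without qdapTools. The variable names below (word_sets, counts, common_all) are illustrative only, and the word pattern "\\w+" is an assumption carried over from the first answer, not stringi's word-boundary rules.

```r
a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"
c <- "Bob also travels Africa for trips but lives in the US unlike Roy."

strings <- c(a, b, c)
# Unique words per string, so each string contributes a word at most once
word_sets <- lapply(regmatches(strings, gregexpr("\\w+", strings)), unique)
# Count in how many strings each word appears
counts <- table(unlist(word_sets))
# Keep the words that appear in all of the strings
common_all <- names(counts)[as.vector(counts) == length(word_sets)]
common_all
length(common_all)
```

Taking unique words per string first matters: without it, a word repeated three times in one string would be mistaken for a word that occurs in all three strings.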

春和景丽

2020-12-10 19:13

You can use strsplit and intersect from the base library:

> a <- "Roy lives in Japan and travels to Africa"
> b <- "Roy travels Africa with this wife"
> a_split <- unlist(strsplit(a, split=" "))
> b_split <- unlist(strsplit(b, split=" "))
> length(intersect(a_split, b_split))
[1] 3
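
Splitting on a single space leaves punctuation attached to words, so "Africa." and "Africa," would not match each other. A minimal sketch that strips punctuation before splitting follows; split_words is a hypothetical helper name, and the example strings are variants of the originals with punctuation added for illustration.

```r
# Hypothetical helper: split a string into words, ignoring punctuation.
split_words <- function(s) {
  # Remove punctuation, trim the ends, then split on runs of whitespace
  cleaned <- trimws(gsub("[[:punct:]]", "", s))
  strsplit(cleaned, "\\s+")[[1]]
}

a_punct <- "Roy lives in Japan and travels to Africa."
b_punct <- "Roy travels Africa, with this wife"

# Naive split: "Africa." and "Africa," differ, so only two words match
naive_count <- length(intersect(strsplit(a_punct, " ")[[1]], strsplit(b_punct, " ")[[1]]))
# Punctuation-aware split: "Africa" matches as well
clean_count <- length(intersect(split_words(a_punct), split_words(b_punct)))
c(naive = naive_count, clean = clean_count)
```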
