I have two strings:
a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"
I am looking to get the words that the two strings have in common, and how many there are.
Perhaps using intersect together with str_extract_all from the stringr package. For multiple strings, you can put them either in a list or in a vector:
library(stringr)
vec1 <- c(a, b)
Reduce(`intersect`,str_extract_all(vec1, "\\w+"))
#[1] "Roy" "travels" "Africa"
For faster options, consider stringi:
library(stringi)
Reduce(`intersect`,stri_extract_all_regex(vec1,"\\w+"))
#[1] "Roy" "travels" "Africa"
For counting:
length(Reduce(`intersect`,stri_extract_all_regex(vec1,"\\w+")))
#[1] 3
Or using base R
Reduce(`intersect`,regmatches(vec1,gregexpr("\\w+", vec1)))
#[1] "Roy" "travels" "Africa"
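To make the one-liners above easier to follow, here is a minimal base R sketch that splits the same idea into two steps. The intermediate names `words` and `common` are only illustrative and are not part of the original answer.

```r
# Same sentences as in the question
a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"
vec1 <- c(a, b)

# Step 1: extract the words of each string. The result is a list holding
# one character vector per input string.
words <- regmatches(vec1, gregexpr("\\w+", vec1))

# Step 2: fold intersect() over that list, so each element is intersected
# with the running result. With two strings this is just
# intersect(words[[1]], words[[2]]).
common <- Reduce(intersect, words)
common
#[1] "Roy" "travels" "Africa"

# Number of shared words
length(common)
#[1] 3
```

Because Reduce folds over the whole list, the same two steps should work unchanged if vec1 holds more than two strings.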
This approach is generalizable to n vectors:
a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"
c <- "Bob also travels Africa for trips but lives in the US unlike Roy."
library(stringi);library(qdapTools)
X <- stri_extract_all_words(list(a, b, c))
X <- mtabulate(X) > 0
Y <- colSums(X) == nrow(X); names(Y)[Y]
[1] "Africa" "Roy" "travels"
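If qdapTools is not available, the same presence/absence idea can be sketched in base R. This is only a rough stand-in under my own assumptions: it uses regmatches with "\\w+" instead of stri_extract_all_words, and the names `vocab`, `present` and `in_all` are hypothetical. It is not how mtabulate works internally.

```r
a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"
c <- "Bob also travels Africa for trips but lives in the US unlike Roy."

strings <- list(a, b, c)

# One character vector of words per string
X <- lapply(strings, function(s) unlist(regmatches(s, gregexpr("\\w+", s))))

# Every distinct word seen in any string
vocab <- sort(unique(unlist(X)))

# Logical matrix: one row per word, one column per string,
# TRUE where the word occurs in that string
present <- vapply(X, function(w) vocab %in% w, logical(length(vocab)))

# Keep the words that are present in every string
in_all <- vocab[rowSums(present) == length(X)]
in_all
```

For these three sentences `in_all` should contain "Africa", "Roy" and "travels", the same set as the mtabulate version.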
You can use strsplit and intersect from the base library:
> a <- "Roy lives in Japan and travels to Africa"
> b <- "Roy travels Africa with this wife"
> a_split <- unlist(strsplit(a, split=" "))
> b_split <- unlist(strsplit(b, split=" "))
> length(intersect(a_split, b_split))
[1] 3
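One caveat with splitting on a single space: punctuation stays attached to the words. A small sketch of the difference, using the sentence that ends in "Roy." from the answer above (the names `c_str`, `by_space` and `by_regex` are just for illustration):

```r
a <- "Roy lives in Japan and travels to Africa"
c_str <- "Bob also travels Africa for trips but lives in the US unlike Roy."

# Splitting on spaces keeps the trailing period, so "Roy." does not match "Roy"
by_space <- intersect(unlist(strsplit(a, split = " ")),
                      unlist(strsplit(c_str, split = " ")))
by_space
#[1] "lives" "in" "travels" "Africa"

# Extracting runs of word characters drops the punctuation first
by_regex <- intersect(unlist(regmatches(a, gregexpr("\\w+", a))),
                      unlist(regmatches(c_str, gregexpr("\\w+", c_str))))
by_regex
#[1] "Roy" "lives" "in" "travels" "Africa"
```

So for clean, space-separated input the strsplit version is fine, but with punctuation the regex-based extraction is probably the safer choice.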