I am trying to compare strings like PRABHAKAR SHARMA and SHARMA KUMAR PRABHAKAR. The intention is to check if all the characters of the shorter str
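If the characters to be considered are only letters, you could use:
comp <- function(s1, s2) {
  # Flag which of the 26 lowercase letters occur in each string (case-insensitive)
  in1 <- letters %in% strsplit(tolower(s1), "")[[1]]
  in2 <- letters %in% strsplit(tolower(s2), "")[[1]]
  # Fraction of the distinct letters of s1 that also occur in s2
  sum(in1 & in2) / sum(in1)
}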
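To make the logic of comp easier to follow, here is a small step-by-step sketch of what happens inside it. The inputs "abc" and "abd" are made up for illustration and are not from the question.

```r
# Made-up inputs, chosen so that the overlap is only partial
s1 <- "abc"
s2 <- "abd"

# One logical flag per letter of the alphabet: does it occur in the string?
in1 <- letters %in% strsplit(tolower(s1), "")[[1]]
in2 <- letters %in% strsplit(tolower(s2), "")[[1]]

letters[in1]        # distinct letters of s1: "a" "b" "c"
letters[in1 & in2]  # letters present in both strings: "a" "b"

# Two of the three distinct letters of s1 also occur in s2
ratio <- sum(in1 & in2) / sum(in1)
ratio               # 2/3
```

Because only the built-in letters vector is tested, digits, spaces and punctuation are ignored, and repeated letters count once.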
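Here is one approach:
s1 <- "PRABHAKAR SHARMA"
s2 <- "SHARMA KUMAR PRABHAKAR"
compare <- function(s1, s2) {
  # Distinct characters of each string (case-sensitive, spaces included)
  c1 <- unique(strsplit(s1, "")[[1]])
  c2 <- unique(strsplit(s2, "")[[1]])
  # Fraction of the distinct characters of s1 that also occur in s2
  length(intersect(c1, c2)) / length(c1)
}
compare(s1, s2)
# [1] 1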
It may be a little slow, though, and it also counts the space as a character. Use Vectorize to apply it to a column:
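dat <- data.frame(small = c("a", "b"), big = c("aa", "cc"), stringsAsFactors = FALSE)
# Vectorize wraps compare so it is applied row by row to the two columns
vcomp <- Vectorize(compare)
dat <- transform(dat, comp = vcomp(small, big))
# comp is 1 for ("a", "aa") and 0 for ("b", "cc")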
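Since compare also counts the space and is case-sensitive, a variant that strips whitespace and lower-cases both strings first might look like the sketch below. The name compare_nospace and the helper chars are hypothetical, and this is only one possible way to handle it.

```r
# Hypothetical variant of compare(): ignores case and whitespace
compare_nospace <- function(s1, s2) {
  # Lower-case, remove whitespace, then keep the distinct characters
  chars <- function(s) unique(strsplit(gsub("\\s+", "", tolower(s)), "")[[1]])
  c1 <- chars(s1)
  c2 <- chars(s2)
  # Fraction of the distinct characters of s1 that also occur in s2
  length(intersect(c1, c2)) / length(c1)
}

r1 <- compare_nospace("Prabhakar Sharma", "SHARMA KUMAR PRABHAKAR")  # 1
r2 <- compare_nospace("ab c", "abd")                                 # 2/3
```

It can be passed to Vectorize in the same way as compare when it needs to run over a column.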