I am trying to compare strings like PRABHAKAR SHARMA and SHARMA KUMAR PRABHAKAR. The intention is to check if all the characters of the shorter string
If only letters need to be considered, you could use:
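comp <- function(s1, s2) {
  # which of the 26 lower-case letters occur in each string
  in1 <- letters %in% strsplit(tolower(s1), "")[[1]]
  in2 <- letters %in% strsplit(tolower(s2), "")[[1]]
  # fraction of s1's distinct letters that also occur in s2
  sum(in1 & in2) / sum(in1)
}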
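A short usage sketch of comp() on the strings from the question. The function is repeated so the snippet runs on its own, and the "ABC"/"ab" pair is only an illustrative input. Because both strings go through tolower() and only the 26 letters are checked, case and spaces are ignored.

```r
comp <- function(s1, s2) {
  # which of the 26 lower-case letters occur in each string
  in1 <- letters %in% strsplit(tolower(s1), "")[[1]]
  in2 <- letters %in% strsplit(tolower(s2), "")[[1]]
  # fraction of s1's distinct letters that also occur in s2
  sum(in1 & in2) / sum(in1)
}

s1 <- "PRABHAKAR SHARMA"
s2 <- "SHARMA KUMAR PRABHAKAR"
comp(s1, s2)        # 1: every letter of s1 also occurs in s2
comp(s1, s2) == 1   # TRUE answers "are all letters of s1 in s2?"
comp("ABC", "ab")   # 0.6666667: "c" is missing from the second string
```

If s1 contains no letters at all, sum(in1) is 0 and the result is NaN, so you may want to guard that case.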
Here is one approach:
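s1 <- "PRABHAKAR SHARMA"
s2 <- "SHARMA KUMAR PRABHAKAR"
compare <- function(s1, s2) {
  # distinct characters of each string (case-sensitive, space included)
  c1 <- unique(strsplit(s1, "")[[1]])
  c2 <- unique(strsplit(s2, "")[[1]])
  # fraction of s1's distinct characters that also occur in s2
  length(intersect(c1, c2)) / length(c1)
}
compare(s1, s2)
# [1] 1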
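If you want to know which characters of the shorter string are absent from the longer one, rather than just the ratio, base R's setdiff() on the same character sets gives them directly. A minimal sketch; "PRAVEEN" is only an illustrative input, and the comparison stays case-sensitive with the space counted, as in compare().

```r
s1 <- "PRABHAKAR SHARMA"
s2 <- "SHARMA KUMAR PRABHAKAR"

# distinct characters of each string (case-sensitive, space included)
c1 <- unique(strsplit(s1, "")[[1]])
c2 <- unique(strsplit(s2, "")[[1]])

# characters of s1 that never occur in s2; character(0) means nothing is missing
not_found <- setdiff(c1, c2)
print(not_found)          # character(0)
length(not_found) == 0    # TRUE: s2 contains every character of s1

# a string with letters that do not occur in s2 (illustrative value)
setdiff(unique(strsplit("PRAVEEN", "")[[1]]), c2)  # "V" "E" "N"
```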
compare() may be a little slow, though, and it treats the space as a character too. To apply it to the columns of a data frame, wrap it with Vectorize:
dat <- data.frame(small=c("a", "b"), big=c("aa", "cc"), stringsAsFactors=FALSE)
vcomp <- Vectorize(compare)
dat <- transform(dat, comp=vcomp(small, big))
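A sketch of a column-wise variant that drops spaces before comparing and splits each column with a single strsplit() call instead of once per row through Vectorize. char_overlap is a hypothetical name, not from the answers above; like compare(), it measures the first argument against the second and stays case-sensitive (wrap the inputs in tolower() if case should not matter). It may help with the speed concern on larger data, but I have not benchmarked it.

```r
# Hypothetical helper: for each row, the share of distinct non-space
# characters of `a` that also occur in `b`.
char_overlap <- function(a, b) {
  # drop spaces, then split every string of each column in one call
  ca <- strsplit(gsub(" ", "", a, fixed = TRUE), "")
  cb <- strsplit(gsub(" ", "", b, fixed = TRUE), "")
  # pairwise comparison, row by row
  mapply(function(x, y) {
    x <- unique(x)
    length(intersect(x, y)) / length(x)
  }, ca, cb)
}

dat <- data.frame(small = c("a", "b", "AB C"), big = c("aa", "cc", "CBA"),
                  stringsAsFactors = FALSE)
dat$comp <- char_overlap(dat$small, dat$big)
dat
```

The third row shows the effect of dropping spaces: "AB C" is compared as "ABC", so it scores 1 against "CBA".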