Check if all characters of one string exist in another string in R

Posted by 假装没事ソ on 2019-11-28 10:06:46

Question


I am trying to compare strings such as PRABHAKAR SHARMA and SHARMA KUMAR PRABHAKAR. The intention is to check whether all the characters of the shorter string exist in the other string. If they do, I should get a 100% match; otherwise, I should get the percentage of characters that matched.

I tried using levenshteinSim from the RecordLinkage package, but it returns a similarity score based on the number of edits required to change one string into the other.

# install.packages("RecordLinkage")  # run once if the package is not installed
library(RecordLinkage)
levenshteinSim("PRABHAKAR SHARMA","SHARMA KUMAR PRABHAKAR")

#[1] 0.3636364

I want a 100% match in such a case. Also, this has to be replicated for over 1,000,000 records.


Answer 1:


Here is one approach:

s1 <- "PRABHAKAR SHARMA"
s2 <- "SHARMA KUMAR PRABHAKAR"

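compare <- function(s1, s2) {
    # Distinct characters of each string (spaces included)
    c1 <- unique(strsplit(s1, "")[[1]])
    c2 <- unique(strsplit(s2, "")[[1]])
    # Share of the distinct characters of s1 that also occur in s2
    length(intersect(c1,c2))/length(c1)
}

compare(s1,s2)
#[1] 1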

It may be a little slow, though, and it counts the space character as a character too. Use Vectorize to apply it to a column:

dat <- data.frame(small=c("a", "b"), big=c("aa", "cc"), stringsAsFactors=FALSE)
vcomp <- Vectorize(compare)
dat <- transform(dat, comp=vcomp(small, big))
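
The question asks for the characters of the shorter string to be checked against the longer one, while compare() always treats its first argument as the reference and counts spaces. A minimal sketch of a variant that handles both points is given below; compare_shorter is a hypothetical helper name, not part of the original answer, and returning NA for an empty reference string is an assumption about the desired behaviour.

```r
# Hypothetical helper (not from the original answer): share of the distinct
# non-space characters of the shorter string that also occur in the longer one.
compare_shorter <- function(a, b) {
    # Use the shorter string as the reference; on a tie, `a` is kept.
    if (nchar(a) > nchar(b)) {
        tmp <- a
        a <- b
        b <- tmp
    }
    # Drop whitespace, then reduce each string to its distinct characters.
    ca <- unique(strsplit(gsub("[[:space:]]", "", a), "")[[1]])
    cb <- unique(strsplit(gsub("[[:space:]]", "", b), "")[[1]])
    # Assumption: an empty reference string has no meaningful score.
    if (length(ca) == 0) return(NA_real_)
    sum(ca %in% cb) / length(ca)
}

# The argument order no longer matters for the example from the question.
compare_shorter("SHARMA KUMAR PRABHAKAR", "PRABHAKAR SHARMA")
#[1] 1
```

Like compare(), this works on distinct characters, so repeated letters are counted only once.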



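Answer 2:


If the characters to be considered are only letters, you could use:

comp <- function(s1, s2) {
    # Which of the 26 letters occur in each string (case-insensitive)
    in1 <- letters %in% strsplit(tolower(s1), "")[[1]]
    in2 <- letters %in% strsplit(tolower(s2), "")[[1]]
    # Share of the letters of s1 that also occur in s2
    sum(in1 & in2)/sum(in1)
}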
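
As a rough usage sketch, comp() can be checked against the strings from the question and then applied to two columns with mapply(). The data frame and its column names below are made up for illustration, and the function definition is repeated only so that the snippet runs on its own.

```r
# comp() as defined in the answer, repeated so this snippet is self-contained.
comp <- function(s1, s2) {
    in1 <- letters %in% strsplit(tolower(s1), "")[[1]]
    in2 <- letters %in% strsplit(tolower(s2), "")[[1]]
    sum(in1 & in2) / sum(in1)
}

comp("PRABHAKAR SHARMA", "SHARMA KUMAR PRABHAKAR")  # every letter of s1 is in s2
#[1] 1
comp("abc", "abd")                                  # 2 of the 3 letters match
comp("123", "abc")                                  # s1 has no letters: 0/0 is NaN

# Hypothetical two-column data frame; mapply() walks both columns in parallel.
dat <- data.frame(small = c("abc", "Xy"), big = c("abd", "xyz"),
                  stringsAsFactors = FALSE)
dat$comp <- mapply(comp, dat$small, dat$big, USE.NAMES = FALSE)
```

Because the comparison runs over the fixed 26-element letters vector, spaces, digits and punctuation are ignored, and a first argument without any letters yields NaN rather than a percentage.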


Source: https://stackoverflow.com/questions/36085290/check-if-all-characters-of-one-string-exist-in-another-string-in-r
