Extract characters that differ between two strings

大城市里の小女人 posted on 2020-01-09 05:09:38

Question


I have used adist to calculate the number of characters that differ between two strings:

a <- "Happy day"
b <- "Tappy Pay"
adist(a,b) # result 2

Now I would like to extract the characters that differ. In my example, I would like to get the string "Hd" (or "TP"; either is fine).

I tried to look in adist, agrep and stringi but found nothing.


Answer 1:


You can use the following sequence of operations:

  • Split the strings using strsplit().
  • Use setdiff() to compare the elements.
  • Wrap in a reducing function.

Try this:

Reduce(setdiff, strsplit(c(a, b), split = ""))
[1] "H" "d"
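
One caveat worth keeping in mind: setdiff() treats its arguments as sets, so repeated characters and character positions are ignored. A small sketch with made-up inputs (x and y are illustrative and not from the question):

```r
# Illustrative inputs (not from the question): same letters, different counts and order
x <- "aab"
y <- "ba"

# Set difference only looks at the distinct characters, so nothing is reported
set_diff <- Reduce(setdiff, strsplit(c(x, y), split = ""))
print(set_diff)
# character(0)
```

If counts or positions matter, a position-wise comparison is likely the better fit.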



Answer 2:


Split into letters and take the difference as sets:

> setdiff(strsplit(a,"")[[1]],strsplit(b,"")[[1]])
[1] "H" "d"



Answer 3:


Not really proud of this, but it seems to do the job:

sapply(setdiff(utf8ToInt(a), utf8ToInt(b)), intToUtf8)

Results:

[1] "H" "d"
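
A possibly tidier variant of the same idea, offered as a sketch: intToUtf8() accepts multiple = TRUE, which turns each code point into its own string and so should make the sapply() wrapper unnecessary. The variable name res is just for illustration.

```r
a <- "Happy day"
b <- "Tappy Pay"

# multiple = TRUE converts each code point to a separate one-character string
res <- intToUtf8(setdiff(utf8ToInt(a), utf8ToInt(b)), multiple = TRUE)
print(res)
# [1] "H" "d"
```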



Answer 4:


You can use one of the variables as a regex character class and gsub out from the other one:

gsub(paste0("[",a,"]"),"",b)
[1] "TP"
gsub(paste0("[",b,"]"),"",a)
[1] "Hd"
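
Note that this approach interprets the string as regex syntax, so characters such as "-", "^", "]" or "\" inside it can change the meaning of the character class. The sketch below uses made-up inputs (x and y are illustrative, not from the question) to show the effect, next to a literal membership test with %in% that avoids it:

```r
# Illustrative inputs (not from the question): "-" sits between two letters in x
x <- "a-c"
y <- "abc"

# Inside [ ], "a-c" is read as the range a..c, so every letter of y is removed
regex_result <- gsub(paste0("[", x, "]"), "", y)

# Literal membership test: keep the characters of y that do not occur in x
s.x <- strsplit(x, "")[[1]]
s.y <- strsplit(y, "")[[1]]
literal_result <- paste(s.y[!s.y %in% s.x], collapse = "")

print(regex_result)
# [1] ""
print(literal_result)
# [1] "b"
```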



Answer 5:


As long as a and b have the same length we can do this:

s.a <- strsplit(a, "")[[1]]
s.b <- strsplit(b, "")[[1]]
paste(s.a[s.a != s.b], collapse = "")

giving:

[1] "Hd"

This is straightforward to read and seems tied for the fastest of the solutions provided here, although I think I prefer f3:

f1 <- function(a, b)
  paste(setdiff(strsplit(a,"")[[1]],strsplit(b,"")[[1]]), collapse = "")

f2 <- function(a, b)
  paste(sapply(setdiff(utf8ToInt(a), utf8ToInt(b)), intToUtf8), collapse = "")

f3 <- function(a, b) 
  paste(Reduce(setdiff, strsplit(c(a, b), split = "")), collapse = "")

f4 <- function(a, b) {
  s.a <- strsplit(a, "")[[1]]
  s.b <- strsplit(b, "")[[1]]
  paste(s.a[s.a != s.b], collapse = "")
}

a <- "Happy day"
b <- "Tappy Pay"

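library(rbenchmark)
# Pass calls, not bare function names: benchmark(f1, f2, ...) would only time
# evaluating the symbols f1, f2, ... rather than running the functions.
benchmark(f1(a, b), f2(a, b), f3(a, b), f4(a, b), replications = 10000, order = "relative")[1:4]

The timings posted with this answer, from a fresh session on my laptop, are shown below. They were recorded with the bare names f1, ..., f4 as arguments, so rerun the call above for meaningful numbers: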

  test replications elapsed relative
3   f3        10000    0.07    1.000
4   f4        10000    0.07    1.000
1   f1        10000    0.09    1.286
2   f2        10000    0.10    1.429

I have assumed that the differences must be in the corresponding character positions. You might want to clarify if that is the intention or not.
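
If the strings may differ in length, one possible direction is the edit script that adist() can return. This is a minimal sketch, assuming the "trafos" attribute documented for adist(..., counts = TRUE), in which M, S, I and D mark matches, substitutions, insertions and deletions. diff_chars is a hypothetical helper name, and when several optimal alignments exist, the one reported is whichever adist() picks.

```r
# Hypothetical helper: characters of `a` that the edit script substitutes
# or deletes when transforming `a` into `b`.
diff_chars <- function(a, b) {
  d <- adist(a, b, counts = TRUE)
  # One operation code per alignment column, e.g. "SMMMMMSMM"
  ops <- strsplit(attr(d, "trafos")[1, 1], "")[[1]]
  # Insertions consume a character of `b` only; dropping them leaves
  # exactly one code per character of `a`
  ops_a <- ops[ops != "I"]
  s.a <- strsplit(a, "")[[1]]
  paste(s.a[ops_a != "M"], collapse = "")
}

diff_chars("Happy day", "Tappy Pay")
# [1] "Hd"
```

For equal-length strings that differ only by substitutions, this should agree with the position-wise comparison above.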




Answer 6:


The following function may be a better option for problems like this.

list.string.diff <- function(a, b, exclude = c("-", "?"), ignore.case = TRUE, show.excluded = FALSE)
{
  if (nchar(a) != nchar(b)) stop("Lengths of input strings differ. Please check your input.")
  if (ignore.case)
  {
    a <- toupper(a)
    b <- toupper(b)
  }
  split_seqs <- strsplit(c(a, b), split = "")
  # TRUE where the characters at the same position differ
  only.diff <- (split_seqs[[1]] != split_seqs[[2]])
  # Mark positions holding an excluded character in either string as NA
  only.diff[
    (split_seqs[[1]] %in% exclude) |
    (split_seqs[[2]] %in% exclude)
  ] <- NA
  diff.info <- data.frame(which(is.na(only.diff) | only.diff),
                          split_seqs[[1]][only.diff], split_seqs[[2]][only.diff])
  names(diff.info) <- c("position", "poly.seq.a", "poly.seq.b")
  # Drop the excluded positions unless the caller asked to see them
  if (!show.excluded) diff.info <- na.omit(diff.info)
  diff.info
}

from https://www.r-bloggers.com/extract-different-characters-between-two-strings-of-equal-length/

Then you can run

list.string.diff(a, b)

to get the difference.



Source: https://stackoverflow.com/questions/28834459/extract-characters-that-differ-between-two-strings
