Extract characters that differ between two strings

小蘑菇 2020-12-03 08:09

I have used adist to calculate the number of characters that differ between two strings:

a <- "Happy day"
b <- "Tappy Pay"
adist(a,b)


        
6 Answers
  • 2020-12-03 08:41

    You can use the following sequence of operations:

    • split the string using strsplit().
    • Use setdiff() to compare the elements
    • Wrap in a reducing function

    Try this:

    Reduce(setdiff, strsplit(c(a, b), split = ""))
    [1] "H" "d"
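
    Note that setdiff() is directional and works on sets, so the result above only lists characters of a that never occur in b. A minimal sketch of getting both directions, assuming the same a and b as in the question (the names chars, only_in_a and only_in_b are illustrative):

    ```r
    a <- "Happy day"
    b <- "Tappy Pay"

    # One character vector per input string
    chars <- strsplit(c(a, b), split = "")

    # Characters of a that do not occur anywhere in b
    only_in_a <- setdiff(chars[[1]], chars[[2]])

    # Characters of b that do not occur anywhere in a
    only_in_b <- setdiff(chars[[2]], chars[[1]])

    only_in_a
    only_in_b
    ```

    Because these are set operations, repeated characters are reported once and positions are ignored.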
    
  • 2020-12-03 08:48

    As long as a and b have the same length we can do this:

    s.a <- strsplit(a, "")[[1]]
    s.b <- strsplit(b, "")[[1]]
    paste(s.a[s.a != s.b], collapse = "")
    

    giving:

    [1] "Hd"
    

    This is clear code, and in the benchmark below it ties with f3 as the fastest of the solutions posted here, although I think I prefer f3:

    f1 <- function(a, b)
      paste(setdiff(strsplit(a,"")[[1]],strsplit(b,"")[[1]]), collapse = "")
    
    f2 <- function(a, b)
      paste(sapply(setdiff(utf8ToInt(a), utf8ToInt(b)), intToUtf8), collapse = "")
    
    f3 <- function(a, b) 
      paste(Reduce(setdiff, strsplit(c(a, b), split = "")), collapse = "")
    
    f4 <- function(a, b) {
      s.a <- strsplit(a, "")[[1]]
      s.b <- strsplit(b, "")[[1]]
      paste(s.a[s.a != s.b], collapse = "")
    }
    
    a <- "Happy day"
    b <- "Tappy Pay"
    
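    library(rbenchmark)
    # benchmark() times the expressions it is given, so pass the calls;
    # a bare name such as f1 would only time looking up the function object
    benchmark(f1(a, b), f2(a, b), f3(a, b), f4(a, b),
              replications = 10000, order = "relative")[1:4]
    

    giving the following on a fresh session on my laptop (these timings were recorded by passing the bare names f1 to f4, so they do not measure the calls themselves and should be re-measured):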

      test replications elapsed relative
    3   f3        10000    0.07    1.000
    4   f4        10000    0.07    1.000
    1   f1        10000    0.09    1.286
    2   f2        10000    0.10    1.429
    

    I have assumed that the differences must be in the corresponding character positions. You might want to clarify if that is the intention or not.
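
    To make that assumption concrete, here is a small sketch contrasting a position-wise comparison with a set-based one. The helper names pos_diff and set_diff and the inputs "abc" and "cab" are made up for illustration:

    ```r
    # Position-wise: compare the i-th character of a with the i-th character of b
    pos_diff <- function(a, b) {
      s.a <- strsplit(a, "")[[1]]
      s.b <- strsplit(b, "")[[1]]
      stopifnot(length(s.a) == length(s.b))  # only meaningful for equal lengths
      s.a[s.a != s.b]
    }

    # Set-based: characters of a that occur nowhere in b
    set_diff <- function(a, b) {
      setdiff(strsplit(a, "")[[1]], strsplit(b, "")[[1]])
    }

    pos_diff("abc", "cab")  # every position differs
    set_diff("abc", "cab")  # empty: both strings use the same characters
    ```

    For the strings in the question both approaches happen to agree, which is why the answers here give the same result.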

  • 2020-12-03 08:52

    You can use one of the variables as a regex character class and gsub out from the other one:

    gsub(paste0("[",a,"]"),"",b)
    [1] "TP"
    gsub(paste0("[",b,"]"),"",a)
    [1] "Hd"
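
    One caveat: if either string contains characters that are special inside a bracket expression, such as ], ^, - or a backslash, the pattern built this way may fail or match something else. A regex-free sketch that should behave the same on plain input (chars_not_in is a hypothetical helper name, not part of base R):

    ```r
    # Drop from x every character that occurs anywhere in y,
    # keeping the remaining characters of x in their original order
    chars_not_in <- function(x, y) {
      s.x <- strsplit(x, "")[[1]]
      s.y <- strsplit(y, "")[[1]]
      paste(s.x[!(s.x %in% s.y)], collapse = "")
    }

    a <- "Happy day"
    b <- "Tappy Pay"
    chars_not_in(b, a)
    chars_not_in(a, b)
    ```

    Like the gsub version, this keeps repeated characters, so it is not identical to setdiff(), which returns each character once.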
    
  • 2020-12-03 08:52

    The following function could be a better option for problems like this.

    list.string.diff <- function(a, b, exclude = c("-", "?"), ignore.case = TRUE, show.excluded = FALSE)
    {
      if (nchar(a) != nchar(b)) stop("Lengths of input strings differ. Please check your input.")
      if (ignore.case) {
        a <- toupper(a)
        b <- toupper(b)
      }
      split_seqs <- strsplit(c(a, b), split = "")
      # TRUE where the characters at the same position differ
      only.diff <- (split_seqs[[1]] != split_seqs[[2]])
      # positions holding an excluded character in either string become NA
      only.diff[
        (split_seqs[[1]] %in% exclude) |
        (split_seqs[[2]] %in% exclude)
      ] <- NA
      diff.info <- data.frame(which(is.na(only.diff) | only.diff),
                              split_seqs[[1]][only.diff], split_seqs[[2]][only.diff])
      names(diff.info) <- c("position", "poly.seq.a", "poly.seq.b")
      if (!show.excluded) diff.info <- na.omit(diff.info)
      diff.info
    }
    

    from https://www.r-bloggers.com/extract-different-characters-between-two-strings-of-equal-length/

    Then you can run

    list.string.diff(a, b)
    

    to get the difference.

  • 2020-12-03 08:53

    Not really proud of this, but it seems to do the job:

    sapply(setdiff(utf8ToInt(a), utf8ToInt(b)), intToUtf8)
    

    Results:

    [1] "H" "d"
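
    The sapply() wrapper is probably not needed, since intToUtf8() accepts a whole vector of code points. A sketch using its multiple = TRUE argument from base R, with the same a and b as in the question (only_in_a and chars are illustrative names):

    ```r
    a <- "Happy day"
    b <- "Tappy Pay"

    # Code points that occur in a but not in b
    only_in_a <- setdiff(utf8ToInt(a), utf8ToInt(b))

    # multiple = TRUE returns one string per code point
    # instead of pasting them into a single string
    chars <- intToUtf8(only_in_a, multiple = TRUE)
    chars
    ```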
    
  • 2020-12-03 09:02

    Split into letters and take the difference as sets:

    > setdiff(strsplit(a,"")[[1]],strsplit(b,"")[[1]])
    [1] "H" "d"
    