How to extract outstanding values from an object returned by waldo::compare()?

Submitted by 不打扰是莪最后的温柔 on 2021-02-08 14:59:30

Question


I'm trying to use a new R package called waldo (see also the announcement on the tidyverse blog), which is designed to compare data objects and find the differences between them. According to the documentation, the waldo::compare() function returns:

a character vector with class "waldo_compare"

The function is mainly meant for interactive use in the console, where color highlights the values that differ between the data objects. Examining the output in the console is useful, but I also want to take those values and act on them (filter them out of the data, etc.). So I need to extract the differing values programmatically, and I don't know how.

Example

  1. Generate a vector of length 10:
set.seed(2020)

vec_a <- sample(0:20, size = 10)

## [1]  3 15 13  0 16 11 10 12  6 18
  2. Duplicate the vector and append an additional element (4) as its 11th element:
vec_b <- vec_a
vec_b[11] <- 4               # 4 is a double, so this coerces vec_b to double
vec_b <- as.integer(vec_b)   # convert back so both vectors are integer

## [1]  3 15 13  0 16 11 10 12  6 18  4
  3. Use waldo::compare() to find the differences between the two vectors:
waldo::compare(vec_a, vec_b)

## `old[8:10]`: 12 6 18  
## `new[8:11]`: 12 6 18 4

The nice part is that the difference is highlighted in color in the console.


But now, how do I extract the different value?

I can assign the result of waldo::compare() to an object:

waldo_diff <- waldo::compare(vec_a, vec_b)

But then what? When I try waldo_diff[[1]], I get:

[1] "`old[8:10]`: \033[90m12\033[39m \033[90m6\033[39m \033[90m18\033[39m  \n`new[8:11]`: \033[90m12\033[39m \033[90m6\033[39m \033[90m18\033[39m \033[34m4\033[39m"

and for waldo_diff[[2]] it's even worse:

Error in waldo_diff[3] : subscript out of bounds
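For what it's worth, the string in waldo_diff[[1]] is ordinary text wrapped in ANSI color escape sequences (\033[90m, \033[34m, \033[39m). A minimal base-R sketch, using a hypothetical strip_ansi() helper and the string copied from the output above, shows that the numbers can be recovered by stripping those codes and parsing the text. This is fragile: it depends on waldo's display format, and stripping the codes also throws away the color, which here is the only thing marking 4 (blue, \033[34m) as added while 12 6 18 (grey, \033[90m) are unchanged context.

```r
# Hypothetical helper: remove ANSI SGR color codes such as "\033[90m" and "\033[39m"
strip_ansi <- function(x) gsub("\033\\[[0-9;]*m", "", x, perl = TRUE)

# The string printed by waldo_diff[[1]] in the question, copied verbatim
waldo_string <- "`old[8:10]`: \033[90m12\033[39m \033[90m6\033[39m \033[90m18\033[39m  \n`new[8:11]`: \033[90m12\033[39m \033[90m6\033[39m \033[90m18\033[39m \033[34m4\033[39m"

plain <- strip_ansi(waldo_string)
diff_lines <- strsplit(plain, "\n", fixed = TRUE)[[1]]

# Keep the "new" line, drop its "`new[8:11]`: " label, and split the values on spaces
new_line <- grep("^`new", diff_lines, value = TRUE)
new_text <- sub("^`new\\[[0-9:]+\\]`: ", "", new_line)
new_vals <- as.integer(strsplit(trimws(new_text), " +")[[1]])
new_vals
#> [1] 12  6 18  4
```

Note that new_vals contains the context values as well as the addition, which is why working from the underlying diff, as both answers do, is more reliable.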

Any idea how I could programmatically extract the values that appear in the "new" vector but not in the "old" one?
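If the only question is which values occur in the new vector but nowhere in the old one, base R's setdiff() answers that directly for this example. It is a set operation rather than a diff, though: it ignores order and drops duplicates, so an extra copy of a value that already exists in the old vector is not reported. A minimal sketch, with vec_a typed in from the output shown above rather than regenerated with sample():

```r
# vec_a as printed in the question (typed in, so the result doesn't depend on sample())
vec_a <- c(3L, 15L, 13L, 0L, 16L, 11L, 10L, 12L, 6L, 18L)
vec_b <- c(vec_a, 4L)

setdiff(vec_b, vec_a)
#> [1] 4

# Limitation: an appended duplicate of an existing value is not detected
setdiff(c(vec_a, 12L), vec_a)
#> integer(0)
```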


Answer 1:


At least for the simple case of comparing two vectors, you’ll be better off using diffobj::ses_dat() (which is from the package that waldo uses under the hood) directly:

waldo::compare(1:3, 2:4)
#> `old`: 1 2 3  
#> `new`:   2 3 4

diffobj::ses_dat(1:3, 2:4)
#>       op val id.a id.b
#> 1 Delete   1    1   NA
#> 2  Match   2    2   NA
#> 3  Match   3    3   NA
#> 4 Insert   4   NA    3

For completeness, here is one way to extract the additions:

extract_additions <- function(x, y) {
  ses <- diffobj::ses_dat(x, y)
  # For "Insert" rows, id.b is the index of the inserted element in y
  y[ses$id.b[ses$op == "Insert"]]
}

old <- 1:3
new <- 2:4

extract_additions(old, new)
#> [1] 4

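The same table can also give the deletions, since "Delete" rows carry an index into the old vector in id.a. A minimal sketch of a hypothetical split_ses() helper that returns both sides; so that it runs without diffobj installed, it is shown on a data frame typed in from the diffobj::ses_dat(1:3, 2:4) output above, but in practice you would pass diffobj::ses_dat(old, new) to it directly.

```r
# Hypothetical helper: split an SES table (shaped like diffobj::ses_dat() output)
# into the elements deleted from x and the elements inserted into y
split_ses <- function(ses, x, y) {
  list(
    deleted  = x[ses$id.a[ses$op == "Delete"]],  # id.a indexes the old vector
    inserted = y[ses$id.b[ses$op == "Insert"]]   # id.b indexes the new vector
  )
}

old <- 1:3
new <- 2:4

# Typed in from the diffobj::ses_dat(1:3, 2:4) output shown above
ses <- data.frame(
  op   = c("Delete", "Match", "Match", "Insert"),
  val  = c(1L, 2L, 3L, 4L),
  id.a = c(1L, 2L, 3L, NA),
  id.b = c(NA, NA, NA, 3L)
)

split_ses(ses, old, new)
# With diffobj installed: split_ses(diffobj::ses_dat(old, new), old, new)
```

The comparisons ses$op == "Delete" and ses$op == "Insert" work whether op is stored as a character vector or as a factor.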


Answer 2:


As a disclaimer, I didn't know anything about this package until you posted, so this is far from an authoritative answer. You can't easily extract the differing values from compare(), because it returns an ANSI-formatted string ready for pretty printing. Instead, the workhorses for vectors seem to be the internal functions ses() and ses_context(), which return the indices of the differences between the two objects. The difference between the two seems to be that ses_context() splits the result into a list of non-contiguous differences.

waldo:::ses(vec_a, vec_b)

# A tibble: 1 x 5
     x1    x2 t        y1    y2
  <int> <int> <chr> <int> <int>
1    10    10 a        11    11

The results show that there is an addition in the new vector beginning and ending at position 11.

The following simple function is very limited in scope and assumes that only additions in the new vector are of interest:

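new_diff_additions <- function(x, y) {
  res <- waldo:::ses(x, y)      # internal (non-exported) waldo function
  res <- res[res$t == "a", ]    # keep only additions
  if (nrow(res) == 0) {
    return(NULL)
  }
  # For each block of additions, return the added values of y
  # with their start and end positions attached as attributes
  Map(function(start, end) {
    d <- y[start:end]
    `attributes<-`(d, list(start = start, end = end))
  },
  res[["y1"]], res[["y2"]])
}
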
new_diff_additions(vec_a, vec_b)

[[1]]
[1] 4
attr(,"start")
[1] 11
attr(,"end")
[1] 11
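If a flat table is easier to filter on than a list with start/end attributes, the "a" rows can instead be expanded into one row per added position. A minimal sketch with a hypothetical additions_table() helper; because waldo:::ses() is internal and its output format may change, the example feeds it a data frame typed in from the waldo:::ses(vec_a, vec_b) output above instead of calling waldo.

```r
# Hypothetical helper: expand the "a" (addition) rows of an ses()-style table
# into one row per added position in the new vector y
additions_table <- function(ses, y) {
  adds <- ses[ses$t == "a", , drop = FALSE]
  if (nrow(adds) == 0) {
    return(data.frame(pos = integer(0), value = y[0]))
  }
  pos <- unlist(Map(seq, adds$y1, adds$y2))  # every position from y1 to y2 in each block
  data.frame(pos = pos, value = y[pos])
}

vec_a <- c(3L, 15L, 13L, 0L, 16L, 11L, 10L, 12L, 6L, 18L)
vec_b <- c(vec_a, 4L)

# Typed in from the waldo:::ses(vec_a, vec_b) output above
ses <- data.frame(x1 = 10L, x2 = 10L, t = "a", y1 = 11L, y2 = 11L)

additions_table(ses, vec_b)
#>   pos value
#> 1  11     4
```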


Source: https://stackoverflow.com/questions/64806192/how-to-extract-outstanding-values-from-an-object-returned-by-waldocompare
