Ruby - Show Deltas Between 2 array of hashes based on subset of hash keys

Submitted by 夙愿已清 on 2019-12-06 02:42:29

This isn't very pretty, but it works. It builds a third array containing every unique item (matched on 'id', 'ref' and 'name') from array1 and array2, and iterates through that.

Then, since include? doesn't accept a custom matcher, we fake one with detect, looking for an item in the array whose 'id', 'ref' and 'name' sub-hash matches. We wrap that in a method so we can call it with array1 or array2 instead of writing it twice.

Finally, we loop through array3, determine whether each item came from array1, array2, or both, and add it to the corresponding output array.

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

# combine the arrays into one array of the unique items (by id, ref and name) from array1 and array2 to loop through
array3 = (array1 + array2).uniq { |item| { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } }

# Array#include? doesn't allow a custom matcher, so we can fake it by using Array#detect
def is_included_in(array, object)
  object_identifier = { 'id' => object['id'], 'ref' => object['ref'], 'name' => object['name'] }

  array.detect do |item|
    { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } == object_identifier
  end
end

# output array initializing
array1_only = []
array2_only = []
array1_and_array2 = []

# loop through all items in both array1 and array2 and check if it was in array1 or array2
# if it was in both, add to array1_and_array2, otherwise, add it to the output array that
# corresponds to the input array
array3.each do |item|
  in_array1 = is_included_in(array1, item)
  in_array2 = is_included_in(array2, item)

  if in_array1 && in_array2
    array1_and_array2.push item
  elsif in_array1
    array1_only.push item
  else
    array2_only.push item
  end
end


puts array1_only.inspect        # => [{"id"=>"2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
puts array2_only.inspect        # => [{"id"=>"8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"}, {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
puts array1_and_array2.inspect  # => [{"id"=>"1", "ref"=>"1001", "name"=>"CA", "extra"=>"Not Sorted On 5"}, {"id"=>"3", "ref"=>"1003", "name"=>"WA", "extra"=>"Not Sorted On 9"}]
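For comparison, here is a more compact sketch of the same approach, assuming Ruby 2.5+ for Hash#slice. It swaps detect for Enumerable#any?, which returns true or false instead of the matching element, and uses Enumerable#partition in place of the if/elsif chain. KEYS, included_by_keys? and classify are hypothetical names, not part of the answer, and the sample data is made up.

```ruby
# Assumption: records are matched on these keys only; 'extra' is ignored.
KEYS = ['id', 'ref', 'name'].freeze

# Boolean version of is_included_in: true if any element of array agrees
# with object on every key in KEYS (Hash#slice needs Ruby 2.5+).
def included_by_keys?(array, object)
  target = object.slice(*KEYS)
  array.any? { |item| item.slice(*KEYS) == target }
end

# Hypothetical helper: returns [array1_only, array2_only, array1_and_array2].
def classify(array1, array2)
  combined = (array1 + array2).uniq { |item| item.slice(*KEYS) }
  both, rest = combined.partition do |item|
    included_by_keys?(array1, item) && included_by_keys?(array2, item)
  end
  only1, only2 = rest.partition { |item| included_by_keys?(array1, item) }
  [only1, only2, both]
end

a = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'x'},
     {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'y'}]
b = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'z'},
     {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'y'}]

only_a, only_b, both = classify(a, b)
only_a.map { |r| r['id'] }  #=> ["2"]
only_b.map { |r| r['id'] }  #=> ["8"]
both.map { |r| r['id'] }    #=> ["1"]
```

Like the original, this compares each item against every element of both arrays, so it is quadratic in the input sizes; for large arrays a hash or set of key tuples avoids the repeated scans.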

For this type of problem it's generally easiest to work with indices.

Code

def keepers(array1, array2, keys)
  a1 = make_hash(array1, keys)
  a2 = make_hash(array2, keys)
  common_keys_of_a1_and_a2 = a1.keys & a2.keys
  [keeper_idx(array1, a1, common_keys_of_a1_and_a2),
   keeper_idx(array2, a2, common_keys_of_a1_and_a2)]
end

def make_hash(arr, keys)
  arr.each_with_index.with_object({}) do |(g,i),h|
    (h[g.values_at(*keys)] ||= []) << i
  end
end

def keeper_idx(arr, a, common_keys_of_a1_and_a2)
  arr.size.times.to_a - a.values_at(*common_keys_of_a1_and_a2).flatten
end
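make_hash groups the row indices by the tuple of values found at keys. As a sketch, Enumerable#group_by over the indices builds the same hash; make_hash_grouped is a hypothetical name, and the block checks it against make_hash on a small made-up input.

```ruby
# make_hash as defined in the answer.
def make_hash(arr, keys)
  arr.each_with_index.with_object({}) do |(g, i), h|
    (h[g.values_at(*keys)] ||= []) << i
  end
end

# Hypothetical alternative: group each index by its values at keys.
# group_by keeps first-appearance order, as make_hash does.
def make_hash_grouped(arr, keys)
  arr.each_index.group_by { |i| arr[i].values_at(*keys) }
end

rows = [{'id' => '3', 'name' => 'WA', 'extra' => 'a'},
        {'id' => '7', 'name' => 'OR', 'extra' => 'b'},
        {'id' => '3', 'name' => 'WA', 'extra' => 'c'}]

grouped = make_hash_grouped(rows, ['id', 'name'])
  #=> {["3", "WA"]=>[0, 2], ["7", "OR"]=>[1]}
same = make_hash(rows, ['id', 'name']) == grouped
  #=> true
```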

Example

array1 =
  [{'id' =>  '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' =>  '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'},
   {'id' =>  '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 =
  [{'id' =>  '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' =>  '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' =>  '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
   {'id' =>  '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 12'},
   {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

Notice that the two arrays are slightly different from those given in the question. The question did not specify whether an array could contain two hashes that have the same values for the specified keys, so I added such a hash to each array to show how that case is dealt with.

keys = ['id', 'ref', 'name']

idx1, idx2 = keepers(array1, array2, keys)
  #=> [[1, 4], [2, 3, 4, 5]]

idx1 (idx2) are the indices of the elements of array1 (array2) that remain after matches are removed. (array1 and array2 are not modified, however.)

It follows that the elements kept from the two arrays are

array1.values_at(*idx1)
  #=> [{"id"=> "2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
  #    {"id"=> "7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]

and

array2.values_at(*idx2)
  #=> [{"id"=> "8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
  #    {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"},
  #    {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 12"},
  #    {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]

The indices of the hashes that are removed are given as follows.

array1.size.times.to_a - idx1
  #=> [0, 2, 3]
array2.size.times.to_a - idx2
  #=> [0, 1]
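Instead of subtracting arrays of indices twice, Enumerable#partition can split the indices into kept and removed in a single pass. A minimal sketch: split_indices is a hypothetical helper, 'extra' is omitted from the rows because it is never compared, and the common key tuples are copied from the example rather than computed.

```ruby
require 'set'

# Hypothetical helper: split the indices of arr into [kept, removed].
# An index is removed when its values at keys appear in common_keys.
def split_indices(arr, keys, common_keys)
  common = common_keys.to_set
  arr.each_index.partition { |i| !common.include?(arr[i].values_at(*keys)) }
end

# array1 from the example, without 'extra'.
array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR'}]
common = [["1", "1001", "CA"], ["3", "1003", "WA"]]

kept, removed = split_indices(array1, ['id', 'ref', 'name'], common)
kept     #=> [1, 4]
removed  #=> [0, 2, 3]
```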

Explanation

The steps are as follows.

a1 = make_hash(array1, keys)
  #=> {["1", "1001", "CA"]=>[0], ["2", "1002", "NY"]=>[1],
  #    ["3", "1003", "WA"]=>[2, 3], ["7", "1007", "OR"]=>[4]}    
a2 = make_hash(array2, keys)
  #=> {["1", "1001", "CA"]=>[0], ["3", "1003", "WA"]=>[1],
  #    ["8", "1002", "NY"]=>[2], ["5", "1005", "MT"]=>[3, 4],
  #    ["12", "1012", "TX"]=>[5]}
common_keys_of_a1_and_a2 = a1.keys & a2.keys
  #=> [["1", "1001", "CA"], ["3", "1003", "WA"]]
keeper_idx(array1, a1, common_keys_of_a1_and_a2)
  #=> [1, 4] (for array1)
keeper_idx(array2, a2, common_keys_of_a1_and_a2)
  #=> [2, 3, 4, 5] (for array2)
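The least obvious part of make_hash is the block signature |(g,i),h|. each_with_index.with_object yields each [element, index] pair together with the single memo hash, and the parentheses destructure the pair. A small sketch on a made-up array that records what the block receives:

```ruby
seen = []
result = %w[a b a].each_with_index.with_object({}) do |(g, i), h|
  seen << [g, i, h.keys]   # element, its index, keys in the memo hash so far
  (h[g] ||= []) << i       # same grouping step as make_hash, keyed on the element
end

seen    #=> [["a", 0, []], ["b", 1, ["a"]], ["a", 2, ["a", "b"]]]
result  #=> {"a"=>[0, 2], "b"=>[1]}
```

The memo hash is the same object on every iteration, and with_object returns it once the loop finishes, which is why make_hash needs no explicit return value.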

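See Array#- and Array#&. Both compare entire hashes (using eql? and hash), so two hashes count as equal only when every key, including 'extra', matches.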

array1 - array2   #data in array1 but not in array2
array2 - array1   #data in array2 but not in array1
array1 & array2   #data in both array1 and array2
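A minimal sketch of how whole-hash comparison behaves on two records that agree on 'id', 'ref' and 'name' but differ in 'extra', next to a variant that compares only the chosen keys. by_keys is a hypothetical lambda, not part of the answer.

```ruby
a = [{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'}]
b = [{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'}]

# Whole-hash comparison: 'extra' differs, so the records do not match.
whole_diff   = a - b  # still contains the WA record
whole_common = a & b  #=> []

# Comparing only the chosen keys instead.
keys = ['id', 'ref', 'name']
by_keys = ->(h) { h.values_at(*keys) }
b_ids = b.map(&by_keys)
subset_diff   = a.reject { |h| b_ids.include?(by_keys.(h)) }  #=> []
subset_common = a.select { |h| b_ids.include?(by_keys.(h)) }  # the WA record
```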

Since you've tagged this question with set, you can use sets similarly:

require 'set'

set1 = array1.to_set
set2 = array2.to_set

set1 - set2
set2 - set1
set1 & set2
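The Set operations can also be restricted to the identifying keys by putting key tuples, rather than whole hashes, into the sets. A sketch assuming the keys are 'id', 'ref' and 'name'; the sample arrays are made up so that the CA record differs only in 'extra'.

```ruby
require 'set'

keys = ['id', 'ref', 'name']
array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'}]
array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 6'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'}]

# Sets of key tuples; 'extra' is ignored.
ids1 = array1.map { |h| h.values_at(*keys) }.to_set
ids2 = array2.map { |h| h.values_at(*keys) }.to_set

only_in_1 = ids1 - ids2   # key tuples only in array1
only_in_2 = ids2 - ids1   # key tuples only in array2
in_both   = ids1 & ids2   # key tuples in both arrays

# Map the tuples back to the original hashes.
rows1 = array1.select { |h| only_in_1.include?(h.values_at(*keys)) }
rows2 = array2.select { |h| only_in_2.include?(h.values_at(*keys)) }
```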