Ruby: group_by operation on an array of hashes

Anonymous (unverified), submitted 2019-12-03 01:09:02

Question:

I have an array of hashes that represent compounds stored in boxes.

database = [{"Name"=>"Compound1", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound3", "Box"=>1},
            {"Name"=>"Compound4", "Box"=>1},
            {"Name"=>"Compound5", "Box"=>2},
            {"Name"=>"Compound6", "Box"=>2},
            {"Name"=>"Compound1", "Box"=>3},
            {"Name"=>"Compound2", "Box"=>3},
            {"Name"=>"Compound3", "Box"=>3},
            {"Name"=>"Compound7", "Box"=>4}]

I would like to select a subset of the array that covers the full inventory of compounds (i.e., Compound1 to Compound7) using as few boxes as possible. The result would then be:

database = [{"Name"=>"Compound1", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound3", "Box"=>1},
            {"Name"=>"Compound4", "Box"=>1},
            {"Name"=>"Compound5", "Box"=>2},
            {"Name"=>"Compound6", "Box"=>2},
            {"Name"=>"Compound7", "Box"=>4}]

I can use the following to group compounds per box:

database.group_by{|x| x['Box']}

I am having trouble reducing the grouped result so that duplicate compound names are removed.

Answer 1:

The essence of the problem is to find a smallest combination of boxes that together contain ("cover") every compound in a specified set. That combination of boxes is then used to select the desired hashes from database, as shown below.

Code

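def min_box(database, coverage)
  # Map each box number to the names of the compounds it holds
  boxes_to_compounds = database.each_with_object(Hash.new { |h, k| h[k] = [] }) { |g, h|
    h[g["Box"]] << g["Name"] }
  boxes = boxes_to_compounds.keys
  # Try all combinations of 1 box, then 2 boxes, ..., up to all the boxes
  (1..boxes.size).each do |n|
    boxes.combination(n).each do |combo|
      return combo if
        (coverage - combo.flat_map { |box| boxes_to_compounds[box] }).empty?
    end
  end
  nil # even all the boxes together do not cover every required compound
end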

coverage is a given array of required compounds (e.g., "Compound3").
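The question asks for the full inventory. In that case coverage need not be typed by hand. This is a minimal sketch, assuming "full inventory" means every distinct name that appears anywhere in database:

```ruby
# The question's data (note the duplicate Compound2 entry in box 1)
database = [{"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1}, {"Name"=>"Compound3", "Box"=>1},
            {"Name"=>"Compound4", "Box"=>1}, {"Name"=>"Compound5", "Box"=>2},
            {"Name"=>"Compound6", "Box"=>2}, {"Name"=>"Compound1", "Box"=>3},
            {"Name"=>"Compound2", "Box"=>3}, {"Name"=>"Compound3", "Box"=>3},
            {"Name"=>"Compound7", "Box"=>4}]

# Every distinct compound name, in order of first appearance
coverage = database.map { |h| h["Name"] }.uniq
p coverage
```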

Example

Suppose database is as given in the question and

coverage = ["Compound1", "Compound2", "Compound3", "Compound4",
            "Compound5", "Compound6", "Compound7"]

An optimal combination of boxes is then found to be

combo = min_box(database, coverage)
  #=> [1, 2, 4]

We may now compute the desired array of elements of database:

database.select { |h| combo.include?(h["Box"]) }.uniq
  #=> [{"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
  #    {"Name"=>"Compound3", "Box"=>1}, {"Name"=>"Compound4", "Box"=>1},
  #    {"Name"=>"Compound5", "Box"=>2}, {"Name"=>"Compound6", "Box"=>2},
  #    {"Name"=>"Compound7", "Box"=>4}]
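A plain uniq only removes hashes that are exactly equal. If the chosen boxes shared a compound, both copies would survive. That does not happen for [1, 2, 4], but it can happen with other data. This is a sketch of a variant that passes a block to uniq. It assumes one hash per compound is wanted, and select_from_boxes is a hypothetical helper name:

```ruby
database = [{"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1}, {"Name"=>"Compound3", "Box"=>1},
            {"Name"=>"Compound4", "Box"=>1}, {"Name"=>"Compound5", "Box"=>2},
            {"Name"=>"Compound6", "Box"=>2}, {"Name"=>"Compound1", "Box"=>3},
            {"Name"=>"Compound2", "Box"=>3}, {"Name"=>"Compound3", "Box"=>3},
            {"Name"=>"Compound7", "Box"=>4}]

# Hypothetical helper: one hash per compound, drawn from the given boxes
def select_from_boxes(database, combo)
  database.select { |h| combo.include?(h["Box"]) }
          .uniq { |h| h["Name"] } # the first hash seen for each name is kept
end

p select_from_boxes(database, [1, 2, 4]).size     # 7
p select_from_boxes(database, [1, 2, 3, 4]).size  # 7, whereas plain .uniq leaves 10
```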

min_box explanation

Finding an optimal combination of boxes is a hard problem. It is an instance of set cover, whose decision version is NP-complete. No efficient general algorithm is known, so this method simply enumerates combinations of boxes in order of increasing size. I begin by determining whether a single box provides the required coverage of compounds. If one does, an optimal solution has been found and the method returns an array containing that box. If no single box covers all compounds, I look at all combinations of two boxes. If one of those combinations provides the required coverage, it is an optimal solution and an array of those boxes is returned. Otherwise, combinations of three boxes are considered, and so on. Eventually either an optimal combination is found, or it is concluded that even all the boxes together do not provide the required coverage, in which case the method returns nil.
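The search order just described can be checked on a small hypothetical dataset in which every box is required. This sketch is illustrative and not part of the answer: min_box_sketch and the data are both hypothetical. It lets n run through boxes.size inclusive, so the all-boxes combination is tried before nil is returned:

```ruby
# Hypothetical sketch of the same size-by-size search
def min_box_sketch(database, coverage)
  boxes_to_compounds = database.each_with_object(Hash.new { |h, k| h[k] = [] }) do |g, h|
    h[g["Box"]] << g["Name"]
  end
  boxes = boxes_to_compounds.keys
  (1..boxes.size).each do |n| # smallest combinations first, up to all boxes
    boxes.combination(n).each do |combo|
      covered = combo.flat_map { |box| boxes_to_compounds[box] }
      return combo if (coverage - covered).empty?
    end
  end
  nil # not even all boxes together cover every required compound
end

# Hypothetical data: each box holds a compound found in no other box
spread = [{"Name"=>"A", "Box"=>1}, {"Name"=>"B", "Box"=>2}, {"Name"=>"C", "Box"=>3}]
p min_box_sketch(spread, ["A", "B", "C"])  # [1, 2, 3]: every box is needed
p min_box_sketch(spread, ["A", "D"])       # nil: "D" is in no box
```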

For the example from the question, the calculations are as follows.

boxes_to_compounds = database.each_with_object(Hash.new { |h, k| h[k] = [] }) { |g, h|
  h[g["Box"]] << g["Name"] }
  #=> {1=>["Compound1", "Compound2", "Compound2", "Compound3", "Compound4"],
  #    2=>["Compound5", "Compound6"],
  #    3=>["Compound1", "Compound2", "Compound3"],
  #    4=>["Compound7"]}
boxes = boxes_to_compounds.keys
  #=> [1, 2, 3, 4]
boxes.size
  #=> 4
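The same box-to-compounds mapping can also be built from the group_by call in the question. Chaining transform_values (Ruby >= 2.4) and uniq onto it drops the duplicate "Compound2" in box 1, which is the reduction the question asks about. The duplicate is harmless for min_box either way, because coverage - a ignores repeats in a. This is a sketch, not the answer's code:

```ruby
database = [{"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1}, {"Name"=>"Compound3", "Box"=>1},
            {"Name"=>"Compound4", "Box"=>1}, {"Name"=>"Compound5", "Box"=>2},
            {"Name"=>"Compound6", "Box"=>2}, {"Name"=>"Compound1", "Box"=>3},
            {"Name"=>"Compound2", "Box"=>3}, {"Name"=>"Compound3", "Box"=>3},
            {"Name"=>"Compound7", "Box"=>4}]

# Group the hashes by box, then keep only the distinct names in each group
boxes_to_compounds = database.group_by { |h| h["Box"] }
                             .transform_values { |v| v.map { |h| h["Name"] }.uniq }
p boxes_to_compounds
```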

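Each value of n (the number of boxes in a combination) is passed in turn to the outer each block. No combination of one or two boxes provides full coverage. Only box 4 holds "Compound7", and no other single box holds both "Compound4" and "Compound5". So consider n = 3.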

n = 3
e = boxes.combination(n)
  #=> #<Enumerator: [1, 2, 3, 4]:combination(3)>

We may see the objects that will be generated by this enumerator and passed to the inner each block by converting it to an array.

e.to_a
  #=> [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]

The first element generated by e is passed to the block and the following is computed.

combo = e.next
  #=> [1, 2, 3]
a = combo.flat_map { |box| boxes_to_compounds[box] }
  #=> ["Compound1", "Compound2", "Compound2", "Compound3", "Compound4",
  #    "Compound5", "Compound6", "Compound1", "Compound2", "Compound3"]
b = coverage - a
  #=> ["Compound7"]
b.empty?
  #=> false

As that combination of boxes does not include "Compound7", we press on and pass the next element generated by e to the block.

combo = e.next
  #=> [1, 2, 4]
a = combo.flat_map { |box| boxes_to_compounds[box] }
  #=> ["Compound1", "Compound2", "Compound2", "Compound3", "Compound4",
  #    "Compound5", "Compound6", "Compound7"]
b = coverage - a
  #=> []
b.empty?
  #=> true

We have therefore found an optimal combination of boxes, [1, 2, 4], which the method returns.
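The inner each with an early return behaves like Enumerable#find applied to the enumerator e. This is a sketch of the same search written with find. The covers lambda is a hypothetical helper name:

```ruby
boxes_to_compounds = {1=>["Compound1", "Compound2", "Compound2", "Compound3", "Compound4"],
                      2=>["Compound5", "Compound6"],
                      3=>["Compound1", "Compound2", "Compound3"],
                      4=>["Compound7"]}
coverage = (1..7).map { |i| "Compound#{i}" }
boxes = boxes_to_compounds.keys

# True when the boxes in combo hold every required compound
covers = ->(combo) { (coverage - combo.flat_map { |box| boxes_to_compounds[box] }).empty? }

# The n = 3 step alone: find returns the first covering combination, or nil
p boxes.combination(3).find { |c| covers.call(c) }  # [1, 2, 4]

# The whole search, smallest n first
combo = nil
(1..boxes.size).each do |n|
  combo = boxes.combination(n).find { |c| covers.call(c) }
  break if combo
end
p combo  # [1, 2, 4]
```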



Answer 2:

With Ruby >= 2.4 we can use transform_values:

database.group_by { |hash| hash["Name"] }
        .transform_values { |v| v.min_by { |h| h["Box"] } }
        .values

Or if you have Ruby < 2.4 you can do:

database.group_by {|hash| hash["Name"] }.map { |_,v| v.min_by {|h| h["Box"]} }

Key methods: group_by, transform_values (Ruby >= 2.4) and min_by. See the Ruby documentation for more information.
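On the question's data this returns exactly the desired seven hashes. It does so by keeping the lowest-numbered box for each compound, so it matches the goal of using as few boxes as possible only when those two coincide. The sketch below runs it on the question's data and on a small hypothetical dataset where they differ (transform_values needs Ruby >= 2.4):

```ruby
database = [{"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1}, {"Name"=>"Compound3", "Box"=>1},
            {"Name"=>"Compound4", "Box"=>1}, {"Name"=>"Compound5", "Box"=>2},
            {"Name"=>"Compound6", "Box"=>2}, {"Name"=>"Compound1", "Box"=>3},
            {"Name"=>"Compound2", "Box"=>3}, {"Name"=>"Compound3", "Box"=>3},
            {"Name"=>"Compound7", "Box"=>4}]

# Answer 2: for each compound, keep the hash with the lowest box number
by_lowest_box = database.group_by { |hash| hash["Name"] }
                        .transform_values { |v| v.min_by { |h| h["Box"] } }
                        .values
p by_lowest_box.map { |h| h["Box"] }.uniq  # [1, 2, 4]

# Hypothetical data where the lowest box per compound is not the fewest boxes
tricky = [{"Name"=>"A", "Box"=>1}, {"Name"=>"B", "Box"=>2},
          {"Name"=>"A", "Box"=>3}, {"Name"=>"B", "Box"=>3}]
picked = tricky.group_by { |hash| hash["Name"] }
               .transform_values { |v| v.min_by { |h| h["Box"] } }
               .values
p picked.map { |h| h["Box"] }  # [1, 2], although box 3 alone holds both A and B
```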



Answer 3:

You could try Array#uniq (this example uses symbol keys rather than the question's string keys):

database = [{name: "Compound1", box: 1}, {name: "Compound2", box: 1},
            {name: "Compound2", box: 1}, {name: "Compound3", box: 1},
            {name: "Compound4", box: 1}, {name: "Compound5", box: 2},
            {name: "Compound6", box: 2}, {name: "Compound1", box: 3},
            {name: "Compound2", box: 3}, {name: "Compound3", box: 3},
            {name: "Compound7", box: 4}]

# Keep the first hash seen for each compound name
p database.uniq { |h| h[:name] }
# => [
#   {:name=>"Compound1", :box=>1},
#   {:name=>"Compound2", :box=>1},
#   {:name=>"Compound3", :box=>1},
#   {:name=>"Compound4", :box=>1},
#   {:name=>"Compound5", :box=>2},
#   {:name=>"Compound6", :box=>2},
#   {:name=>"Compound7", :box=>4}
# ]

Or:

# Group by box, then drop repeated compound names within each box (uniq! edits each group in place)
p database.group_by { |h| h[:box] }.each { |_box, v| v.uniq! { |h| h[:name] } }
# => {
#   1=>[
#     {:name=>"Compound1", :box=>1},
#     {:name=>"Compound2", :box=>1},
#     {:name=>"Compound3", :box=>1},
#     {:name=>"Compound4", :box=>1}
#   ],
#   2=>[
#     {:name=>"Compound5", :box=>2},
#     {:name=>"Compound6", :box=>2}
#   ],
#   3=>[
#     {:name=>"Compound1", :box=>3},
#     {:name=>"Compound2", :box=>3},
#     {:name=>"Compound3", :box=>3}
#   ],
#   4=>[
#     {:name=>"Compound7", :box=>4}
#   ]
# }
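The second variant mutates the grouped arrays with uniq!. That method returns nil for groups with no duplicates, which does no harm here only because Hash#each returns the hash itself. This is a non-mutating sketch of the same grouping that uses transform_values (Ruby >= 2.4):

```ruby
database = [{name: "Compound1", box: 1}, {name: "Compound2", box: 1},
            {name: "Compound2", box: 1}, {name: "Compound3", box: 1},
            {name: "Compound4", box: 1}, {name: "Compound5", box: 2},
            {name: "Compound6", box: 2}, {name: "Compound1", box: 3},
            {name: "Compound2", box: 3}, {name: "Compound3", box: 3},
            {name: "Compound7", box: 4}]

# Same grouping, but each group is replaced by a new, de-duplicated array
per_box = database.group_by { |h| h[:box] }
                  .transform_values { |v| v.uniq { |h| h[:name] } }
p per_box.transform_values { |v| v.map { |h| h[:name] } }
```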


Answer 4:

I don't like that original data structure. Why not just start with a hash mapping each compound to the boxes that hold it, e.g. {"CompoundX" => [BoxY, ...]}, since the "Name" and "Box" keys aren't really useful? But if you're married to that structure, here's how I would do it:

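database = [{"Name"=>"Compound1", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound3", "Box"=>1},
            {"Name"=>"Compound4", "Box"=>1},
            {"Name"=>"Compound5", "Box"=>2},
            {"Name"=>"Compound6", "Box"=>2},
            {"Name"=>"Compound1", "Box"=>3},
            {"Name"=>"Compound2", "Box"=>3},
            {"Name"=>"Compound3", "Box"=>3},
            {"Name"=>"Compound7", "Box"=>4}]

# Flatten into ["Compound1", 1, "Compound2", 1, ...], dropping the "Name" and "Box" keys
# (compact, not compact!, which would return nil if there were nothing to remove)
new_db_arr = database.collect { |h| h.flatten }.flatten
                     .collect { |i| i if i != "Name" && i != "Box" }.compact
# Build {compound => [boxes holding it]}
new_db_hash = {}
new_db_arr.each_slice(2) do |a, b|
  if new_db_hash[a].nil?
    new_db_hash[a] = []
  end
  new_db_hash[a] << b
end

new_db_hash
  #=> {"Compound1"=>[1, 3], "Compound2"=>[1, 1, 3], "Compound3"=>[1, 3], "Compound4"=>[1],
  #    "Compound5"=>[2], "Compound6"=>[2], "Compound7"=>[4]}
boxes = new_db_hash.values
# Every way of picking one box for each compound
combos = boxes[0].product(*boxes[1..-1])
# Fewest distinct boxes first
combos = combos.sort_by { |a| a.uniq.length }
winning_combo = combos[0].uniq
  #=> [1, 2, 4]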

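The bulk of the work is just transforming the data structure into a hash that maps each compound name to the boxes containing it. Then you generate every way of choosing one box per compound (the Cartesian product of those box lists). You sort those choices by their number of distinct boxes and take the one with the fewest as the answer. I'm not sure how well this would scale to very large datasets. The number of choices is the product of the lengths of the box lists (2 × 3 × 2 = 12 here), so it grows multiplicatively with every compound stored in more than one box.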
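The number of combinations is the product of the per-compound box counts, so removing repeated entries before taking the product helps. Without that, the duplicate "Compound2" in box 1 contributes a factor of 3 instead of 2. This is a sketch, not the answer's code. It builds the compound-to-boxes hash with group_by and uniq, and it uses min_by instead of sorting and taking the first element:

```ruby
database = [{"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
            {"Name"=>"Compound2", "Box"=>1}, {"Name"=>"Compound3", "Box"=>1},
            {"Name"=>"Compound4", "Box"=>1}, {"Name"=>"Compound5", "Box"=>2},
            {"Name"=>"Compound6", "Box"=>2}, {"Name"=>"Compound1", "Box"=>3},
            {"Name"=>"Compound2", "Box"=>3}, {"Name"=>"Compound3", "Box"=>3},
            {"Name"=>"Compound7", "Box"=>4}]

# compound => distinct boxes holding it (duplicates removed up front)
compound_to_boxes = database.group_by { |h| h["Name"] }
                            .transform_values { |v| v.map { |h| h["Box"] }.uniq }
boxes = compound_to_boxes.values
p boxes.map(&:size).reduce(:*)  # 8 combinations to try, versus 12 with duplicates kept

# Every way of picking one box per compound; keep the one with fewest distinct boxes
combos = boxes[0].product(*boxes[1..-1])
winning_combo = combos.min_by { |a| a.uniq.length }.uniq
p winning_combo  # [1, 2, 4]
```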


