Merging hash values in an array of hashes based on key

Posted by ﹥>﹥吖頭↗ on 2019-12-08 11:17:43

Question


I have an array of hashes similar to this:

[
  {"student": "a","scores": [{"subject": "math","quantity": 10},{"subject": "english", "quantity": 5}]},
  {"student": "b", "scores": [{"subject": "math","quantity": 1 }, {"subject": "english","quantity": 2 } ]},
  {"student": "a", "scores": [ { "subject": "math", "quantity": 2},{"subject": "science", "quantity": 5 } ] }
]

Is there a simpler way of getting the output similar to this except looping through the array and finding a duplicate and then combining them?

[
  {"student": "a","scores": [{"subject": "math","quantity": 12},{"subject": "english", "quantity": 5},{"subject": "science", "quantity": 5 } ]},
  {"student": "b", "scores": [{"subject": "math","quantity": 1 }, {"subject": "english","quantity": 2 } ]}
]

Rules for merging duplicate objects:

  • Students are merged on a matching "student" value (e.g. student "a", student "b")
  • A student's scores on identical subjects are added (e.g. student a's math scores 2 and 10 become 12 when merged)

Answer 1:


Is there a simpler way of getting the output similar to this except looping through the array and finding a duplicate and then combining them?

Not that I know of. If you explain where this data is coming from, the answer may be different, but based on just the Array of Hash objects I think you will have to iterate and combine.

While it is not elegant you could use a solution like this

arr = [
      {"student"=> "a","scores"=> [{"subject"=> "math","quantity"=> 10},{"subject"=> "english", "quantity"=> 5}]},
      {"student"=> "b", "scores"=> [{"subject"=> "math","quantity"=> 1 }, {"subject"=> "english","quantity"=> 2 } ]},
      {"student"=> "a", "scores"=> [ { "subject"=> "math", "quantity"=> 2},{"subject"=> "science", "quantity"=> 5 } ] }
    ]
#Group the array by student
arr.group_by{|student| student["student"]}.map do |student_name,student_values|
  {"student" => student_name,
  #combine all the scores and group by subject
  "scores" => student_values.map{|student| student["scores"]}.flatten.group_by{|score| score["subject"]}.map do |subject,subject_values|
    {"subject" => subject,
    #combine all the quantities into an array and reduce using `+`
    "quantity" => subject_values.map{|h| h["quantity"]}.reduce(:+)
    }
  end
  }
end
#=> [
#     {"student"=>"a", "scores"=>[
#                         {"subject"=>"math", "quantity"=>12},
#                         {"subject"=>"english", "quantity"=>5},
#                         {"subject"=>"science", "quantity"=>5}]},
#     {"student"=>"b", "scores"=>[
#                         {"subject"=>"math", "quantity"=>1},
#                         {"subject"=>"english", "quantity"=>2}]}
#   ]
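
For what it's worth, the same grouping idea can be written a little more compactly with flat_map and sum. This is only a sketch, assuming Ruby 2.4 or later (where Enumerable#sum is available); merge_students is a name made up here for illustration:

```ruby
# Hypothetical helper: group by student, then by subject, then sum quantities.
def merge_students(arr)
  arr.group_by { |row| row["student"] }.map do |name, rows|
    scores = rows.flat_map { |row| row["scores"] }          # all score hashes for this student
                 .group_by { |score| score["subject"] }     # bucket them by subject
                 .map do |subject, items|
                   { "subject"  => subject,
                     "quantity" => items.sum { |item| item["quantity"] } }
                 end
    { "student" => name, "scores" => scores }
  end
end

arr = [
  {"student"=> "a", "scores"=> [{"subject"=> "math", "quantity"=> 10}, {"subject"=> "english", "quantity"=> 5}]},
  {"student"=> "b", "scores"=> [{"subject"=> "math", "quantity"=> 1}, {"subject"=> "english", "quantity"=> 2}]},
  {"student"=> "a", "scores"=> [{"subject"=> "math", "quantity"=> 2}, {"subject"=> "science", "quantity"=> 5}]}
]

merged = merge_students(arr)
```

The input array is not modified, since group_by, flat_map and map all build new collections.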

I know that you specified your expected result but I wanted to point out that making the output simpler makes the code simpler.

 arr.map(&:dup).group_by{|a| a.delete("student")}.each_with_object({}) do |(student, scores),record|
   # each_with_object returns the accumulator itself, so neither block needs an explicit return value
   record[student] = scores.map(&:values).flatten.map(&:values).each_with_object(Hash.new(0)) do |(subject,score),obj|
     obj[subject] += score
   end
 end
 #=>{"a"=>{"math"=>12, "english"=>5, "science"=>5}, "b"=>{"math"=>1, "english"=>2}}

With this structure, getting the students is as easy as calling .keys, and getting the scores is equally simple. I am thinking something like

above_result.each do |student,scores|
  puts student
  scores.each do |subject,score|
    puts "  #{subject.capitalize}: #{score}"
  end
end

The console output would be

a
  Math: 12
  English: 5
  Science: 5
b
  Math: 1
  English: 2
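
A few other lookups that the simpler structure makes easy, as a rough sketch. The variable names are made up for illustration, and dig and transform_values assume a reasonably recent Ruby (2.4 or later):

```ruby
result = { "a" => { "math" => 12, "english" => 5, "science" => 5 },
           "b" => { "math" => 1, "english" => 2 } }

students  = result.keys                    # every student name
a_math    = result["a"]["math"]            # a single score
b_science = result.dig("b", "science")     # nil when the subject is absent
totals    = result.transform_values { |scores| scores.values.sum }  # total per student
```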



Answer 2:


There are two common ways of aggregating values in such instances. The first is to employ the method Enumerable#group_by, as @engineersmnky has done in his answer. The second is to build a hash using the form of the method Hash#update (a.k.a. merge!) that uses a block to resolve the values of keys which are present in both of the hashes being merged. My solution uses the latter approach, not because I prefer it to the group_by, but just to show you a different way it can be done. (Had engineersmnky used update, I would have gone with group_by.)
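
To make the contrast concrete, here is a small sketch of both approaches applied to a flat list of [subject, quantity] pairs. The data and variable names are invented for illustration, and sum assumes Ruby 2.4 or later:

```ruby
pairs = [["math", 10], ["english", 5], ["math", 2]]

# Approach 1: group first, then reduce each group.
via_group_by = pairs.group_by { |subject, _| subject }
                    .map { |subject, rows| [subject, rows.sum { |_, quantity| quantity }] }
                    .to_h

# Approach 2: merge one-entry hashes into an accumulator; the block
# is only called when the key is already present in the accumulator.
via_update = pairs.each_with_object({}) do |(subject, quantity), totals|
  totals.update(subject => quantity) { |_key, old, new| old + new }
end
```

Both expressions produce the same hash of subject totals.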

Your problem is complicated somewhat by the particular data structure you are using. I found that the solution could be simplified and made easier to follow by first converting the data to a different structure, updating the scores, then converting the result back to your data structure. You may want to consider changing the data structure (if that's an option for you). I've addressed that issue in the "Discussion" section.

Code

def combine_scores(arr)
  reconstruct(update_scores(simplify(arr)))
end

def simplify(arr)
  arr.map do |h|
    hash = Hash[h[:scores].map { |g| g.values }]
    hash.default = 0
    { h[:student]=> hash }
  end
end

def update_scores(arr)
  arr.each_with_object({}) do |g,h|
    h.update(g) do |_, h_scores, g_scores|
      g_scores.each { |subject,score| h_scores[subject] += score }
      h_scores
    end
  end
end

def reconstruct(h)
  h.map { |k,v| { student: k, scores: v.map { |subject, score|
    { subject: subject, score: score } } } }
end

Example

arr = [
  { student: "a", scores: [{ subject: "math",    quantity: 10 },
                           { subject: "english", quantity:  5 }] },
  { student: "b", scores: [{ subject: "math",    quantity:  1 },
                           { subject: "english", quantity:  2 } ] },
  { student: "a", scores: [{ subject: "math",    quantity:  2 },
                           { subject: "science", quantity:  5 } ] }]
combine_scores(arr)
  #=> [{ :student=>"a",
  #      :scores=>[{ :subject=>"math",    :score=>12 },
  #                { :subject=>"english", :score=> 5 },
  #                { :subject=>"science", :score=> 5 }] },
  #    { :student=>"b",
  #      :scores=>[{ :subject=>"math",    :score=> 1 },
  #                { :subject=>"english", :score=> 2 }] }] 

Explanation

First consider the two intermediate calculations:

a = simplify(arr)
  #=> [{ "a"=>{ "math"=>10, "english"=>5 } },
  #    { "b"=>{ "math"=> 1, "english"=>2 } },
  #    { "a"=>{ "math"=> 2, "science"=>5 } }]

h = update_scores(a)
  #=> {"a"=>{"math"=>12, "english"=>5, "science"=>5},
  #    "b"=>{"math"=> 1, "english"=>2}}

Then

reconstruct(h)

returns the result shown above.

+ simplify

arr.map do |h|
  hash = Hash[h[:scores].map { |g| g.values }]
  hash.default = 0
  { h[:student]=> hash }
end

This maps each hash into a simpler one. For example, the first element of arr:

h = { student: "a", scores: [{ subject: "math",    quantity: 10 },
                             { subject: "english", quantity:  5 }] }

is mapped to:

{ "a"=>Hash[[{ subject: "math",    quantity: 10 },
             { subject: "english", quantity:  5 }].map { |g| g.values }] }
#=> { "a"=>Hash[[["math", 10], ["english", 5]]] }
#=> { "a"=>{"math"=>10, "english"=>5}}

Setting the default value of each hash to zero simplifies the update step, which follows.
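
A minimal sketch of what the default value does (the variable names here are just for illustration):

```ruby
scores = Hash[[["math", 10], ["english", 5]]]
scores.default = 0

missing = scores["science"]        # the default is returned for an absent key
stored  = scores.key?("science")   # reading a missing key does not store it

scores["science"] += 5             # same as scores["science"] = 0 + 5
```

After the last line the hash contains "science"=>5, without any check for whether the key already existed.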

+ update_scores

For the array of hashes a that is returned by simplify, we compute:

a.each_with_object({}) do |g,h|
  h.update(g) do |_, h_scores, g_scores|
    g_scores.each { |subject,score| h_scores[subject] += score }
    h_scores
  end
end

Each element of a (a hash) is merged into an initially-empty hash, h. As update (same as merge!) is used for the merge, h is modified. If both hashes share the same key (a student, e.g., "a"), the block is called and each score in g_scores is added to the score for the same subject in h_scores; otherwise the student's entire scores hash is added to h.

Notice that if h_scores does not have the key subject, then:

h_scores[subject] += score
  #=> h_scores[subject] = h_scores[subject] + score
  #=> h_scores[subject] = 0 + score (because the default value is zero)
  #=> h_scores[subject] = score

That is, the key-value pair from g_scores is merely added to h_scores.

I've replaced the block variable representing the key (here, the student) with a placeholder _, to reduce the chance of errors and to inform the reader that it is not used in the block.
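
The update step can be traced by recording a copy of h after each element is merged. This is a sketch only: the snapshots array is added purely for illustration, the literal array stands in for what simplify returns, and transform_values assumes Ruby 2.4 or later:

```ruby
a = [{ "a" => { "math" => 10, "english" => 5 } },
     { "b" => { "math" => 1,  "english" => 2 } },
     { "a" => { "math" => 2,  "science" => 5 } }]
# Give every inner hash a default of zero, as simplify does.
a.each { |g| g.each_value { |scores| scores.default = 0 } }

snapshots = []
a.each_with_object({}) do |g, h|
  h.update(g) do |_, h_scores, g_scores|
    g_scores.each { |subject, score| h_scores[subject] += score }
    h_scores
  end
  snapshots << h.transform_values(&:dup)   # copy the state after this merge
end
```

The first snapshot holds only student "a", the second adds "b" unchanged, and the third shows "a" with math summed to 12 and science added. Note that the inner hash stored for "a" in h is the same object as the one in the first element of a, so that element is modified in place; within combine_scores this is harmless because simplify builds fresh hashes.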

+ reconstruct

The final step is to convert the hash returned by update_scores back to the original data structure, which is straightforward.

Discussion

If you are able to change the data structure, you may wish to consider the one produced by update_scores (shown here with symbols for the subject keys):

h = { "a"=>{ math: 10, english: 5 }, "b"=>{ math:  1, english: 2 } }

Then to update the scores with:

g = { "a"=>{ math: 2, science: 5 }, "b"=>{ english: 3 }, "c"=>{ science: 4 } }

you would merely do the following:

h.merge(g) { |_,oh,nh| oh.merge(nh) { |_,ohv,nhv| ohv+nhv } }
  #=> { "a"=>{ :math=>12, :english=>5, :science=>5 },
  #     "b"=>{ :math=> 1, :english=>5 },
  #     "c"=>{ :science=>4 } }
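
The same nested merge can also fold a whole list of such hashes into one, which is essentially the original problem in the simpler structure. A sketch, with batches and totals as made-up names:

```ruby
batches = [
  { "a" => { math: 10, english: 5 } },
  { "b" => { math: 1,  english: 2 } },
  { "a" => { math: 2,  science: 5 } }
]

# Start from an empty hash and merge each batch in turn; merge (without !)
# returns new hashes, so the batches themselves are left untouched.
totals = batches.reduce({}) do |acc, batch|
  acc.merge(batch) do |_, old_scores, new_scores|
    old_scores.merge(new_scores) { |_, x, y| x + y }
  end
end
```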


Source: https://stackoverflow.com/questions/27306879/merging-hash-values-in-an-array-of-hashes-based-on-key
