How to flatten a hash, making each key a unique value?

 ̄綄美尐妖づ 提交于 2019-12-05 01:25:56

Interesting question!

Theory

Here's a recursive method to parse your data.

  • It keeps track of which keys and indices it has found.
  • It appends them in a tmp array.
  • Once a leaf object has been found, it gets written in a hash as value, with a joined tmp as key.
  • This small hash then gets recursively merged back to the main hash.

Code

def recursive_parsing(object, tmp = [])
  case object
  when Array
    object.each.with_index(1).with_object({}) do |(element, i), result|
      result.merge! recursive_parsing(element, tmp + [i])
    end
  when Hash
    object.each_with_object({}) do |(key, value), result|
      result.merge! recursive_parsing(value, tmp + [key])
    end
  else
    { tmp.join('_') => object }
  end
end

As an example:

require 'pp'
pp recursive_parsing(data)
# {"Name"=>"Kim Kones",
#  "License Number"=>"54321",
#  "Details_Name"=>"Kones, Kim",
#  "Details_Licenses_1_License Type"=>"PT",
#  "Details_Licenses_1_License Number"=>"54321",
#  "Details_Licenses_2_License Type"=>"Temp",
#  "Details_Licenses_2_License Number"=>"T123",
#  "Details_Licenses_3_License Type"=>"AP",
#  "Details_Licenses_3_License Number"=>"A666",
#  "Details_Licenses_3_Expiration Date"=>"12/31/2020"}

Debugging

Here's a modified version with old-school debugging. It might help you understand what's going on:

def recursive_parsing(object, tmp = [], indent="")
  puts "#{indent}Parsing #{object.inspect}, with tmp=#{tmp.inspect}"
  result = case object
  when Array
    puts "#{indent} It's an array! Let's parse every element:"
    object.each_with_object({}).with_index(1) do |(element, result), i|
      result.merge! recursive_parsing(element, tmp + [i], indent + "  ")
    end
  when Hash
    puts "#{indent} It's a hash! Let's parse every key,value pair:"
    object.each_with_object({}) do |(key, value), result|
      result.merge! recursive_parsing(value, tmp + [key], indent + "  ")
    end
  else
    puts "#{indent} It's a leaf! Let's return a hash"
    { tmp.join('_') => object }
  end
  puts "#{indent} Returning #{result.inspect}\n"
  result
end

When called with recursive_parsing([{a: 'foo', b: 'bar'}, {c: 'baz'}]), it displays:

Parsing [{:a=>"foo", :b=>"bar"}, {:c=>"baz"}], with tmp=[]
 It's an array! Let's parse every element:
  Parsing {:a=>"foo", :b=>"bar"}, with tmp=[1]
   It's a hash! Let's parse every key,value pair:
    Parsing "foo", with tmp=[1, :a]
     It's a leaf! Let's return a hash
     Returning {"1_a"=>"foo"}
    Parsing "bar", with tmp=[1, :b]
     It's a leaf! Let's return a hash
     Returning {"1_b"=>"bar"}
   Returning {"1_a"=>"foo", "1_b"=>"bar"}
  Parsing {:c=>"baz"}, with tmp=[2]
   It's a hash! Let's parse every key,value pair:
    Parsing "baz", with tmp=[2, :c]
     It's a leaf! Let's return a hash
     Returning {"2_c"=>"baz"}
   Returning {"2_c"=>"baz"}
 Returning {"1_a"=>"foo", "1_b"=>"bar", "2_c"=>"baz"}

Unlike the others, I have no love for each_with_object :-). But I do like passing a single result hash around so I don't have to merge and remerge hashes over and over again.

def flattify(value, result = {}, path = [])
  case value
  when Array
    value.each.with_index(1) do |v, i|
      flattify(v, result, path + [i])
    end
  when Hash
    value.each do |k, v|
      flattify(v, result, path + [k])
    end
  else
    result[path.join("_")] = value
  end
  result
end

(Some details adopted from Eric, see comments)

Non-recursive approach, using BFS with an array as a queue. I keep the key-value pairs where the value isn't an array/hash, and push array/hash contents to the queue (with combined keys). Turning arrays into hashes (["a", "b"] ↦ {1=>"a", 2=>"b"}) as that felt neat.

def flattify(hash)
  (q = hash.to_a).select { |key, value|
    value = (1..value.size).zip(value).to_h if value.is_a? Array
    !value.is_a?(Hash) || !value.each { |k, v| q << ["#{key}_#{k}", v] }
  }.to_h
end

One thing I like about it is the nice combination of keys as "#{key}_#{k}". In my other solution, I could've also used a string path = '' and extended that with path + "_" + k, but that would've caused a leading underscore that I'd have to avoid or trim with extra code.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!