How to convert json string datatype column to map datatype column in hive?

时光怂恿深爱的人放手 提交于 2019-12-11 04:12:31

问题


I need to get all the unique key values from all the rows. Each row has different keys and values Please find the above image of the column.

eg: one row looks like

{"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":1549002807585}

回答1:


Tested this with two rows of data, all key_value pairs are identical except in second JSON there is one additional NEW_KEY and PARSING.NGRAM_MATCHES_CACHED values are different.

with data as
(
select stack(2, --data example
'{"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":1549002807585}',
'{"NEW_KEY":12345,"START_TIME":1549002807568,"PARSING.QUERY_FORMED":1549002807586,"CUBES_WITH_PERMISSIONS":1549002807568,"PARSING.CUBE_MATCH_SELECTED":1549002807586,"POTENTIAL_COMPLETIONS_ADDED":1549002807587,"QUERY_PARSED":1549002807586,"SUGGESTIONS_FORMED":1549002807606,"PARSING.SEQUENCES_GENERATED":1549002807568,"PARSING.NGRAM_MATCHES_CACHED":154900280758}'
) as str
)

select str_to_map(concat_ws(',',collect_set(key_value)),',',':') --collect set, concatenate and convert to map
from
(
select explode(split(regexp_replace (str,'[{}"]',''),',')) key_value from data --remove JSON delimiters, split and explode pairs
)s;

Result:

OK
{"START_TIME":"1549002807568","PARSING.QUERY_FORMED":"1549002807586","CUBES_WITH_PERMISSIONS":"1549002807568","PARSING.CUBE_MATCH_SELECTED":"1549002807586","POTENTIAL_COMPLETIONS_ADDED":"1549002807587","QUERY_PARSED":"1549002807586","SUGGESTIONS_FORMED":"1549002807606","PARSING.SEQUENCES_GENERATED":"1549002807568","PARSING.NGRAM_MATCHES_CACHED":"154900280758","NEW_KEY":"12345"}
Time taken: 158.414 seconds, Fetched: 1 row(s)

Of course, "PARSING.NGRAM_MATCHES_CACHED" exists only one time in the result, because map does not allow the same key twice. All key_values are unique. Read comments in the code please.



来源:https://stackoverflow.com/questions/54764580/how-to-convert-json-string-datatype-column-to-map-datatype-column-in-hive

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!