Why does 'get_json_object' return different results when run in Spark and in a SQL tool

Posted by 夙愿已清 on 2019-12-13 04:04:03

Question


I have developed a Hive query that uses lateral views and get_json_object to unpack some JSON. The query works well enough using a JDBC client (DbVisualizer) against a Hive database, but when run as Spark SQL from a Java application, on the same data, it returns nothing. I have tracked the problem down to differences in what the function 'get_json_object' returns.

The issue can be illustrated by this type of query:

select concat_ws( "|", get_json_object('{"product_offer":[
{"productName":"Plan A"},
{"productName":"Plan B"}]}', 
'$.product_offer.productName') )

When run in DbVisualizer against a Hive database, I get an array of the two product names in the JSON array: ["Plan A","Plan B"]. When the same query is run as Spark SQL from a Java application, null is returned.

I have noticed another difference: the path '$.product_offer[0].productName' returns 'Plan A' in DbVisualizer and nothing in Spark.


Answer 1:


The path to extract the array of product names is

select concat_ws( "|", get_json_object('{"product_offer":[{"productName":"Plan A"},{"productName":"Plan B"}]}', '$.product_offer[*].productName') )

which works in both Spark and DbVisualizer.
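
To see the difference on your own setup, one option is to evaluate all three paths from this thread against the same JSON literal in a single query and run it in both environments. This is only a sketch: the subquery alias t, the column name j, and the result column aliases are made up for illustration, and the values you get back depend on the engine and version that executes the query.

-- Evaluate the three JSON paths discussed above side by side.
-- j, t, and the column aliases are illustrative names, not part of the original query.
select
  get_json_object(j, '$.product_offer.productName')    as implicit_path,
  get_json_object(j, '$.product_offer[0].productName') as indexed_path,
  get_json_object(j, '$.product_offer[*].productName') as wildcard_path
from (
  select '{"product_offer":[{"productName":"Plan A"},{"productName":"Plan B"}]}' as j
) t

Comparing the three columns between the JDBC client and the Spark application should show which path forms each engine accepts. If the query text is built inside a Java string, it may also be worth printing the final SQL to confirm that the quotes and the $ sign reach Spark unchanged.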



Source: https://stackoverflow.com/questions/57763111/why-does-get-json-object-return-different-results-when-run-in-spark-and-sql-to
