hiveql | 易学教程

Explode the Array of Struct in Hive

Question: This is the Hive table:

    CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable (
      USER_ID BIGINT,
      NEW_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT, TIMESTAMPS: STRING>>
    )

And this is the data in the table:

    1015826235 [{"product_id":220003038067,"timestamps":"1340321132000"},{"product_id":300003861266,"timestamps":"1340271857000"}]

Is there any way I can get the output below from HiveQL after exploding the array?

    **USER_ID** | **PRODUCT_ID** | **TIMESTAMPS**
    ------------+------------------+------
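One common way to get one row per array element in Hive is LATERAL VIEW combined with explode(). The following is a minimal sketch, assuming the SampleTable schema from the question; the aliases t, exploded, and item are illustrative names that do not appear in the original.

```sql
-- Emit one output row per struct in the NEW_ITEM array.
-- t, exploded, and item are hypothetical aliases chosen for this sketch.
SELECT
  t.user_id,
  item.product_id,
  item.timestamps
FROM SampleTable t
LATERAL VIEW explode(t.new_item) exploded AS item;
```

With the sample row from the question, this should produce two rows for USER_ID 1015826235, one per product.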

Explode (transpose?) multiple columns in Spark SQL table

Question: I am using Spark SQL (I mention that it is in Spark in case that affects the SQL syntax; I'm not familiar enough to be sure yet) and I have a table that I am trying to restructure, but I'm getting stuck trying to transpose multiple columns at the same time. Basically I have data that looks like:

    userId  someString  varA      varB
    1       "example1"  [0,2,5]   [1,2,9]
    2       "example2"  [1,20,5]  [9,null,6]

and I'd like to explode both varA and varB simultaneously (the length will always be consistent) - so that
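One possible way to explode two equal-length arrays in step is to use posexplode() on each and keep only the rows where the positions match. This is a sketch under assumptions: the table name user_vars and all aliases are hypothetical, and since the excerpt is cut off before the desired output, the result shape is a guess at one row per array position.

```sql
-- posexplode() returns each element together with its index.
-- Matching the two indexes pairs varA[i] with varB[i].
-- user_vars, a, b, pos_a, pos_b, val_a, val_b are hypothetical names.
SELECT
  userId,
  someString,
  val_a AS varA,
  val_b AS varB
FROM user_vars
LATERAL VIEW posexplode(varA) a AS pos_a, val_a
LATERAL VIEW posexplode(varB) b AS pos_b, val_b
WHERE pos_a = pos_b;
```

Two chained lateral views first produce every combination of elements, so the position filter is what restores the pairing; for large arrays this intermediate blow-up may be costly.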

HiveQL: Using query results as variables

Question: In Hive, I'd like to dynamically extract information from a table, save it in a variable, and use it later. Consider the following example, where I retrieve the maximum of column var and want to use it as a condition in the subsequent query.

    set maximo=select max(var) from table;
    select * from table where var=${hiveconf:maximo}

It does not work, although

    set maximo=select max(var) from table;
    ${hiveconf:maximo}

shows me the intended result. Doing: select '${hiveconf:maximo}' gives "select max
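Hive's set command stores the text to the right of the equals sign, and ${hiveconf:...} substitutes that text verbatim, so the variable here holds the query string rather than its result. One way to avoid the variable entirely is to compute the maximum in a subquery and join against it. A minimal sketch, assuming a table named my_table (a hypothetical stand-in for the question's table) with a column var:

```sql
-- Compute the maximum once in a derived table, then keep matching rows.
-- my_table, t, m, and max_var are hypothetical names for this sketch.
SELECT t.*
FROM my_table t
JOIN (
  SELECT max(var) AS max_var
  FROM my_table
) m
ON t.var = m.max_var;
```

If the value really must live in a variable, it generally has to be computed outside the session (for example by a wrapper script) and passed in as a literal.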

Hive hash function resulting in 0,null and 1, why?

Question: I am using Hive 0.13.1 and hashing a combination of keys using the default Hive hash function. Something like:

    select hash(date, token1, token2, parameters["a"], parameters["b"], parameters["c"]) from table1;

I ran it on 150M rows. For 60% of the rows, it hashed correctly. For the remaining rows, it gave 0, null, or 1 as the hash. I looked at the rows that resulted in bad hashes, and I don't see anything wrong with them. What could be causing it?

Answer 1: The hash function returns 0 only when all supplied

Nested select in hiveQL

Question: In one of my use cases, I have two tables, namely flow and conf. The flow table contains a list of all flight data. It has columns creationdate, datafilename, aircraftid. The conf table contains configuration information. It has columns configdate, aircraftid, configurationame. There are multiple versions of configurations created for one aircraft type. So, when we process a datafilename, we need to identify the aircraftid from the flow table, and pick up the configuration from the conf table that was

Hive query with certain specific exclude conditions

Question: I am trying to build a Hive query that matches only the features below, or a combination of them. For example, the features include:

    name = "summary"
    name = "details"
    name1 = "vehicle stats"
    name1 = "accelerometer"

I have to count the number of customers who strictly follow the above conditions. For example, in the table below, customer "Joy" should not be counted because he has additionally done "expenses" in name, even though he has both "summary" and "details" in name and "vehicle

Hive error: java.lang.Throwable: Child Error

Question: I am using CDH 5.9, and executing the following Hive query throws an error. Any idea about the issue? A normal select query works, but a complex query fails.

    hive> select * from table where dt='22-01-2017' and field like '%xyz%' limit 10;
    Query ID = hdfs_20170123200303_44a9c423-4bb3-4f80-ade4-b1312971eb63
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201701131637_0067, Tracking URL = http