hiveql

Explode the Array of Struct in Hive

白昼怎懂夜的黑, submitted on 2019-12-28 03:26:05
Question: This is the Hive table:

CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable (
  USER_ID BIGINT,
  NEW_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT, TIMESTAMPS: STRING>>
)

And this is the data in the table:

1015826235 [{"product_id":220003038067,"timestamps":"1340321132000"},{"product_id":300003861266,"timestamps":"1340271857000"}]

Is there any way I can get the output below from HiveQL after exploding the array?

**USER_ID** | **PRODUCT_ID** | **TIMESTAMPS**
------------+------------------+------
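
One common way to flatten an array of structs in Hive is `LATERAL VIEW explode(...)`. The following is a minimal sketch against the table definition quoted in the question, not necessarily the answer given in the original thread; the aliases `s`, `t`, and `item` are illustrative names that do not appear in the question.

```sql
-- Sketch: emit one output row per struct element in NEW_ITEM.
-- `s` (table alias), `t` (lateral view alias) and `item` (exploded
-- column alias) are illustrative names, not from the question.
SELECT
  s.user_id,
  item.product_id,
  item.timestamps
FROM SampleTable s
LATERAL VIEW explode(s.new_item) t AS item;
```

Rows whose array is NULL or empty are dropped by a plain `LATERAL VIEW`; `LATERAL VIEW OUTER` keeps them with NULL in the exploded columns.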

Explode (transpose?) multiple columns in Spark SQL table

旧街凉风, submitted on 2019-12-27 22:51:18
Question: I am using Spark SQL (I mention that it is in Spark in case that affects the SQL syntax - I'm not familiar enough to be sure yet) and I have a table that I am trying to restructure, but I'm getting stuck trying to transpose multiple columns at the same time. Basically I have data that looks like:

userId  someString  varA      varB
1       "example1"  [0,2,5]   [1,2,9]
2       "example2"  [1,20,5]  [9,null,6]

and I'd like to explode both varA and varB simultaneously (the length will always be consistent) - so that
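
One way to explode two equal-length arrays in lockstep is to use `posexplode` on each and keep only the rows where the positions match. This is a sketch under assumptions: `my_table` is a hypothetical table name (the question does not give one), and the aliases `a`, `b`, `pos_a`, `pos_b`, `val_a`, and `val_b` are illustrative.

```sql
-- Sketch: pair up elements of varA and varB by array position.
-- `my_table` is a hypothetical table name; all aliases are illustrative.
SELECT
  userId,
  someString,
  val_a AS varA,
  val_b AS varB
FROM my_table
LATERAL VIEW posexplode(varA) a AS pos_a, val_a
LATERAL VIEW posexplode(varB) b AS pos_b, val_b
-- Without this filter the two lateral views form a cross product.
WHERE pos_a = pos_b;
```

The two lateral views first build every combination of elements per row and the `WHERE` clause then discards the mismatched pairs, so the intermediate result grows quadratically with array length. For short arrays like those in the question this is usually acceptable.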

Explode (transpose?) multiple columns in Spark SQL table

怎甘沉沦, submitted on 2019-12-27 22:51:11
Question: I am using Spark SQL (I mention that it is in Spark in case that affects the SQL syntax - I'm not familiar enough to be sure yet) and I have a table that I am trying to restructure, but I'm getting stuck trying to transpose multiple columns at the same time. Basically I have data that looks like:

userId  someString  varA      varB
1       "example1"  [0,2,5]   [1,2,9]
2       "example2"  [1,20,5]  [9,null,6]

and I'd like to explode both varA and varB simultaneously (the length will always be consistent) - so that

HiveQL: Using query results as variables

我的未来我决定, submitted on 2019-12-27 12:03:27
Question: In Hive I'd like to dynamically extract information from a table, save it in a variable, and use it further. Consider the following example, where I retrieve the maximum of column var and want to use it as a condition in the subsequent query.

set maximo=select max(var) from table;
select * from table where var=${hiveconf:maximo}

It does not work, although

set maximo=select max(var) from table;
${hiveconf:maximo}

shows me the intended result. Doing: select '${hiveconf:maximo}' gives "select max
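
Hive variable substitution is textual: `set` stores the query text as a string rather than executing it, which matches the output of `select '${hiveconf:maximo}'` quoted above. One way to avoid the variable entirely is to compute the maximum in a subquery and join on it. This is a sketch, not necessarily the answer from the original thread; `my_table` stands in for the question's table, and `t`, `m`, and `max_var` are illustrative names.

```sql
-- Sketch: compute the maximum in a subquery and join on it,
-- instead of storing query text in a hiveconf variable.
-- `my_table` stands in for the question's table; `t`, `m` and
-- `max_var` are illustrative names.
SELECT t.*
FROM my_table t
JOIN (
  SELECT max(var) AS max_var
  FROM my_table
) m
  ON t.var = m.max_var;
```

The join form is used here because it does not depend on scalar subquery support in the `WHERE` clause, which varies across Hive versions.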

Hive hash function resulting in 0,null and 1, why?

社会主义新天地, submitted on 2019-12-25 15:44:29
Question: I am using Hive 0.13.1 and hashing a combination of keys using the default Hive hash function. Something like:

select hash(date, token1, token2, parameters["a"], parameters["b"], parameters["c"]) from table1;

I ran it on 150M rows. For 60% of the rows, it hashed correctly. For the remaining rows, it gave 0, null, or 1 as the hash. I looked at the rows that resulted in bad hashes, and I don't see anything wrong with them. What could be causing it?

Answer 1: The hash function returns 0 only when all supplied

Hive hash function resulting in 0,null and 1, why?

耗尽温柔, submitted on 2019-12-25 15:43:51
Question: I am using Hive 0.13.1 and hashing a combination of keys using the default Hive hash function. Something like:

select hash(date, token1, token2, parameters["a"], parameters["b"], parameters["c"]) from table1;

I ran it on 150M rows. For 60% of the rows, it hashed correctly. For the remaining rows, it gave 0, null, or 1 as the hash. I looked at the rows that resulted in bad hashes, and I don't see anything wrong with them. What could be causing it?

Answer 1: The hash function returns 0 only when all supplied

Nested select in hiveQL

别等时光非礼了梦想., submitted on 2019-12-25 10:54:19
Question: In one of my use cases, I have two tables, flow and conf. The flow table contains a list of all flight data. It has the columns creationdate, datafilename, aircraftid. The conf table contains configuration information. It has the columns configdate, aircraftid, configurationame. Multiple versions of a configuration are created for one aircraft type. So, when we process a datafilename, we need to identify the aircraftid from the flow table, and pick up the configuration from the conf table that was

Nested select in hiveQL

ぐ巨炮叔叔, submitted on 2019-12-25 10:54:03
Question: In one of my use cases, I have two tables, flow and conf. The flow table contains a list of all flight data. It has the columns creationdate, datafilename, aircraftid. The conf table contains configuration information. It has the columns configdate, aircraftid, configurationame. Multiple versions of a configuration are created for one aircraft type. So, when we process a datafilename, we need to identify the aircraftid from the flow table, and pick up the configuration from the conf table that was

Hive query with certain specific exclude conditions

邮差的信, submitted on 2019-12-25 08:36:42
Question: I am trying to build a Hive query that matches customers who have used only the features below, or a combination of them. The features are:

name = "summary"
name = "details"
name1 = "vehicle stats"
name1 = "accelerometer"

I have to count the number of customers who strictly follow the above conditions. For example, in the table below, customer "Joy" should not be counted because he has additionally done "expenses" in name, even though he has both "summary" and "details" in name and "vehicle

Hive error: java.lang.Throwable: Child Error

坚强是说给别人听的谎言, submitted on 2019-12-25 08:03:23
Question: I am using CDH 5.9, and executing the following Hive query throws an error. Any idea about the issue? A normal select query works, but a complex query fails.

hive> select * from table where dt='22-01-2017' and field like '%xyz%' limit 10;
Query ID = hdfs_20170123200303_44a9c423-4bb3-4f80-ade4-b1312971eb63
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201701131637_0067, Tracking URL = http