
impala transpose column to row

流过昼夜 · Submitted on 2019-12-12 17:40:11
Question: How do I transpose column data to row data in Impala? I have tried some solutions that work in Hive but not in Impala.

Table name: test

Data:

day         name    jobdone
2017-03-25  x_user  5
2017-03-25  y_user  10
2017-03-31  x_user  20
2017-03-31  y_user  1

I want the data to look like this in Impala, not in Hive.

Required output:

day         x_user  y_user
2017-03-25  5       10
2017-03-31  20      1

I am able to do this in Hive using a map and collect_list. How can I do it in Impala?

Answer 1: Use CASE + MIN() or MAX().
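The CASE + MAX() approach from the answer can be sketched like this (table and column names follow the question; the query itself is an illustrative reconstruction, not the answerer's exact code):

```sql
-- Pivot name/jobdone pairs into one column per user via conditional
-- aggregation; this works in both Impala and Hive.
SELECT
  day,
  MAX(CASE WHEN name = 'x_user' THEN jobdone END) AS x_user,
  MAX(CASE WHEN name = 'y_user' THEN jobdone END) AS y_user
FROM test
GROUP BY day;
```

MAX() ignores NULLs, so each output cell picks up the single matching jobdone value for that (day, name) pair.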

How to create External Table on Hive from data on local disk instead of HDFS?

倾然丶 夕夏残阳落幕 · Submitted on 2019-12-12 17:25:40
Question: For data on HDFS, we can do:

CREATE EXTERNAL TABLE <table> (
  id INT,
  name STRING,
  age INT
)
LOCATION 'hdfs_path';

But how do we specify a local path for the LOCATION above? Thanks.

Answer 1: You can upload the file to HDFS first using "hdfs dfs -put" and then create a Hive external table on top of that. The reason Hive cannot create an external table on a local file is that when Hive processes data, the actual processing happens on the Hadoop cluster, where your local file may not be accessible at
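The two-step workaround from the answer can be sketched as follows (all paths and the table name are placeholders):

```sql
-- Step 1 (run in a shell, shown here as comments):
--   hdfs dfs -mkdir -p /user/hive/mydata
--   hdfs dfs -put /local/path/data.txt /user/hive/mydata/

-- Step 2: create the external table over the HDFS directory.
CREATE EXTERNAL TABLE my_table (
  id INT,
  name STRING,
  age INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/mydata';
```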

Impala > Impala data import methods

假装没事ソ · Submitted on 2019-12-12 10:17:41
1. LOAD DATA

First create a table:

create table user (
  id int,
  name string,
  age int
)
row format delimited fields terminated by '\t';

Prepare a data file user.txt and upload it to the HDFS path /user/impala. Then load the data:

load data inpath '/user/impala/' into table user;

Query the loaded data:

select * from user;

If the query returns no data, refresh the table:

refresh user;

2. INSERT INTO ... VALUES

This is very similar to inserting data in an RDBMS:

create table t_test2 (
  id int,
  name string
);
insert into table t_test2 values (1, 'zhangsan');

3. INSERT INTO ... SELECT

The data inserted into the table comes from the result of the trailing SELECT query.

4. CREATE ... AS SELECT

The table's column count, column types, and data all come from the trailing SELECT query.

Source: CSDN. Author: 千千匿迹. Link: https://blog.csdn.net/qq_44509920/article
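Methods 3 and 4 are described but not demonstrated above; a minimal sketch using the tables already defined (the filter is hypothetical):

```sql
-- 3. INSERT INTO ... SELECT: copy rows produced by a query.
insert into table t_test2
select id, name from user where age > 18;

-- 4. CREATE ... AS SELECT: schema and data both come from the query.
create table t_test3 as
select id, name from user where age > 18;
```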

Hive: get list of non-existing and existing data

痴心易碎 · Submitted on 2019-12-12 05:57:22
Question: Two tables:

Reg                 Global
ID | uom            ID | uom
--------            --------
1  | kg             1  | kg
1  | gm             1  | gm
1  | ml             3  | pl
3  | pl

Desired output:

ID | reg | glob
---------------
1  | kg  | kg
1  | gm  | gm
1  | ml  | null
3  | pl  | pl

Query tried:

SELECT reg.id, reg.uom, glob.uom
FROM reg
LEFT JOIN global glob
  ON reg.id = glob.id AND reg.uom = glob.uom
WHERE glob.uom IS NULL AND reg.id = 1;

Output:

reg.id | reg.uom | glob.uom
1      | ml      | null

Thanks in advance.

Answer 1: Remove the WHERE clause. Just the left
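Following the answer, dropping the WHERE clause turns this into a plain left join that keeps every Reg row (a reconstruction of the implied fix, not the answerer's verbatim code):

```sql
-- Left join keeps all Reg rows; unmatched Global rows yield NULL.
SELECT reg.id, reg.uom AS reg, glob.uom AS glob
FROM reg
LEFT JOIN global glob
  ON reg.id = glob.id AND reg.uom = glob.uom;
```

Rows with no match in Global (here, id 1 with uom ml) come back with glob as NULL, which is exactly the desired output.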

Hive JDBC error: java.lang.NoSuchFieldError: HIVE_CLI_SERVICE_PROTOCOL_V7

梦想与她 · Submitted on 2019-12-12 04:34:25
Question: I'm trying to create a connection via JDBC to Impala using the Hive2 connector, but I'm getting this error:

Exception in thread "main" java.lang.NoSuchFieldError: HIVE_CLI_SERVICE_PROTOCOL_V7
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:175)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at dsnoc.dsnoc_api.dolar

Impala GROUP BY partitioned column

拈花ヽ惹草 · Submitted on 2019-12-12 04:14:16
Question: A theoretical question. Let's say I have a table with four columns: A, B, C, D. The values of A and D are equal, and the table is partitioned by column A. Performance-wise, would it make any difference if I issue this query:

SELECT SUM(B) GROUP BY A;

or this one:

SELECT SUM(B) GROUP BY D;

In other words: is there any performance gain from using GROUP BY on the partitioned column? Thanks.

Answer 1: Usually there are performance gains if you use the partitioned columns in a filter (the WHERE clause in your
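The answer's point about filtering on partition columns can be illustrated with a sketch (the table name and literal are hypothetical):

```sql
-- Partition pruning: only the files of the A = 'x' partition are read.
SELECT SUM(B)
FROM my_table
WHERE A = 'x';

-- Filtering on D (a plain column that happens to hold the same values)
-- forces a scan of every partition, even though the result is identical.
SELECT SUM(B)
FROM my_table
WHERE D = 'x';
```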

sqoop create impala parquet table

情到浓时终转凉″ · Submitted on 2019-12-12 03:56:36
Question: I'm relatively new to sqooping, so pardon any ignorance. I have been trying to sqoop a table from a data source as a Parquet file and create an Impala table (also Parquet) into which I will insert the sqooped data. The code runs without issue, but when I try to select a couple of rows for testing I get the error:

.../EWT_CALL_PROF_DIM_SQOOP/ec2fe2b0-c9fa-4ef9-91f8-46cf0e12e272.parquet' has an incompatible Parquet schema for column 'dru_id.test_ewt_call_prof_dim_parquet.call_prof
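One common way to stop a hand-written schema from drifting away from what Sqoop actually produced is to let Impala derive the table definition from a Parquet data file itself, using CREATE TABLE ... LIKE PARQUET. A sketch under that approach (paths and names are placeholders, not taken from the question):

```sql
-- Derive column names and types from an existing Parquet data file,
-- so the table schema cannot disagree with the sqooped output.
CREATE EXTERNAL TABLE test_ewt_call_prof_dim_parquet
LIKE PARQUET '/user/etl/EWT_CALL_PROF_DIM_SQOOP/part-m-00000.parquet'
STORED AS PARQUET
LOCATION '/user/etl/EWT_CALL_PROF_DIM_SQOOP';
```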

Native Impala UDF (Cpp) randomly gives result as NULL for same inputs in the same table for multiple invocations in same query

烈酒焚心 · Submitted on 2019-12-11 19:23:21
Question: I have a native Impala UDF (C++) with two functions that are complementary to each other:

String myUDF(BigInt)
BigInt myUDFReverso(String)

myUDF("myInput") gives some output, and myUDFReverso(myUDF("myInput")) should give back myInput. When I run an Impala query on a Parquet table like this:

select column1, myUDF(column1), length(myUDF(column1)), myUDFreverso(myUDF(column1))
from my_parquet_table
order by column1
LIMIT 10;

the output is NULL at random. The output is, say, at the 1st run

query to divide data

走远了吗. · Submitted on 2019-12-11 15:46:11
Question: We have two columns, id and monthid. The output I'm looking for splits the year out of the month id and flags activity by quarter: the output should have one column per quarter, 1 if the id is active in that quarter and 0 otherwise. If the id appears in any month of the 1st quarter (e.g. only month 1), the output is still 1. Like this:

id    month
-----------------------------
100   2012-03-01 00:00:00.0
100   2015-09-01 00:00:00.0
100   2016-10-01 00:00:00.0
100   2015-11-01 00:00:00.0
100   2014-01-01 00:00:00.0
100   2013-04-01 00:00:00.0
100
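The question is truncated, but the described shape (one 0/1 flag per quarter, per id and year) is the conditional-aggregation pattern again. A sketch under that assumption, with hypothetical table and column names:

```sql
-- One row per (id, year), with a 0/1 activity flag for each quarter.
SELECT
  id,
  year(monthid) AS yr,
  MAX(CASE WHEN month(monthid) BETWEEN 1  AND 3  THEN 1 ELSE 0 END) AS q1,
  MAX(CASE WHEN month(monthid) BETWEEN 4  AND 6  THEN 1 ELSE 0 END) AS q2,
  MAX(CASE WHEN month(monthid) BETWEEN 7  AND 9  THEN 1 ELSE 0 END) AS q3,
  MAX(CASE WHEN month(monthid) BETWEEN 10 AND 12 THEN 1 ELSE 0 END) AS q4
FROM activity
GROUP BY id, year(monthid);
```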

NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. 1 tablet servers are alive

孤街浪徒 · Submitted on 2019-12-11 15:19:27
Question: I am trying to create a Kudu table using impala-shell.

Query:

CREATE TABLE lol (
  uname STRING,
  age INTEGER,
  PRIMARY KEY (uname)
)
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses' = '127.0.0.1');

CREATE TABLE t (k INT PRIMARY KEY)
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses' = '127.0.0.1');

But I am getting the error:

ERROR: ImpalaRuntimeException: Error creating Kudu table 'impala::default.t'
CAUSED BY: NonRecoverableException: Not enough live tablet servers to create a table
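No answer is included in this excerpt, but the error itself names the mismatch: Kudu defaults to replication factor 3 while only one tablet server is alive. On a single-node setup, one workaround is to request a single replica via the kudu.num_tablet_replicas table property; a hedged sketch of that fix, reusing the question's second statement:

```sql
-- Ask Kudu for one replica instead of the default three, so a
-- single-tablet-server cluster can satisfy the request.
CREATE TABLE t (k INT PRIMARY KEY)
STORED AS KUDU
TBLPROPERTIES (
  'kudu.master_addresses'    = '127.0.0.1',
  'kudu.num_tablet_replicas' = '1'
);
```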