hiveql

Can I use 2 field terminators (like ',' and '.') at a time in Hive while creating a table?

被刻印的时光 ゝ submitted on 2019-12-20 03:56:10
Question: I have a file with id and year. My fields are separated by ',' and '.'. Is there any way, in place of FIELDS TERMINATED BY, to use both ',' and '.'?

Answer 1: This is possible using RegexSerDe:

hive> CREATE EXTERNAL TABLE citiesr1 (id int, city_org string, ppl float)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
      WITH SERDEPROPERTIES ('input.regex'='^(\\d+)\\.(\\S+),(\\d++.\\d++)\\t.*')
      LOCATION '/user/it1/hive/serde/regex';

In the regex above, three capture groups are defined: (\\d+) for id, (\\S+) for city_org, and (\\d++.\\d++) for ppl.
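As a quick illustration of how a data line maps through that regex (the sample line and its values are hypothetical, not taken from the answer):

-- a line like the following would parse into one row of citiesr1:
--   17.Boston,685.09<TAB>anything
-- group 1 (\\d+)        -> id       = 17
-- group 2 (\\S+)        -> city_org = 'Boston'
-- group 3 (\\d++.\\d++) -> ppl      = 685.09
hive> SELECT id, city_org, ppl FROM citiesr1 LIMIT 5;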

Can hiveconf variables be loaded from a file? (Separate from the HiveQL file)

喜欢而已 submitted on 2019-12-20 03:25:18
Question: I often have a large block of HiveQL that I want to run multiple times with different settings for some variables. A simple example would be:

set mindate='2015-01-01 00:00:00';
set maxdate='2015-04-01 00:00:00';

select * from my_table
where the_date between ${hiveconf:mindate} and ${hiveconf:maxdate};

which is then run via hive -f myfile.sql > myout.log. Later, I would like to change the variables and re-run. I also want a record of what values the variables had each time I ran. So I currently…
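The entry is cut off before any answer appears. A minimal sketch of one common approach, assuming the Hive CLI's source command and hypothetical file names; keeping one small settings file per run also leaves a record of the values used:

-- settings_2015q1.hql (hypothetical settings file)
set mindate='2015-01-01 00:00:00';
set maxdate='2015-04-01 00:00:00';

-- myfile.sql: load the settings, then run the query
source /path/to/settings_2015q1.hql;
select * from my_table
where the_date between ${hiveconf:mindate} and ${hiveconf:maxdate};

The same settings file can also be passed as hive -i settings_2015q1.hql -f myfile.sql, since -i runs an initialization script before the main file.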

How to pass multiple statements into Spark SQL HiveContext

蓝咒 submitted on 2019-12-19 19:11:43
Question: For example, I have a few Hive HQL statements which I want to pass into Spark SQL:

set parquet.compression=SNAPPY;
create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE;
select * from MY_TABLE limit 5;

The following doesn't work:

hiveContext.sql("set parquet.compression=SNAPPY; create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE; select * from MY_TABLE limit 5;")

How do I pass the statements into Spark SQL?

Answer 1: Thank you to @SamsonScharfrichter for the answer. This…
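The answer is truncated, but the usual fix is that hiveContext.sql() accepts only a single statement, so the script must be split on ';' and each statement issued in its own call. Sketched below as the three individual statements, each of which would go into a separate hiveContext.sql("...") invocation (with no trailing semicolon inside the quotes):

-- call 1: hiveContext.sql("set parquet.compression=SNAPPY")
set parquet.compression=SNAPPY;

-- call 2: hiveContext.sql("create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE")
create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE;

-- call 3: hiveContext.sql("select * from MY_TABLE limit 5")
select * from MY_TABLE limit 5;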

Strange behaviour of Regexp_replace in a Hive SQL query

末鹿安然 submitted on 2019-12-19 11:52:54
Question: I have some input where I'm trying to remove the trailing .0 from ID strings that end with .0:

select student_id, regexp_replace(student_id, '.0', '')
from school_result.credit_records
where student_id like '%.0';

Input:

01-0230984.03
12345098.0
34567.0

Expected output:

01-0230984.03
12345098
34567

But the result I'm getting is as follows. It removes any character followed by a 0, instead of removing only the occurrences that end with .0:

0129843
123498
34567
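The entry cuts off before an answer. The cause is that in a regular expression '.' matches any character, so the pattern '.0' means "any character followed by 0". A minimal sketch of the usual fix: escape the dot and anchor the pattern to the end of the string:

select student_id,
       regexp_replace(student_id, '\\.0$', '') as student_id_trimmed  -- literal dot, anchored at end
from school_result.credit_records
where student_id like '%.0';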

How to access a Hive ACID table in Spark SQL?

别等时光非礼了梦想. submitted on 2019-12-19 11:03:44
Question: How can you access a Hive ACID table in Spark SQL?

Answer 1: We have worked on and open-sourced a datasource that enables users to work with their Hive ACID transactional tables using Spark.

Github: https://github.com/qubole/spark-acid

It is available as a Spark package, and instructions for using it are on the Github page. Currently the datasource supports only reading from Hive ACID tables; we are working on adding the ability to write into these tables via Spark as well. Feedback and…
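As a rough illustration only; the exact usage lives in the repository's README. It would look something along these lines, where the datasource name 'HiveAcid', the option key, and the table names are assumptions rather than details taken from this answer:

-- expose the ACID table through the datasource as a Spark SQL table, then query it
-- (the CREATE TABLE ... USING <datasource> OPTIONS (...) form is standard Spark SQL)
CREATE TABLE acid_mirror
USING HiveAcid
OPTIONS ('table' 'default.acidtbl');

SELECT * FROM acid_mirror LIMIT 10;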

Hive: sort an array column with respect to another array column in the same table

女生的网名这么多〃 submitted on 2019-12-19 10:34:30
Question: I have a table in Hive with 2 columns, col1 array<int> and col2 array<double>. The output is as shown below:

col1          col2
[1,2,3,4,5]   [0.43,0.01,0.45,0.22,0.001]

I want to sort col2 in ascending order, and col1 should change its index order accordingly, e.g.:

col1          col2
[5,2,4,3,1]   [0.001,0.01,0.22,0.43,0.45]

Answer 1: Explode both arrays, sort, then aggregate the arrays again. Use a sort in the subquery before collect_list to sort the array:

with your_data as (
  select array(1,2,3,4,5) as col1, array(0…
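The query above is cut off. A hedged reconstruction of the approach it describes, using posexplode to pair each value with its index and re-aligning the two arrays on that index (alias names are mine; collect_list preserving the sorted order relies on the sorted subquery feeding a single reducer):

with your_data as (
  select array(1,2,3,4,5) as col1,
         array(0.43, 0.01, 0.45, 0.22, 0.001) as col2
)
select collect_list(c1) as col1_sorted,
       collect_list(c2) as col2_sorted
from (
  select c1, c2
  from your_data
  lateral view posexplode(col1) t1 as p1, c1
  lateral view posexplode(col2) t2 as p2, c2
  where p1 = p2          -- align the two arrays by element position
  sort by c2             -- order by the col2 value before re-aggregating
) s;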

Adding a default value to a column while creating a table in Hive

心不动则不痛 submitted on 2019-12-19 09:23:11
Question: I'm able to create a Hive table from data in an external file. Now I wish to create another table from the data in the previous table, with additional columns that have a default value. I understand that CREATE TABLE AS SELECT can be used, but how do I add the additional columns with a default value?

Answer 1: You can specify which columns to select from a table on create/update, and simply provide the default value as one of the columns. An example with UPDATE is below. Creating a simple table and populating it with a value:

hive> create…
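The answer's code is cut off. A minimal sketch of the CTAS variant it describes, with made-up table and column names: the "default" is just a constant literal selected alongside the existing columns:

-- the new table copies the old columns and adds a column filled with a constant
create table new_table as
select id,
       year,
       'unknown' as status   -- hypothetical extra column with its default value
from previous_table;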

How to partition a Hive table using a range of values for a column

霸气de小男生 submitted on 2019-12-19 09:08:47
Question: I have a Hive table with 2 columns, Employee ID and Salary. The data is something like:

Employee ID   Salary
1             10000.08
2             20078.67
3             20056.45
4             30000.76
5             10045.14
6             43567.76

I want to create partitions based on the Salary column, for example a partition for the salary range 10000 to 20000, another for 20001 to 30000. How do I achieve this?

Answer 1: Hive does not support range partitioning, but you can calculate ranges during data load. Create a table partitioned by salary_range:

create table your_table (
  employee…
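The answer's DDL is truncated. A hedged completion of the approach it starts, using dynamic partitioning and a computed range label; the column types, the 10000-wide buckets, and the staging table name are assumptions:

create table your_table (
  employee_id int,
  salary decimal(10,2)
)
partitioned by (salary_range string);

-- let Hive derive the partition value from the data itself
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table your_table partition (salary_range)
select employee_id,
       salary,
       -- bucket each salary into a label such as '10000-19999'
       concat(cast(floor(salary / 10000) * 10000 as int), '-',
              cast(floor(salary / 10000) * 10000 + 9999 as int)) as salary_range
from staging_table;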

Exploding Array of Struct using HiveQL

自作多情 submitted on 2019-12-19 04:47:28
Question:

CREATE TABLE IF NOT EXISTS Table2 (
  USER_ID BIGINT,
  PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT, TIMESTAMPS: STRING>>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '-'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/rj/output2';

Below is the data in Table2:

1345653-110909316904:1341894546,221065796761:1341887508

I can explode the above data using the query below, and it works fine for the above data:

SELECT * FROM (select…
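The query is cut off above. For reference, a minimal sketch of the standard way to explode an array of structs in HiveQL, using LATERAL VIEW (alias names are mine):

-- one output row per struct in the array, with the struct's fields flattened
select t.user_id,
       item.product_id,
       item.timestamps
from Table2 t
lateral view explode(t.purchased_item) e as item;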