hiveql

Can I use 2 field terminators (like ',' and '.') at a time in Hive while creating a table?

被刻印的时光 ゝ submitted on 2019-12-20 03:56:10
Question: I have a file with id and year. My fields are separated by ',' and '.'. Is there any way, in place of FIELDS TERMINATED BY, to use both ',' and '.'?

Answer 1: This is possible using RegexSerDe:

hive> CREATE EXTERNAL TABLE citiesr1 (id int, city_org string, ppl float)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
      WITH SERDEPROPERTIES ('input.regex'='^(\\d+)\\.(\\S+),(\\d++.\\d++)\\t.*')
      LOCATION '/user/it1/hive/serde/regex';

In the regex above, three capture groups are defined: (\\d+) for id, (\\S+) for city_org, and (\\d++.\\d++) for ppl.
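As a quick illustration of how a data line maps through that regex (the sample line and its values are hypothetical, not taken from the answer):

-- a line like the following would parse into one row of citiesr1:
--   17.Boston,685.09<TAB>anything
-- group 1 (\\d+)        -> id       = 17
-- group 2 (\\S+)        -> city_org = 'Boston'
-- group 3 (\\d++.\\d++) -> ppl      = 685.09
hive> SELECT id, city_org, ppl FROM citiesr1 LIMIT 5;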

Can hiveconf variables be loaded from a file? (Separate from the HiveQL file)

喜欢而已 submitted on 2019-12-20 03:25:18
Question: I often have a large block of HiveQL that I want to run multiple times with different settings for some variables. A simple example would be:

set mindate='2015-01-01 00:00:00';
set maxdate='2015-04-01 00:00:00';

select * from my_table
where the_date between ${hiveconf:mindate} and ${hiveconf:maxdate};

which is then run via hive -f myfile.sql > myout.log. Later, I would like to change the variables and re-run. I also want a record of what values the variables had each time I ran. So I currently…
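The entry is cut off before any answer appears. A minimal sketch of one common approach, assuming the Hive CLI's source command and hypothetical file names; keeping one small settings file per run also leaves a record of the values used:

-- settings_2015q1.hql (hypothetical settings file)
set mindate='2015-01-01 00:00:00';
set maxdate='2015-04-01 00:00:00';

-- myfile.sql: load the settings, then run the query
source /path/to/settings_2015q1.hql;
select * from my_table
where the_date between ${hiveconf:mindate} and ${hiveconf:maxdate};

The same settings file can also be passed as hive -i settings_2015q1.hql -f myfile.sql, since -i runs an initialization script before the main file.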

How to pass multiple statements into Spark SQL HiveContext

蓝咒 submitted on 2019-12-19 19:11:43
Question: For example, I have a few Hive HQL statements which I want to pass into Spark SQL:

set parquet.compression=SNAPPY;
create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE;
select * from MY_TABLE limit 5;

The following doesn't work:

hiveContext.sql("set parquet.compression=SNAPPY; create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE; select * from MY_TABLE limit 5;")

How do I pass the statements into Spark SQL?

Answer 1: Thank you to @SamsonScharfrichter for the answer. This…
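The answer is truncated, but the usual fix is that hiveContext.sql() accepts only a single statement, so the script must be split on ';' and each statement issued in its own call. Sketched below as the three individual statements, each of which would go into a separate hiveContext.sql("...") invocation (with no trailing semicolon inside the quotes):

-- call 1: hiveContext.sql("set parquet.compression=SNAPPY")
set parquet.compression=SNAPPY;

-- call 2: hiveContext.sql("create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE")
create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE;

-- call 3: hiveContext.sql("select * from MY_TABLE limit 5")
select * from MY_TABLE limit 5;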

Strange behaviour of Regexp_replace in a Hive SQL query

末鹿安然 submitted on 2019-12-19 11:52:54
Question: I have some input where I'm trying to remove the trailing .0 from ID strings that end with .0:

select student_id, regexp_replace(student_id, '.0', '')
from school_result.credit_records
where student_id like '%.0';

Input:

01-0230984.03
12345098.0
34567.0

Expected output:

01-0230984.03
12345098
34567

But the result I'm getting is as follows. It removes any character followed by a 0, instead of removing only the occurrences that end with .0:

0129843
123498
34567
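The entry cuts off before an answer. The cause is that in a regular expression '.' matches any character, so the pattern '.0' means "any character followed by 0". A minimal sketch of the usual fix: escape the dot and anchor the pattern to the end of the string:

select student_id,
       regexp_replace(student_id, '\\.0$', '') as student_id_trimmed  -- literal dot, anchored at end
from school_result.credit_records
where student_id like '%.0';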

How to access a Hive ACID table in Spark SQL?

别等时光非礼了梦想. submitted on 2019-12-19 11:03:44
Question: How can you access a Hive ACID table in Spark SQL?

Answer 1: We have worked on and open-sourced a datasource that enables users to work with their Hive ACID transactional tables using Spark.

Github: https://github.com/qubole/spark-acid

It is available as a Spark package, and instructions for using it are on the Github page. Currently the datasource supports only reading from Hive ACID tables; we are working on adding the ability to write into these tables via Spark as well. Feedback and…
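As a rough illustration only; the exact usage lives in the repository's README. It would look something along these lines, where the datasource name 'HiveAcid', the option key, and the table names are assumptions rather than details taken from this answer:

-- expose the ACID table through the datasource as a Spark SQL table, then query it
-- (the CREATE TABLE ... USING <datasource> OPTIONS (...) form is standard Spark SQL)
CREATE TABLE acid_mirror
USING HiveAcid
OPTIONS ('table' 'default.acidtbl');

SELECT * FROM acid_mirror LIMIT 10;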

Hive: sort an array column with respect to another array column in the same table

女生的网名这么多〃 submitted on 2019-12-19 10:34:30
Question: I have a table in Hive with 2 columns, col1 array<int> and col2 array<double>. The output is as shown below:

col1          col2
[1,2,3,4,5]   [0.43,0.01,0.45,0.22,0.001]

I want to sort col2 in ascending order, and col1 should change its index order accordingly, e.g.:

col1          col2
[5,2,4,3,1]   [0.001,0.01,0.22,0.43,0.45]

Answer 1: Explode both arrays, sort, then aggregate the arrays again. Use a sort in the subquery before collect_list to sort the array:

with your_data as (
  select array(1,2,3,4,5) as col1, array(0…
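The query above is cut off. A hedged reconstruction of the approach it describes, using posexplode to pair each value with its index and re-aligning the two arrays on that index (alias names are mine; collect_list preserving the sorted order relies on the sorted subquery feeding a single reducer):

with your_data as (
  select array(1,2,3,4,5) as col1,
         array(0.43, 0.01, 0.45, 0.22, 0.001) as col2
)
select collect_list(c1) as col1_sorted,
       collect_list(c2) as col2_sorted
from (
  select c1, c2
  from your_data
  lateral view posexplode(col1) t1 as p1, c1
  lateral view posexplode(col2) t2 as p2, c2
  where p1 = p2          -- align the two arrays by element position
  sort by c2             -- order by the col2 value before re-aggregating
) s;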

Adding a default value to a column while creating a table in Hive

心不动则不痛 submitted on 2019-12-19 09:23:11
Question: I'm able to create a Hive table from data in an external file. Now I wish to create another table from the data in the previous table, with additional columns that have a default value. I understand that CREATE TABLE AS SELECT can be used, but how do I add the additional columns with a default value?

Answer 1: You can specify which columns to select from a table on create/update, and simply provide the default value as one of the columns. An example with UPDATE is below. Creating a simple table and populating it with a value:

hive> create…
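The answer's code is cut off. A minimal sketch of the CTAS variant it describes, with made-up table and column names: the "default" is just a constant literal selected alongside the existing columns:

-- the new table copies the old columns and adds a column filled with a constant
create table new_table as
select id,
       year,
       'unknown' as status   -- hypothetical extra column with its default value
from previous_table;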

How to partition a Hive table using a range of values for a column

霸气de小男生 submitted on 2019-12-19 09:08:47
Question: I have a Hive table with 2 columns, Employee ID and Salary. The data is something like:

Employee ID   Salary
1             10000.08
2             20078.67
3             20056.45
4             30000.76
5             10045.14
6             43567.76

I want to create partitions based on the Salary column, for example a partition for the salary range 10000 to 20000, another for 20001 to 30000. How do I achieve this?

Answer 1: Hive does not support range partitioning, but you can calculate ranges during data load. Create a table partitioned by salary_range:

create table your_table (
  employee…
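The answer's DDL is truncated. A hedged completion of the approach it starts, using dynamic partitioning and a computed range label; the column types, the 10000-wide buckets, and the staging table name are assumptions:

create table your_table (
  employee_id int,
  salary decimal(10,2)
)
partitioned by (salary_range string);

-- let Hive derive the partition value from the data itself
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table your_table partition (salary_range)
select employee_id,
       salary,
       -- bucket each salary into a label such as '10000-19999'
       concat(cast(floor(salary / 10000) * 10000 as int), '-',
              cast(floor(salary / 10000) * 10000 + 9999 as int)) as salary_range
from staging_table;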

Exploding Array of Struct using HiveQL

自作多情 submitted on 2019-12-19 04:47:28
Question:

CREATE TABLE IF NOT EXISTS Table2 (
  USER_ID BIGINT,
  PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT, TIMESTAMPS: STRING>>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '-'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/rj/output2';

Below is the data in Table2:

1345653-110909316904:1341894546,221065796761:1341887508

I can explode the above data using the query below, and it works fine for the above data:

SELECT * FROM (select…
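The query is cut off above. For reference, a minimal sketch of the standard way to explode an array of structs in HiveQL, using LATERAL VIEW (alias names are mine):

-- one output row per struct in the array, with the struct's fields flattened
select t.user_id,
       item.product_id,
       item.timestamps
from Table2 t
lateral view explode(t.purchased_item) e as item;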