Tag: hiveql

How to go through all partitions in Hive?

怎甘沉沦 submitted on 2019-12-24 08:09:39
Question: I want to update a column's value in all partitions. I found that INSERT OVERWRITE can be used to update data. My current statement is: INSERT OVERWRITE TABLE s_job PARTITION (pt = '20190101') SELECT CASE job_name WHEN 'Job' THEN 'system' END FROM s_job; However, this requires specifying a particular partition. What I want is to update the value in all partitions, and I don't know how. Is there a way in Hive SQL to go through all partitions? Thank you so much.

Answer 1: Use dynamic partitioning: set
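The answer is cut off at the dynamic-partitioning settings; a minimal sketch of what such a full-table overwrite could look like, assuming pt is the only partition column (the ELSE branch is my addition so that non-matching rows keep their original value):

```sql
-- Enable dynamic partitioning so one INSERT can rewrite every partition
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition column (pt) must come last in the SELECT list
INSERT OVERWRITE TABLE s_job
PARTITION (pt)
SELECT CASE job_name WHEN 'Job' THEN 'system' ELSE job_name END AS job_name,
       pt
FROM s_job;
```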

Extract data from a string in SQL

自闭症网瘾萝莉.ら submitted on 2019-12-24 08:09:34
Question: How would the regexp_extract call look if this data were in a column named "pages" and, for every row containing ':old:yes:', I wanted to return the string after 'yes:' and before the next colon?

PAGES (table name)
hello:ok:old:yes:age:test:jack
hello:no:old:yes:hour:black:nancy
hi:fine:old:yes:minute:white:jason

As you can see, ':old:yes:' is my starting point and I want regexp_extract to return the next token before the colon. In the above example I would want the following results: age hour
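A sketch of a regexp_extract call that captures the token between ':old:yes:' and the next colon; the table name pages_table is a placeholder:

```sql
-- Group 1 captures one colon-free token right after 'old:yes:'
SELECT pages,
       regexp_extract(pages, 'old:yes:([^:]+)', 1) AS token
FROM pages_table;
-- 'hello:ok:old:yes:age:test:jack' yields 'age'
```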

Insert data in many partitions using one insert statement

我是研究僧i submitted on 2019-12-24 07:35:22
Question: I have table A and table B, where B is a partitioned version of A using a field called X. When I want to insert data from A into B, I usually execute the following statement: INSERT INTO TABLE B PARTITION(X=x) SELECT <columnsFromA> FROM A WHERE X=x. Now what I want to achieve is to insert a range of X values, say x1, x2, x3... How can I achieve this in one single statement?

Answer 1: Use dynamic partition load: set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode
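The truncated answer is the standard dynamic-partition load; a sketch with col1/col2 as placeholder column names:

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- One statement populates every partition that appears in the SELECT;
-- the partition column X must be the last column selected
INSERT INTO TABLE B PARTITION (X)
SELECT col1, col2, X
FROM A
WHERE X IN ('x1', 'x2', 'x3');
```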

How to identify repeated occurrences of a string column in Hive?

谁说我不能喝 submitted on 2019-12-24 07:24:06
Question: I have a view like this in Hive:

id         sequencenumber  appname
242539622  1               A
242539622  2               A
242539622  3               A
242539622  4               B
242539622  5               B
242539622  6               C
242539622  7               D
242539622  8               D
242539622  9               D
242539622  10              B
242539622  11              B
242539622  12              D
242539622  13              D
242539622  14              F

I'd like to have, per each id, the following view:

id         sequencenumber  appname  appname_c
242539622  1               A        A
242539622  2               A        A
242539622  3               A        A
242539622  4               B        B_1
242539622  5               B        B_1
242539622  6               C        C
242539622  7               D        D_1
242539622  8               D        D_1
242539622  9               D
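This is the classic gaps-and-islands problem; a hedged sketch of how the runs could be numbered (the view name events and the suffix logic are my assumptions, not from the question):

```sql
-- 1) flag rows where the app changes, 2) running-sum the flags into run ids,
-- 3) rank each app's runs so repeated apps can get a _1, _2, ... suffix
WITH flagged AS (
  SELECT id, sequencenumber, appname,
         CASE WHEN appname = LAG(appname) OVER (PARTITION BY id ORDER BY sequencenumber)
              THEN 0 ELSE 1 END AS run_start
  FROM events
),
runs AS (
  SELECT id, sequencenumber, appname,
         SUM(run_start) OVER (PARTITION BY id ORDER BY sequencenumber) AS run_id
  FROM flagged
)
SELECT id, sequencenumber, appname,
       CONCAT(appname, '_',
              DENSE_RANK() OVER (PARTITION BY id, appname ORDER BY run_id)) AS appname_c
FROM runs;
```

Note this sketch suffixes every app; the desired output only suffixes apps that occur in more than one run (A and C stay unsuffixed), which would need an extra count of runs per (id, appname).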

Removing join operations from a grouping query

送分小仙女□ submitted on 2019-12-24 04:18:28
Question: I have a table that looks like:

usr_id  query_ts
12345   2019/05/13 02:06
123444  2019/05/15 04:06
123444  2019/05/16 05:06
12345   2019/05/16 02:06
12345   2019/05/15 02:06

It contains a user ID and when that user ran a query; each row represents one query run by that ID at the given timestamp. I am trying to produce this:

usr_id  day_1  day_2  …  day_30
12345   31     13     …  15
123444  23     41     …  14

I would like to show the number of queries run each day for the last 30 days for each ID, and if no query was
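A join-free sketch using conditional aggregation, assuming a table named query_log and that query_ts parses with to_date (the 2019/05/13 format would first need converting, e.g. via unix_timestamp with a matching pattern):

```sql
-- Count each user's queries per day-offset from today, no joins needed
SELECT usr_id,
       SUM(CASE WHEN datediff(current_date, to_date(query_ts)) = 1  THEN 1 ELSE 0 END) AS day_1,
       SUM(CASE WHEN datediff(current_date, to_date(query_ts)) = 2  THEN 1 ELSE 0 END) AS day_2,
       -- ... repeat for day_3 through day_29 ...
       SUM(CASE WHEN datediff(current_date, to_date(query_ts)) = 30 THEN 1 ELSE 0 END) AS day_30
FROM query_log
WHERE datediff(current_date, to_date(query_ts)) BETWEEN 1 AND 30
GROUP BY usr_id;
```

Users with no query on a given day get 0 from the SUM, which covers the "if no query was run" case the question trails off on.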

How to add an integer unique id to query results - __efficiently__?

做~自己de王妃 submitted on 2019-12-24 03:32:13
Question: Given a query, select * from ... (that might be part of a CTAS statement), the goal is to add an additional column, ID, where ID is a unique integer: select ... as ID, * from ... P.S. ID does not have to be sequential (there can be gaps) and can be arbitrary (it doesn't have to represent any specific order within the result set). row_number logically solves the problem: select row_number() over () as ID, * from ... The problem is that, at least for now, a global row_number (no partition by) is
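Since gaps and arbitrary ordering are allowed, one sketch is to bucket the rows and number them per bucket, which parallelizes instead of funnelling everything through the single reducer a global row_number() implies (the bucket count 16, the table src, and the key some_key are arbitrary choices of mine):

```sql
-- ID = bucket + 16 * (rank within bucket); IDs are unique because ID % 16
-- recovers the bucket, and non-sequential IDs with gaps are acceptable here
SELECT bucket + 16 * (row_number() OVER (PARTITION BY bucket ORDER BY some_key) - 1) AS ID,
       t.*
FROM (
  SELECT pmod(hash(some_key), 16) AS bucket, src.*
  FROM src
) t;
```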

Remove leading zeros using HiveQL

淺唱寂寞╮ submitted on 2019-12-24 02:45:25
Question: I have a string value which might have leading zeros, and I want to remove all of them. For example: accNumber = "000340" → "340". Is any UDF available in Hive? Can we use regexp_extract for this?

Answer 1: Yes, just use REGEXP_REPLACE(): SELECT some_string, REGEXP_REPLACE(some_string, '^0+', '') AS stripped_string FROM db.tbl

Source: https://stackoverflow.com/questions/38146780/remove-leading-zeros-using-hiveql
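When the value is purely numeric, a regex-free alternative sketch is a round-trip cast, which drops leading zeros (some_table is a placeholder name):

```sql
SELECT CAST(CAST(accNumber AS BIGINT) AS STRING) AS stripped
FROM some_table;   -- '000340' becomes '340'
```

The regex version is safer if values can contain non-digits or be all zeros ('000' casts to '0' rather than '').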

How to update an ORC Hive table from Spark using Scala

感情迁移 submitted on 2019-12-23 12:38:34
Question: I would like to update a Hive table stored in ORC format. I can update it from my Ambari Hive view, but I am unable to run the same UPDATE statement from Scala (spark-shell). objHiveContext.sql("select * from table_name") shows the data, but when I run objHiveContext.sql("update table_name set column_name='testing'") it fails with a NoViableAltException (invalid syntax near 'update', etc.), whereas I am able to update from the Ambari view (as I set all the required configurations, i.e.
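The parser error arises because Spark's HiveContext does not support Hive's ACID UPDATE syntax; a common workaround is to rewrite the table instead of updating it in place. A sketch, assuming a hypothetical second column other_col that must be carried through:

```sql
-- Rewrites every row with the new value; all non-updated columns must be listed
INSERT OVERWRITE TABLE table_name
SELECT 'testing' AS column_name, other_col
FROM table_name;
```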

How to get database username and password in Hive

时间秒杀一切 submitted on 2019-12-23 09:58:02
Question: I am writing a JDBC program to connect to a Hive database. I want to give the username and password in the connection URL, but I don't know how to get the username and password using HiveQL. Can anybody help me out?

Exception in thread "main" java.sql.SQLNonTransientConnectionException: [DataDirect][Hive JDBC Driver]A value was not specified for a required property: PASSWORD
    at com.ddtek.jdbc.hivebase.ddcp.b(Unknown Source)
    at com.ddtek.jdbc.hivebase.ddcp.a(Unknown Source)
    at com.ddtek.jdbc.hivebase

How to make a Hive table that is automatically updated

孤者浪人 submitted on 2019-12-23 05:32:15
Question: I have created an external table in Hive that uses data from a Parquet store in HDFS. When the data in HDFS is deleted, there is no data in the table. When the data is inserted again in the same location in HDFS, the table does not get updated with the new data. If I insert new records into the existing table that contains data, no new data is shown when I run my Hive queries. How I create the table in Hive: CREATE EXTERNAL TABLE nodes (id string) STORED AS PARQUET LOCATION "/hdfs
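If the external table is partitioned, new files will not appear until the metastore learns about their partitions; a sketch of the usual refresh:

```sql
-- Re-scan the table's HDFS location and register any new partitions
MSCK REPAIR TABLE nodes;
```

For a non-partitioned external table like the one shown, new files at the location are normally picked up at query time, so a stale result more likely points at a location mismatch or client-side caching than at the table definition.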