
How to select current date in Hive SQL

徘徊边缘 submitted on 2019-11-28 23:31:23
Question: How do we get the current system date in Hive? In MySQL we have select now(); can anyone help me get the equivalent query result? I am very new to Hive. Is there proper documentation for Hive that gives detailed information about the pseudo columns and built-in functions? Answer 1: According to the LanguageManual, you can use unix_timestamp() to get the "current time stamp using the default time zone." If you need to convert that to something more human-readable, you can use from_unixtime().
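Putting the answer together, a minimal sketch (the format string is illustrative; current_date and current_timestamp require Hive 1.2 or later):

```sql
-- Seconds since the epoch, in the default time zone
SELECT unix_timestamp();

-- Human-readable forms
SELECT from_unixtime(unix_timestamp());                -- yyyy-MM-dd HH:mm:ss
SELECT from_unixtime(unix_timestamp(), 'yyyy-MM-dd');  -- date only

-- On Hive 1.2+ there are also dedicated constants
SELECT current_date, current_timestamp;
```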

I have created a table in hive, I would like to know which directory my table is created in?

和自甴很熟 submitted on 2019-11-28 20:04:31
I have created a table in Hive and would like to know which directory and path it was created in. DESCRIBE FORMATTED my_table; or DESCRIBE FORMATTED my_table PARTITION (my_column='my_value'); There are three ways to describe a table in Hive: 1) To see basic table info, use the describe table_name; command. 2) To see more detailed information about the table, use the describe extended table_name; command. 3) To see all information laid out cleanly, use the describe formatted table_name; command. Resource:
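The table's path appears on the Location: line of the formatted output; a quick way to extract just that line from the shell (my_table is a placeholder, and this assumes the hive CLI is on the PATH):

```shell
hive -e "DESCRIBE FORMATTED my_table;" | grep -i 'Location'
```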

What is the difference between -hivevar and -hiveconf?

喜欢而已 submitted on 2019-11-28 18:44:44
From hive -h: --hiveconf <property=value> Use value for given property; --hivevar <key=value> Variable substitution to apply to Hive commands, e.g. --hivevar A=B. I didn't quite feel the examples from the documentation were adequate, so here's my attempt at an answer. In the beginning there was only --hiveconf, and variable substitution didn't exist. The --hiveconf option allowed users to set Hive configuration values from the command line, and that was it. All Hive configuration values are stored under the hiveconf namespace, i.e. hiveconf:mapred.reduce.tasks. These values allowed you to
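A sketch of the two options side by side (the property, variable name, and table name are illustrative):

```shell
# --hiveconf sets a configuration property for the session;
# it is readable under the hiveconf: namespace
hive --hiveconf mapred.reduce.tasks=10 \
     -e 'SELECT "${hiveconf:mapred.reduce.tasks}";'

# --hivevar defines a substitution variable for use inside queries,
# readable under the hivevar: namespace
hive --hivevar tbl=my_table \
     -e 'SELECT COUNT(*) FROM ${hivevar:tbl};'
```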

How to export data from Spark SQL to CSV

时间秒杀一切 submitted on 2019-11-28 16:40:02
This command works with HiveQL: insert overwrite directory '/data/home.csv' select * from testtable; But with Spark SQL I'm getting an error with an org.apache.spark.sql.hive.HiveQl stack trace: java.lang.RuntimeException: Unsupported language features in query: insert overwrite directory '/data/home.csv' select * from testtable. Please guide me on writing an export-to-CSV feature in Spark SQL. You can use the statement below to write the contents of a DataFrame in CSV format: df.write.csv("/data/home/csv"). If you need to write the whole DataFrame into a single CSV file, then use df.coalesce(1).write.csv("
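If the export has to stay in HiveQL, Hive 0.11 and later also let you set the field delimiter when writing to a directory, which produces CSV-like output without Spark (the output path is illustrative):

```sql
-- Write query results as comma-delimited text files under the given directory
INSERT OVERWRITE DIRECTORY '/data/home/csv_out'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM testtable;
```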

Hive insert query like SQL

老子叫甜甜 submitted on 2019-11-28 16:38:32
I am new to Hive and want to know if there is any way to insert data into a Hive table as we do in SQL. I want to insert my data into Hive like INSERT INTO tablename VALUES (value1, value2, ...). I have read that you can load data from a file into a Hive table, or import data from one table into another, but is there any way to append rows as in SQL? Some of the answers here are out of date as of Hive 0.14: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL It is now possible to insert using syntax such as: CREATE TABLE
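The Hive 0.14+ syntax from the linked manual page looks like this (table name and values follow the wiki's own example):

```sql
-- Requires Hive 0.14 or later
CREATE TABLE students (name VARCHAR(64), age INT)
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students
  VALUES ('fred flintstone', 35), ('barney rubble', 32);
```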

Why partitions elimination does not happen for this query?

我是研究僧i submitted on 2019-11-28 14:37:26
I have a Hive table which is partitioned by year, month, day and hour. I need to run a query against it to fetch the last 7 days of data. This is in Hive 0.14.0.2.2.4.2-2. My query currently looks like this: SELECT COUNT(column_name) FROM table_name WHERE year >= year(date_sub(from_unixtime(unix_timestamp()), 7)) AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7)) AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7)); This takes a very long time. When I substitute the actual numbers, say something like: SELECT COUNT(column_name) FROM table_name WHERE year >=
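For comparison, the literal form that does trigger partition pruning would look like this (the dates are illustrative). The planner can only eliminate partitions when the filter values are known at compile time, and in this Hive version unix_timestamp() is evaluated at run time, so the computed form scans all partitions:

```sql
-- Literal partition filters can be pruned at compile time
SELECT COUNT(column_name)
FROM table_name
WHERE year >= 2019 AND month >= 11 AND day >= 21;
```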

How to Update/Drop a Hive Partition?

喜夏-厌秋 submitted on 2019-11-28 13:57:14
Question: After adding a partition to an external table in Hive, how can I update or drop it? Answer 1: You can update a Hive partition with, for example: ALTER TABLE logs PARTITION(year = 2012, month = 12, day = 18) SET LOCATION 'hdfs://user/darcy/logs/2012/12/18'; This command neither moves nor deletes the old data; it simply points the partition at the new location. To drop a partition, you can do: ALTER TABLE logs DROP IF EXISTS PARTITION(year = 2012, month = 12, day = 18); Hope it helps!

“reduce” a set of rows in Hive to another set of rows

本小妞迷上赌 submitted on 2019-11-28 12:29:14
Question: I'm using Hive for batch processing of my spatial database. My trace table looks something like this:

object | lat | long | timestamp
1      | X11 | X12  | T11
1      | X21 | X22  | T12
2      | X11 | X12  | T21
1      | X31 | X22  | T13
2      | X21 | X22  | T22

I want to map each lat/long of each object to a number (think of map-matching, for example), but the algorithm needs to consider a number of adjacent data points to produce the result. For example, I need all 3 data points of object 1 to map each of those 3 data
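One common way to hand such an algorithm all of an object's points at once is to collect them per object and pass them to a UDF. A sketch, where my_map_match is a hypothetical UDF; note that collect_list (Hive 0.13+) does not guarantee ordering, so sorting by timestamp would have to happen inside the UDF:

```sql
SELECT object,
       my_map_match(collect_list(lat),
                    collect_list(long),
                    collect_list(`timestamp`)) AS mapped_points
FROM trace
GROUP BY object;
```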

hive Expression Not In Group By Key

无人久伴 submitted on 2019-11-28 09:13:52
I created a table in Hive with the following columns: id bigint, rank bigint, date string. I want to get avg(rank) per month. I can use this command, and it works: select a.lens_id, avg(a.rank) from tableA a group by a.lens_id, year(a.date_saved), month(a.date_saved); However, I also want the date information. I use this command: select a.lens_id, avg(a.rank), a.date_saved from lensrank_archive a group by a.lens_id, year(a.date_saved), month(a.date_saved); It complains: Expression Not In Group By Key. The full error message should be in the format Expression Not In Group By Key [value]. The
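One fix consistent with the error: every non-aggregated expression in the SELECT list must also appear in the GROUP BY, so select the same year/month expressions instead of the raw date column (names follow the question):

```sql
SELECT a.lens_id,
       year(a.date_saved)  AS yr,
       month(a.date_saved) AS mon,
       avg(a.rank)         AS avg_rank
FROM lensrank_archive a
GROUP BY a.lens_id, year(a.date_saved), month(a.date_saved);
```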

metastore_db created wherever I run Hive

寵の児 submitted on 2019-11-28 08:14:42
The folder metastore_db is created in whatever directory I run a Hive query from. Is there any way to have only one metastore_db in a defined location and stop it from being created all over the place? Does it have anything to do with hive.metastore.local? Mark Grover: The property of interest here is javax.jdo.option.ConnectionURL. The default value of this property is jdbc:derby:;databaseName=metastore_db;create=true. This value specifies that you will be using embedded Derby as your Hive metastore, and the location of the metastore is metastore_db, a path relative to the current working directory. Also, the metastore will be created if it doesn't already exist.
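To pin the embedded Derby metastore to one location, the answer's property can be given an absolute path in hive-site.xml (the /home/user/hive path below is a placeholder):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/user/hive/metastore_db;create=true</value>
</property>
```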