Question
How to format dates during the process of creating Hive tables?
I've been loading some data into a discovery environment at work and storing dates as strings, because if I declare the columns as DATE or TIMESTAMP the values come back null.
Here's what the raw data looks like:
12/07/2016 05:07:28 PM
My understanding is that Hive accepts dates in this format:
yyyy-MM-dd HH:mm:ss
I can format these using a select statement:
select id, receipt_dt, from_unixtime(unix_timestamp(receipt_dt ,'MM/dd/yyyy'), 'yyyy-MM-dd') as app_dt from MySchema.MyTable where app_num='123456'
The expression I want to apply is:
from_unixtime(unix_timestamp(receipt_dt ,'MM/dd/yyyy'), 'yyyy-MM-dd')
How can I add this to the generic CREATE EXTERNAL TABLE statement below, so that I no longer have to store dates as strings or use an ALTER TABLE statement to change the formatting?
CREATE EXTERNAL TABLE IF NOT EXISTS MySchema.My_New_Table
( Field1 Format,
Field2 Format,
Field3 Format
)
.......
Answer 1:
Use MyTable as a staging table holding the raw data, and create a final/target table My_New_Table that applies the transformations (i.e. the date formatting). It is an EDW-style process.
Example:
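CREATE EXTERNAL TABLE IF NOT EXISTS MySchema.My_New_Table
( Field1 int,
Field2 string,
Field3 date
)
... more definitions....
;
-- In many Hive versions CTAS cannot target an external table or take a column list,
-- so the table is created first and then loaded with a separate INSERT ... SELECT.
INSERT INTO TABLE MySchema.My_New_Table
select id, receipt_dt,
cast(from_unixtime(unix_timestamp(receipt_dt ,'MM/dd/yyyy'), 'yyyy-MM-dd') as date) as app_dt
from MySchema.MyTable ;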
NOTE: This statement is not tested. You may need to edit it and try again, but you get the idea.
Inserting the delta should then be a similar process:
INSERT INTO TABLE MySchema.My_New_Table
select id, receipt_dt,
cast(from_unixtime(unix_timestamp(receipt_dt ,'MM/dd/yyyy'), 'yyyy-MM-dd') as date) as app_dt
from MySchema.MyTable where <<conditions>>;
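If the time of day matters too, the same idea should extend to a TIMESTAMP column. Here is a rough sketch, assuming every raw value looks like the sample in the question (12/07/2016 05:07:28 PM) and that the target column is declared as timestamp. The column alias app_ts is made up for illustration, and like the statements above this is not tested:

```sql
-- Sketch only: app_ts is a hypothetical column name, not from the original tables.
-- 'hh' is the 12-hour clock and 'a' is the AM/PM marker, matching the raw sample value.
select id,
       receipt_dt,
       cast(from_unixtime(unix_timestamp(receipt_dt, 'MM/dd/yyyy hh:mm:ss a')) as timestamp) as app_ts
from MySchema.MyTable;
```

With no format argument, from_unixtime returns a yyyy-MM-dd HH:mm:ss string, which is the form Hive casts to timestamp. Note that the 'MM/dd/yyyy' pattern used earlier only reads the date part, so the time is dropped there.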
Source: https://stackoverflow.com/questions/41400094/hadoop-formatting-dates-when-creating-tables