问题
I want to get the first part of my filename as a column in my Hive Table
My filename is : 20151102114450.46400_Always_1446482638967.xml
I wrote a query (below query) using regex in Hive of Microsoft Azure to get the first part of it i.e., 20151102114450
But when I run query I am getting the output as 20151102164358
select CAST(regexp_replace(regexp_replace(regexp_replace(CAST(CAST(regexp_replace(split(INPUT__FILE__NAME,'[_]')[2],'.xml','') AS BIGINT) as TimeStamp),':',''),'-',''),' ','') AS BIGINT) as VERSION
Can anyone tell me where I am going wrong and what needs to be corrected ?
回答1:
I tried this in Cloudera, hopefully it should work in Azure as well.
select from_unixtime(unix_timestamp(regexp_extract('20151102114450.46400_Always_1446482638967.xml','^(.*?)\\.'),'yyyyMMddHHmmss'),'yyyy-MMM-dd HH:mm:ss');
2015-Nov-02 11:44:50
Time taken: 19.644 seconds, Fetched: 1 row(s)
Another option:
select from_unixtime(unix_timestamp(split('20151102114450.46400_Always_1446482638967.xml','\\.')[0],'yyyyMMddHHmmss'),'yyyy-MMM-dd HH:mm:ss')
来源:https://stackoverflow.com/questions/37331487/part-of-filename-as-a-column-in-hive-table