Datetime parsing in Apache Pig

て烟熏妆下的殇ゞ 提交于 2019-12-23 05:00:30

问题


I'm trying to parse a Date in a Pig script and i got the following error "Hadoop does not return any error message".

Here is the Date format example : 3/9/16 2:50 PM

And here is how I parse it :

data = LOAD 'cleaned.txt'
AS (Date, Block, Primary_Type, Description, Location_Description, Arrest, Domestic, District, Year);

times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;

You can see the data file here

Do you have any idea ? Thanks


EDIT:

It look like the error is caused by the STORE command on "times".

If I do a DUMP then I got:

ERROR 1066: Unable to open iterator for alias times

It happen only when I use the ToDate function, I have other scripts that work perfectly.


回答1:


First of all, you need to specify the loader in the LOAD statement:

USING PigStorage('\t')

I assumed that you're using tab separator. Than if you have no schema specify the schema with type!

So you're load statement will be sg like this:
data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);

For now I just use chararray type for everything, but you have to specify the type what is the right representation for you.

After this the date conversion just works fine as you wrote: (2016-03-09T23:55:00.000Z) (2016-03-09T23:55:00.000Z) (2016-03-09T23:55:00.000Z)

My test script:

data = LOAD 'SO/date2parse.txt' USING PigStorage('\t') AS (Date:chararray, Block:chararray, Primary_Type:chararray, Description:chararray, Location_Description:chararray, Arrest:chararray, Domestic:chararray, District:chararray, Year:chararray);
times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') As Time;
DUMP times;

UPDATE: Some explanation

By the way the default loader is pig storage

PigStorage is the default load function for the LOAD operator.

but it's nicer to specify. The original issue caused by the lack of datatype

If you don't assign types, fields default to type bytearray

so the ToDate failed on the input type.



来源:https://stackoverflow.com/questions/36288819/datetime-parsing-in-apache-pig

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!