Question
I am new to Pig. From the Pig wiki I learned that there are the Piggybank UDFs and another useful collection, DataFu from LinkedIn. I also learned that since Pig 0.8, Piggybank is part of Apache Pig's built-in UDFs.
However, I think most of the Piggybank UDFs, such as StringConcat, are not documented in Apache Pig.
I am looking for date-formatting UDFs that will convert a datetime to a string, like FormatDate. I am not sure whether these UDFs already exist in Pig/Piggybank, as I could not find them in the documentation.
Also, are there any other third-party UDFs (Java or Python) available? Please list them.
Your help is really appreciated.
Answer 1:
So there are a few questions here. I'll try to cover them all.
PiggyBank Docs
There is (sadly) no user manual for the Piggybank UDFs that explains how to use each of them from within a Pig script. However, the Pig javadoc includes information for each Java class implementing the UDFs in Piggybank (scroll down to "contrib: Piggybank"):
- http://pig.apache.org/docs/r0.8.1/api/overview-summary.html
- http://pig.apache.org/docs/r0.9.1/api/overview-summary.html
- http://pig.apache.org/docs/r0.10.0/api/overview-summary.html
String to DateTime
(assuming Pig < 0.11)
To convert a string containing time-like information, you'll want to use the CustomFormatToISO UDF. It takes your chararray with date information and a datetime format specification and converts it to an ISO datetime string. Once your data is in this format, there are several Piggybank functions that operate on ISO-formatted times:
- http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/truncate/package-summary.html
- http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/piggybank/evaluation/datetime/diff/package-summary.html
Note also that comparing ISO-formatted strings as plain text orders them chronologically, as long as they all use the same layout and time zone. This means you can apply comparison and sort operations to them, and they will behave as if they were time-aware. For more background see this SO answer: https://stackoverflow.com/a/9576911/9940
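To illustrate the sorting property outside of Pig, here is a minimal Python sketch. The sample timestamps and the helper name `parse_iso` are made up for the example, and it assumes every string uses the same zero-padded layout and the same time zone (UTC, written as `Z`). It sorts the values once as plain text and once as parsed datetimes, then compares the two orderings.

```python
from datetime import datetime

# Hypothetical sample timestamps: same ISO 8601 layout, all in UTC ("Z").
iso_strings = [
    "2013-10-02T15:30:00.000Z",
    "2012-01-15T08:05:00.000Z",
    "2013-10-02T09:00:00.000Z",
    "2013-02-28T23:59:59.000Z",
]


def parse_iso(value):
    # Parse the fixed layout used above; this is only the reference ordering.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


# Plain string sort: no date parsing involved.
sorted_as_text = sorted(iso_strings)

# Chronological sort: parse first, then compare as datetimes.
sorted_as_time = sorted(iso_strings, key=parse_iso)

# True when the text ordering matches the chronological ordering.
same_order = sorted_as_text == sorted_as_time
print(same_order)
```

The equivalence relies on the fields running from most to least significant (year, month, day, hour, ...) and being zero-padded. Strings with mixed UTC offsets or differing precision would not be guaranteed to sort this way.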
If you're using Pig 0.11 or later, you can use the built-in ToDate() function: http://pig.apache.org/docs/r0.11.1/func.html#to-date
Source: https://stackoverflow.com/questions/19140126/pig-3rd-party-udf-clarification