apache-pig

Error executing shell command in pig script

空扰寡人 submitted on 2019-12-23 15:34:59
Question: I have a Pig script in which I would first like to generate a string of the dates of the 7 days before a given date (later used to retrieve the log files for those days). I attempt to do this with the following line:
%declare CMD7 input= ; for i in {1..6}; do d=$(date -d "$DATE -i days" "+%Y-%m-%d"); input="\$input\$d,"; done; echo \$input
I get this error: "ERROR 2999: Unexpected internal error. Error executing shell command: input= ; for i in {1..6}; do d=$(date -d "2012-07-10 -i days" "+%Y
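The excerpt is cut off before any answer, so the following is only a sketch of the intended result, not a fix for the `%declare` line. One thing that stands out in the quoted loop is that it passes the literal text `-i days` to `date` rather than the loop variable `$i`. Assuming the goal is a comma-separated list of the days before `$DATE`, the list can be built outside Pig and handed to the script as a parameter. The function name `previous_days` is hypothetical.

```python
from datetime import date, timedelta


def previous_days(end_date: str, count: int) -> str:
    """Return the `count` days before `end_date` (YYYY-MM-DD), newest first,
    joined with commas and without a trailing comma."""
    end = date.fromisoformat(end_date)
    days = [(end - timedelta(days=offset)).isoformat() for offset in range(1, count + 1)]
    return ",".join(days)


if __name__ == "__main__":
    # Same date and loop range (1..6) as in the question.
    print(previous_days("2012-07-10", 6))
```

Unlike the shell loop, this version does not leave a trailing comma, which may or may not matter depending on how the string is used in the LOAD path.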

Getting an error on running HCatalog

被刻印的时光 ゝ submitted on 2019-12-23 11:31:33
Question: Running
A = LOAD 'eventnew.txt' USING HCatalogLoader();
fails with:
2015-07-08 19:56:34,875 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve HCatalogLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] Details at logfile: /home/KS5023833/pig_1436364102374.log
Then I tried
A = LOAD 'xyz' USING org.apache.hive.hcatalog.pig.HCatLoader();
This does not work either: 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatLoader using imports: [, java

How to load a tarball into Pig

大兔子大兔子 submitted on 2019-12-23 09:59:51
Question: I have log files in a tarball (access.logs.tar.gz) loaded into my Hadoop cluster. Is there a way to load it directly into Pig without untarring it?
Answer 1: PigStorage will recognize that the file is compressed (by the .gz extension; this is actually implemented in TextInputFormat, which PigTextInputFormat extends), but after that you'll be dealing with a tar file. If you're able to handle the header lines between the files in the tar, then you can just use PigStorage as is,

Apache Pig script, Error 1070: Java UDF could not resolve import

拈花ヽ惹草 submitted on 2019-12-23 05:59:29
Question: I am trying to write a Java UDF with the end goal of extending/overriding the load method of PigStorage to support entries that span multiple lines. My Pig script is as follows:
REGISTER udf.jar; register 'userdef.py' using jython as parser; A = LOAD 'test_data' USING PigStorage() AS row:chararray; C = FOREACH A GENERATE myTOKENIZE.test(); DUMP D;
udf.jar looks like: udf/myTOKENIZE.class
myTOKENIZE.java imports org.apache.pig.* and extends EvalFunc. The test method just returns a Hello world

Finding the difference between start_times and end_times in Pig

怎甘沉沦 submitted on 2019-12-23 05:25:31
Question: Could anyone please tell me how to find the difference between two times in Pig? For example, below are sample Start_Time and End_Time values; I need to find the difference between Start_Time and End_Time.
12:31:38,14:54:04
10:18:34,13:30:56
13:37:43,15:18:57
08:15:10,11:28:17
Thanks in advance.
Answer 1: I couldn't find a straightforward way. Here is a workaround:
t = LOAD 'input/data' USING PigStorage(',') as (time1:chararray,time2:chararray); u = FOREACH t GENERATE SecondsBetween(ToDate
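The quoted workaround is truncated, but it appears to parse both strings into datetimes and take the number of seconds between them. The same arithmetic can be checked outside Pig with a short sketch. It assumes both times fall on the same day and that the end time is later than the start time; `seconds_between` is a hypothetical helper name here, not the Pig built-in.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Seconds from `start` to `end`, both given as HH:MM:SS on the same day."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


if __name__ == "__main__":
    # Sample rows from the question.
    rows = [
        ("12:31:38", "14:54:04"),
        ("10:18:34", "13:30:56"),
        ("13:37:43", "15:18:57"),
        ("08:15:10", "11:28:17"),
    ]
    for start, end in rows:
        print(start, end, seconds_between(start, end))
```

If an end time can fall after midnight, the result becomes negative and a date component is needed to resolve it.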

Datetime parsing in Apache Pig

て烟熏妆下的殇ゞ submitted on 2019-12-23 05:00:30
Question: I'm trying to parse a date in a Pig script and I get the following error: "Hadoop does not return any error message". Here is an example of the date format: 3/9/16 2:50 PM
And here is how I parse it:
data = LOAD 'cleaned.txt' AS (Date, Block, Primary_Type, Description, Location_Description, Arrest, Domestic, District, Year); times = FOREACH data GENERATE ToDate(Date, 'M/d/yy h:mm a') AS Time;
You can see the data file here. Do you have any idea? Thanks.
EDIT: It looks like the error is caused by the

Pig Latin Word Count

允我心安 submitted on 2019-12-23 04:45:42
Question: I am trying to count the number of lines that contain the following words: 'jack', 'hack', 'mat', 'throttle' in a Pig script. I am using the Cloudera QuickStart VM. The input file is:
09-jack-17,5:00PM;#slowmotion,Tribune Logic hack: how is life in temrs of money Creative hack 14-June-18,7:15PM;#Indiacalling,Horton-NJ Strategic/Halloween One World at Application Deployment 12-jack-16,jfh:er;#temporary, accomodation, osteoporosis, juxtapose, don't misinterpret this awaiting throttle jack
The output

How to convert UTC time to IST using Pig

喜夏-厌秋 submitted on 2019-12-23 04:28:29
Question: Machine data arrives in HDFS as shown below. The 8th field is a UTC time (060037); I need to convert it to IST and format the time as hh:mm:ss using Pig.
VTS,01,0097,9739965515,NM,GP,20,060037,V,0000.0000,N,00000.0000,E,0.0,0.0,061114,0068,00,4000,00,999,149,9594
VTS,01,0097,9739965515,SP,GP,33,060113,V,0000.0000,N,00000.0000,E,0.0,0.0,061114,0068,00,4000,00,999,152,B927
Using string functions, I tried to convert it into a Unix date format; now I am getting a time like 2014-11-06 06:01:13
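IST is UTC+05:30, so the conversion itself is a fixed offset of 5 hours 30 minutes added to the parsed time. The excerpt ends before any answer, so the following is only a sketch of that arithmetic outside Pig, handy for checking expected values. The helper name `utc_hhmmss_to_ist` is hypothetical, and the sketch handles the time field alone, so the date in the record would also have to advance when the result wraps past midnight.

```python
from datetime import datetime, timedelta

# India Standard Time is a fixed UTC+05:30 offset with no daylight saving.
IST_OFFSET = timedelta(hours=5, minutes=30)


def utc_hhmmss_to_ist(value: str) -> str:
    """Convert a UTC time written as HHMMSS to an IST time written as HH:MM:SS."""
    utc_time = datetime.strptime(value, "%H%M%S")
    return (utc_time + IST_OFFSET).strftime("%H:%M:%S")


if __name__ == "__main__":
    # The 8th field of the two sample records in the question.
    for field in ("060037", "060113"):
        print(field, "->", utc_hhmmss_to_ist(field))
```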

JSON array field handling in Elephant-Bird UDF in Pig

我只是一个虾纸丫 submitted on 2019-12-23 03:29:37
Question: A quick question on JSON handling in Pig. I tried a JsonLoader from Elephant-Bird to load and handle JSON data like the following:
{ "SV":1, "AD":[ { "ID":"46931606", "C1":"46", "C2":"469", "ST":"46931", "PO":1 }, { "ID":"46721489", "C1":"46", "C2":"467", "ST":"46721", "PO":5 } ] }
The loader works well for simple fields but not for array fields. How can I access the elements of the array (the "AD" field above) with this UDF or in any other way? Please
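No answer survives in this excerpt, so the sketch below does not show the Elephant-Bird API. It only illustrates, in plain Python, the shape of the result the question seems to be after: one output row per element of the "AD" array, with the top-level "SV" value repeated on each row. The choice of columns (SV, ID, PO) is an assumption made for illustration.

```python
import json

# The sample document from the question.
RAW = """
{ "SV":1,
  "AD":[ { "ID":"46931606", "C1":"46", "C2":"469", "ST":"46931", "PO":1 },
         { "ID":"46721489", "C1":"46", "C2":"467", "ST":"46721", "PO":5 } ] }
"""


def flatten_ad(document: dict) -> list:
    """Return one (SV, ID, PO) tuple per element of the "AD" array."""
    return [(document["SV"], item["ID"], item["PO"]) for item in document["AD"]]


if __name__ == "__main__":
    for row in flatten_ad(json.loads(RAW)):
        print(row)
```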

Replacing values in Pig Latin

感情迁移 submitted on 2019-12-23 02:49:09
Question: I have a dataset of the form: id1, id2, id3
Any of id1, id2, or id3 (or all three, or any two) can be missing in a record. If a value is missing, I want to replace id1 with 1, id2 with 3, and id3 with 7. How do I do this? Thanks.
Answer 1: Use the bincond operator to test if the value is null and then replace it with the desired value. From Programming Pig, Chapter 5:
2 == 2 ? 1 : 4 --returns 1
2 == 3 ? 1 : 4 --returns 4
null == 2 ? 1 : 4 -- returns null
2 == 2 ? 1 : 'fred' -- type error, both values must be of
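As the third quoted example shows, comparing against null with == yields null, so the test in the bincond has to be an explicit null check, written in Pig as `(id1 IS NULL ? 1 : id1)`. The per-field logic is sketched below in Python, with `None` standing in for a missing value. The names `DEFAULTS` and `fill_missing` are hypothetical and exist only for this illustration.

```python
# Replacement values for id1, id2, id3, as given in the question.
DEFAULTS = (1, 3, 7)


def fill_missing(record: tuple) -> tuple:
    """Replace each missing (None) field with the default for its position."""
    return tuple(
        default if value is None else value
        for value, default in zip(record, DEFAULTS)
    )


if __name__ == "__main__":
    print(fill_missing((None, 5, None)))
    print(fill_missing((10, None, 30)))
```

Each field is handled independently, which mirrors writing one bincond per column in a single FOREACH ... GENERATE.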