apache-pig

Hadoop - Load Hive tables using PIG

Submitted by 北城以北 on 2019-12-13 07:56:04
Question: I want to load Hive tables using Pig. I think we can do this through HCatLoader, but I am using XML files to load into Pig, and for this I have to use XMLLoader. Can I use both options to load XML files in Pig? I am extracting data from the XML files using my own UDF, and once all the data is extracted, I have to load the Pig data into Hive tables. I can't use Hive to extract the XML data, as the XML I receive is quite complex, so I wrote my own UDF to parse it. Any suggestions or pointers on how we can load …
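A common shape for this pipeline (a sketch, not the asker's actual script — the jar, UDF class, and Hive table names below are placeholders) is to parse the XML with the custom UDF and then write the result into the Hive table with HCatStorer:

```pig
-- Sketch: parse XML with a custom UDF, then store into a Hive table via HCatStorer.
-- myudfs.jar, mypkg.ParseXml, and mydb.mytable are hypothetical names.
REGISTER myudfs.jar;
raw    = LOAD '/data/input.xml'
         USING org.apache.pig.piggybank.storage.XMLLoader('record') AS (doc:chararray);
parsed = FOREACH raw GENERATE FLATTEN(mypkg.ParseXml(doc)) AS (id:int, name:chararray);
-- Run with: pig -useHCatalog script.pig
STORE parsed INTO 'mydb.mytable' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

So XMLLoader handles the read side, the UDF does the extraction, and HCatStorer handles the write into Hive; the two loaders never have to be combined in a single LOAD statement.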

Pig ORDER command fails

Submitted by 风流意气都作罢 on 2019-12-13 07:45:47
Question: I am trying to analyze an Apache log, and the goal is to find out all user agents and their percentage of usage. The following program works fine up to the line where result contains each user agent, its count, and its percentage. The program fails at the last line, when it tries to order by most used. Could someone help? logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen, user, time, method, uri, protocol, statusCode, responseSize, referer, userAgent); uarows = FOREACH logs …
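Without the full script it is hard to say exactly where the ORDER fails, but a frequent cause is ordering by a computed expression whose type Pig cannot infer. A hedged sketch of the count/percentage/order pipeline, with the ORDER applied to a typed, aliased field (all aliases here are illustrative):

```pig
-- Count per user agent, compute its share of the total, then sort descending.
grouped = GROUP uarows BY userAgent;
counts  = FOREACH grouped GENERATE group AS agent, COUNT(uarows) AS cnt;
total   = FOREACH (GROUP counts ALL) GENERATE SUM(counts.cnt) AS n;
result  = FOREACH counts GENERATE agent, cnt,
                 100.0 * (double)cnt / (double)total.n AS pct;
sorted  = ORDER result BY cnt DESC;   -- order by the named field, not an expression
```

Ordering by the plain field `cnt` (or `pct`) rather than an inline computation sidesteps the class of errors ORDER raises on untyped expressions.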

Pig: Create json file with actual key_name and values

Submitted by 折月煮酒 on 2019-12-13 07:29:30
Question: I have a Pig script using the Elephant Bird JSON loader. data_input = LOAD '$DATA_INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS (json:map []); x = FOREACH data_input GENERATE json#'user__id_str', json#'user__created_at', json#'user__notifications', json#'user__follow_request_sent', json#'user__friends_count', json#'user__name', json#'user__time_zone', json#'user__profile_background_color', json#'user__is_translation_enabled', json#'user__profile_link_color', json#'user__utc …
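If the goal is to write the extracted fields back out as JSON with the real key names, one approach is to give each projected value an alias and store with the built-in JsonStorage, since the aliases become the output keys. A sketch (only a few of the fields are shown, and `$DATA_OUTPUT` is a hypothetical parameter):

```pig
-- Aliases become the JSON keys in the stored output.
named = FOREACH data_input GENERATE
            json#'user__id_str'     AS user_id_str,
            json#'user__created_at' AS user_created_at,
            json#'user__name'       AS user_name;
STORE named INTO '$DATA_OUTPUT' USING JsonStorage();
```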

How to process a flat file with JSON string as a part of each line, into CSV file Using PIG Loader?

Submitted by 馋奶兔 on 2019-12-13 07:07:27
Question: I have a file in HDFS as 44,UK,{"names":{"name1":"John","name2":"marry","name3":"stuart"},"fruits":{"fruit1":"apple","fruit2":"orange"}},31-07-2016 91,INDIA,{"names":{"name1":"Ram","name2":"Sam"},"fruits":{}},31-07-2016 and want to store this into a CSV file as below using a Pig loader: 44,UK,names,name1,John,31-07-2016 44,UK,names,name2,Marry,31-07-2016 .. 44,UK,fruit,fruit1,apple,31-07-2016 .. 91,INDIA,names,name1,Ram,31-07-2016 .. 91,INDIA,null,null,Ram,31-07-2016 What should be the Pig …
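One possible approach (a sketch that assumes Elephant Bird is on the classpath; the regex, paths, and aliases are illustrative) is to split off the fixed leading and trailing CSV fields with a regex and hand the JSON blob to JsonStringToMap:

```pig
DEFINE JsonToMap com.twitter.elephantbird.pig.piggybank.JsonStringToMap();

raw   = LOAD '/data/input.txt' AS (line:chararray);
-- Leading id/country and trailing date are plain CSV; the middle is the JSON blob.
parts = FOREACH raw GENERATE
            FLATTEN(REGEX_EXTRACT_ALL(line, '^([^,]+),([^,]+),(\\{.*\\}),([^,]+)$'))
            AS (id:chararray, country:chararray, js:chararray, dt:chararray);
-- Flattening the map yields one (key, value) row per top-level JSON key;
-- the nested objects (names, fruits) would need a second pass or a custom UDF.
kv    = FOREACH parts GENERATE id, country, FLATTEN(JsonToMap(js)), dt;
```

The second level of nesting is the hard part: applying the same map-flattening once more on the inner JSON strings gets close to the row-per-leaf CSV shape shown above.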

Error when passing parameters to a Pig script

Submitted by 最后都变了- on 2019-12-13 06:38:32
Question: When I try to invoke a Pig script with a property file, I get an error: pig -P /mapr/ANALYTICS/apps/PigTest/pig.properties -f pig_if_condition.pig SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/mapr/hbase/hbase-0.98.4/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html …
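A point worth checking here: `-P` (alias `-propertyFile`) sets Pig configuration properties, not script parameters; values meant for `$NAME` substitution inside the script go through `-param` or `-param_file` instead. A sketch of the parameter-file route (file names and the INPUT parameter are illustrative):

```pig
-- params.txt would contain lines like:  INPUT=/mapr/ANALYTICS/data/in
-- and is passed with:  pig -param_file params.txt -f pig_if_condition.pig
%default INPUT '/tmp/in';
-- %default supplies a fallback when no parameter is passed on the command line.
data = LOAD '$INPUT' AS (line:chararray);
```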

pig join with java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

Submitted by £可爱£侵袭症+ on 2019-12-13 06:13:15
Question: I have two files. In data1: 1 3 1 2 5 1 In data2: 2 3 2 4 I then tried to read them into Pig: d1 = LOAD 'data1'; d2 = foreach d1 generate flatten(STRSPLIT($0, ' +')) as (f1:int,f2:int); d3 = LOAD 'data2' ; d4 = foreach d3 generate flatten(STRSPLIT($0, ' +')) as (f1:int,f2:int); data = join d2 by f1, d4 by f2; Then I got 2013-08-04 00:48:26,032 [Thread-21] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0005 java.lang.ClassCastException: java.lang.String cannot be cast to java.lang…
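The usual explanation for this exception is that STRSPLIT always produces chararrays, and the AS clause only renames fields without casting them, so the JOIN ends up comparing strings against the declared ints. An explicit cast in a separate FOREACH fixes it; a sketch:

```pig
d1  = LOAD 'data1' AS (line:chararray);
d2  = FOREACH d1 GENERATE FLATTEN(STRSPLIT(line, ' +')) AS (f1:chararray, f2:chararray);
d2c = FOREACH d2 GENERATE (int)f1 AS f1, (int)f2 AS f2;   -- a real cast, not a rename
d3  = LOAD 'data2' AS (line:chararray);
d4  = FOREACH d3 GENERATE FLATTEN(STRSPLIT(line, ' +')) AS (f1:chararray, f2:chararray);
d4c = FOREACH d4 GENERATE (int)f1 AS f1, (int)f2 AS f2;
data = JOIN d2c BY f1, d4c BY f2;
```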

REGEX_EXTRACT error in PIG

Submitted by 时光毁灭记忆、已成空白 on 2019-12-13 05:53:44
Question: I have a CSV file with 3 columns: tweetid, tweet, and Userid. However, within the tweet column there are comma-separated values, e.g. one row of data: `396124437168537600`,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143 I want to extract all 3 fields individually, but REGEX_EXTRACT is giving me an error with this code: a = LOAD 'tweets' USING PigStorage(',') AS (f1,f2,f3); b = FILTER a BY REGEX_EXTRACT(f1,'(…
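Because the tweet text is quoted and contains commas, loading with PigStorage(',') breaks the row apart before the regex ever runs. One workaround (a sketch; the pattern assumes the exact backtick-and-quote layout shown in the sample row) is to load each line whole and pull the three fields out with REGEX_EXTRACT:

```pig
a = LOAD 'tweets' AS (line:chararray);
-- Group 1: the backtick-wrapped id; group 2: the quoted tweet; group 3: the user.
b = FOREACH a GENERATE
        REGEX_EXTRACT(line, '^`([0-9]+)`,"(.*)",(.*)$', 1) AS tweetid,
        REGEX_EXTRACT(line, '^`([0-9]+)`,"(.*)",(.*)$', 2) AS tweet,
        REGEX_EXTRACT(line, '^`([0-9]+)`,"(.*)",(.*)$', 3) AS userid;
```

An alternative worth noting is Piggybank's CSVExcelStorage, which understands quoted fields directly.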

Loading unstructured data with different delimiters in Pig using Pig Latin only

Submitted by 我的梦境 on 2019-12-13 05:49:20
Question: Hi, I am trying to load the following data (which includes different delimiters and is unstructured) into Pig using Pig Latin only, without preparing the data with e.g. Java. Input: 1234 #one,#two,#three 5679 #one,#two 1234 #one Output I am looking for: 1234 #one 1234 #two 1234 #three 5678 #one 5678 #two 1234 #one Any ideas? Is this even possible in Pig? Thanks a lot in advance! Answer 1: Pig Script: A = LOAD 'a.csv' USING PigStorage(' ') AS (key:chararray, value:chararray); B = FOREACH A GENERATE …
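A complete version of that answer's idea splits on the space first, then bursts the comma-separated tag list into rows. A sketch (note that the default TOKENIZE already treats a comma as a delimiter, so no custom delimiter argument is needed):

```pig
A = LOAD 'a.csv' USING PigStorage(' ') AS (key:chararray, value:chararray);
-- TOKENIZE returns a bag of tokens, so FLATTEN yields one (key, tag) row per tag.
B = FOREACH A GENERATE key, FLATTEN(TOKENIZE(value)) AS tag;
DUMP B;
```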

Loading datetime format files using PIG

Submitted by 天涯浪子 on 2019-12-13 05:34:17
Question: I have a dataset in the following format. ravi,savings,avinash,2,char,33,F,22,44,12,13,33,44,22,11,10,22,2006-01-23 avinash,current,sandeep,3,char,44,M,33,11,10,12,33,22,39,12,23,19,2001-02-12 supreeth,savings,prabhash,4,char,55,F,22,12,23,12,44,56,7,88,34,23,1995-03-11 lavi,current,nirmesh,5,char,33,M,11,10,33,34,56,78,54,23,445,66,1999-06-15 Venkat,savings,bunny,6,char,11,F,99,12,34,55,33,23,45,66,23,23,2016-05-18 The last column (example: 2006-01-23) is a date. I am trying to load the above data …
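One way to handle the date column (a sketch; only a few of the many columns are spelled out, and the rest would follow the same pattern) is to load it as chararray and convert it with the built-in ToDate, giving a proper datetime field:

```pig
raw   = LOAD '/data/accounts.csv' USING PigStorage(',')
        AS (name:chararray, acct_type:chararray, nominee:chararray, dt:chararray);
-- ToDate with an explicit pattern parses the ISO-style date into a datetime.
typed = FOREACH raw GENERATE name, acct_type, nominee,
                             ToDate(dt, 'yyyy-MM-dd') AS dt;
```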

Use Pig to Denormalize A Large Data Frame

Submitted by 时光怂恿深爱的人放手 on 2019-12-13 05:08:15
Question: I have a large-ish (21 GB) tab-delimited data frame of the form DOCID_1 TERMID_1 TITLE_1 YEAR_1 AUTHOR_1 DOCID_1 TERMID_2 TITLE_1 YEAR_1 AUTHOR_1 ... DOCID_n TERMID_n TITLE_n YEAR_n AUTHOR_n That is, a (DOCID, TERMID) pair will always uniquely identify a row. What I need is a data frame in which a DOCID alone uniquely identifies a row, and the TERMIDs are collapsed into a comma-separated chararray list. For example, DOCID_1 TERMID_11, TERMID_12, ..., TERMID_n TITLE_1 YEAR_1 AUTHOR_1 ... DOCID …
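Grouping on DOCID together with its per-document columns and then collapsing the bag of TERMIDs with the built-in BagToString is the standard Pig move for this. A sketch (the path and aliases are illustrative):

```pig
rows = LOAD '/data/docs.tsv' USING PigStorage('\t')
       AS (docid:chararray, termid:chararray, title:chararray, year:int, author:chararray);
-- Title/year/author are functionally dependent on docid, so grouping on all
-- four collapses each document to a single row.
g    = GROUP rows BY (docid, title, year, author);
out  = FOREACH g GENERATE FLATTEN(group) AS (docid, title, year, author),
                          BagToString(rows.termid, ',') AS termids;
```

Because GROUP is a single MapReduce shuffle, this scales to the 21 GB input without pulling any document's rows into one mapper.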