apache-pig

Remove brackets and commas in output from Pig

Submitted by 孤者浪人 on 2019-12-24 03:57:04
Question: Currently my output is as below:

((130,1)) ((131,1)) ((132,1)) ((133,1)) ((137,1)) ((138,2)) ((139,1)) ((140,1)) ((142,2)) ((143,1))

I want to have it like:

130 1
131 1
132 1

My code is given below:

A = LOAD 'user-links-small.txt' AS (user_a: int, user_b: int);
B = ORDER A BY user_a;
grouped = COGROUP B BY user_a;
C = FOREACH grouped GENERATE COUNT(B);
D = COGROUP C BY $0;
E = FOREACH D GENERATE($0, COUNT($1));
DUMP E;

I was looking through these forums, and some suggested that the way to …
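A common fix, sketched here under the assumption that the goal is simply a per-user count: the extra parentheses in GENERATE($0, COUNT($1)) wrap the fields into a nested tuple, and DUMP always prints tuples with parentheses. Dropping the parentheses and writing the result with STORE and a space delimiter produces the plain "130 1" format:

```pig
-- Load and count links per user (same schema as in the question).
A = LOAD 'user-links-small.txt' AS (user_a:int, user_b:int);
grouped = GROUP A BY user_a;
-- No parentheses around the projection, so the output is a flat
-- two-field tuple rather than a tuple nested inside another tuple.
E = FOREACH grouped GENERATE group, COUNT(A);
-- STORE writes plain delimited text; DUMP always adds parentheses.
STORE E INTO 'output' USING PigStorage(' ');
```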

pig - transform data from rows to columns while inserting placeholders for non-existent fields in specific rows

Submitted by 醉酒当歌 on 2019-12-24 03:53:17
Question: Suppose I have the following flat file on HDFS (let's call this key_value):

1,1,Name,Jack
1,1,Title,Junior Accountant
1,1,Department,Finance
1,1,Supervisor,John
2,1,Title,Vice President
2,1,Name,Ron
2,1,Department,Billing

Here is the output I'm looking for:

(1,1,Department,Finance,Name,Jack,Supervisor,John,Title,Junior Accountant)
(2,1,Department,Billing,Name,Ron,,,Title,Vice President)

In other words, the first two columns form a unique identifier (similar to a composite key in a db…
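A partial sketch of one approach (field names are assumptions): group on the composite key and flatten each group's bag sorted by attribute name. This gets the rows-to-columns transposition, but it does not yet insert placeholders for missing attributes; that last step would still need a UDF or a join against the full attribute list.

```pig
-- Load the key/value file (column names are assumptions).
kv = LOAD 'key_value' USING PigStorage(',')
     AS (id1:int, id2:int, attr:chararray, val:chararray);

-- Group on the composite key; each group's bag holds its attribute pairs.
grp = GROUP kv BY (id1, id2);

-- Flatten the bag sorted by attribute name. Rows with missing
-- attributes simply produce fewer fields (no placeholders yet).
pivoted = FOREACH grp {
    sorted = ORDER kv BY attr;
    GENERATE FLATTEN(group) AS (id1, id2),
             FLATTEN(BagToTuple(sorted.(attr, val)));
};
DUMP pivoted;
```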

How to use .jar in a pig file

Submitted by 安稳与你 on 2019-12-24 02:38:05
Question: I have two input files, smt.txt and smo.txt. The jar file reads the text files and splits the data according to a rule described in a Java file, and the Pig script takes this data and writes it to output files via MapReduce:

register 'maprfs:///user/username/fl.jar';
DEFINE FixedLoader fl();
mt = load 'maprfs:///user/username/smt.txt' using FixedLoader('-30','30-33',...) AS (...);
mo = load 'maprfs:///user/username/smo.txt*' using FixedLoader('-30','30-33',...) AS (...…
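For reference, the usual shape of this pattern, sketched with a hypothetical loader class name and placeholder fields (the question's class name, column ranges, and schema are elided): DEFINE binds the fully qualified class plus its constructor arguments to a short alias, which is then used in USING without arguments.

```pig
-- Register the jar containing the custom loader, then alias it.
-- com.example.FixedLoader and the field list are assumptions.
REGISTER 'maprfs:///user/username/fl.jar';
DEFINE FixedLoader com.example.FixedLoader('-30','30-33');

-- Load both files with the custom loader; the alias carries its args.
mt = LOAD 'maprfs:///user/username/smt.txt' USING FixedLoader AS (f1:chararray, f2:chararray);
mo = LOAD 'maprfs:///user/username/smo.txt' USING FixedLoader AS (f1:chararray, f2:chararray);

-- Combine both relations before storing the result.
both = UNION mt, mo;
STORE both INTO 'output' USING PigStorage(',');
```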

elephantbird registered still showing error 2998

Submitted by a 夏天 on 2019-12-24 01:57:05
Question:

grunt> register '/home/piyush/Desktop/pro/json-simple-1.1.1.jar'
grunt> register '/home/piyush/Desktop/pro/elephant-bird-pig-4.1.jar'
grunt> register '/home/piyush/Desktop/pro/elephant-bird-hadoop-compat-4.1.jar'
grunt> register '/home/piyush/Desktop/pro/elephant-bird-core-4.1.jar'
grunt> load_tweets = LOAD '/home/piyush/Desktop/pro/quattr.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
2017-01-26 07:16:29,631 [main] ERROR org.apache.pig.tools.grunt.Grunt - …
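ERROR 2998 is Pig's generic "unhandled internal error", and with elephant-bird it usually points to a class missing at parse time, so checking the stack trace for a NoClassDefFoundError against the registered jars is the first step. One other thing worth checking, sketched below with the same jars and paths: giving the loaded map an explicit map type in the AS clause rather than a bare alias (whether that is the actual cause here cannot be confirmed from the truncated log).

```pig
REGISTER '/home/piyush/Desktop/pro/json-simple-1.1.1.jar';
REGISTER '/home/piyush/Desktop/pro/elephant-bird-core-4.1.jar';
REGISTER '/home/piyush/Desktop/pro/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/home/piyush/Desktop/pro/elephant-bird-pig-4.1.jar';

-- Declare the map type explicitly instead of a bare alias.
load_tweets = LOAD '/home/piyush/Desktop/pro/quattr.txt'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (myMap:map[]);
DUMP load_tweets;
```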

Pig - how to iterate on a bag of maps

Submitted by 时光怂恿深爱的人放手 on 2019-12-24 01:55:08
Question: Let me explain the problem. I have this line of code:

u = FOREACH persons GENERATE FLATTEN($0#'experiences') as j;
dump u;

which produces this output:

([id#1,date_begin#12 2012,description#blabla,date_end#04 2013],[id#2,date_begin#02 2011,description#blabla2,date_end#04 2013])
([id#1,date_begin#12 2011,description#blabla3,date_end#04 2012],[id#2,date_begin#02 2010,description#blabla4,date_end#04 2011])

Then, when I do this:

p = foreach u generate j#'id', j#'description';
dump p;

I have this…
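A sketch of the usual fix, assuming 'experiences' is a bag of maps as the output suggests: the dump above shows two maps per row, meaning the FLATTEN produced a multi-field tuple rather than one map per row, so j only names the first field. Declaring the flattened element as a map and letting FLATTEN emit one row per map makes the # dereference behave as expected:

```pig
-- One map per output row: flatten the bag and type each element as map[].
u = FOREACH persons GENERATE FLATTEN($0#'experiences') AS j:map[];

-- Now j is a single map, so # dereferencing returns the expected values.
p = FOREACH u GENERATE j#'id' AS id, j#'description' AS description;
DUMP p;
```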

PIG: How to remove '::' in the column name

Submitted by 若如初见. on 2019-12-24 00:58:27
Question: I have a Pig relation like the one below:

FINAL = {input_md5::type: chararray, input_md5::name: chararray, input_md5::id: long, input_md5::age: chararray, test_1::type: chararray, test_2::name: chararray}

I am trying to store all of the input_md5 columns into a Hive table: that is, all of input_md5::type, input_md5::name, input_md5::id, and input_md5::age, while not taking test_1::type and test_2::name. Is there any command in Pig which filters only the columns of input…
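A minimal sketch of the standard approach: project just the input_md5 columns in a FOREACH and use AS to rename them, which strips the :: disambiguation prefix. The HCatStorer line assumes HCatalog is the bridge to Hive, and 'hive_table' is a placeholder name:

```pig
-- Project only the input_md5 columns; AS drops the :: prefix.
CLEANED = FOREACH FINAL GENERATE
    input_md5::type AS type,
    input_md5::name AS name,
    input_md5::id   AS id,
    input_md5::age  AS age;

-- Write to Hive via HCatalog (table name is a placeholder).
STORE CLEANED INTO 'hive_table' USING org.apache.hive.hcatalog.pig.HCatStorer();
```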

how to load twitter data from hdfs using pig?

Submitted by ⅰ亾dé卋堺 on 2019-12-24 00:33:49
Question: I streamed some Twitter data using Flume into HDFS, and now I am trying to load it into Pig for analysis. Since the default JsonLoader function cannot load the data, I searched Google for a library that can load this kind of data, found this link, and followed its instructions. Here is the result:

REGISTER '/home/hduser/Downloads/json-simple-1.1.1.jar';
2016-02-22 20:54:46,539 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, …
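For context, the usual elephant-bird pattern for Flume-delivered tweets looks like the sketch below. The jar versions, download directory, and HDFS path are all assumptions; the key points are registering elephant-bird's core, pig, and hadoop-compat jars alongside json-simple, and loading with the nested JsonLoader so tweet fields come back as a map:

```pig
-- Jar paths and versions are assumptions based on a typical setup.
REGISTER '/home/hduser/Downloads/json-simple-1.1.1.jar';
REGISTER '/home/hduser/Downloads/elephant-bird-core-4.1.jar';
REGISTER '/home/hduser/Downloads/elephant-bird-pig-4.1.jar';
REGISTER '/home/hduser/Downloads/elephant-bird-hadoop-compat-4.1.jar';

-- HDFS path is a placeholder for wherever Flume wrote the tweets.
tweets = LOAD '/user/flume/tweets'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (json:map[]);

-- Pull individual tweet fields out of the map.
texts = FOREACH tweets GENERATE json#'id' AS id, json#'text' AS text;
DUMP texts;
```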

Merge two lines in Pig

Submitted by 老子叫甜甜 on 2019-12-23 22:12:37
Question: I would like to write a Pig script for the query below.

Input:

ABC,DEF,,
,,GHI,JKL
MNO,PQR,,
,,STU,VWX

Output should be:

ABC,DEF,GHI,JKL
MNO,PQR,STU,VWX

Could anyone please help me?

Answer 1: It will be difficult to solve this problem using native Pig. One option could be to download the datafu-1.2.0.jar library and try the approach below.

input.txt:

ABC,DEF,,
,,GHI,JKL
MNO,PQR,,
,,STU,VWX

Pig script:

REGISTER /tmp/datafu-1.2.0.jar;
DEFINE BagSplit datafu.pig.bags.BagSplit();
A = LOAD 'input.txt' USING …
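Since the answer's BagSplit script is cut off, here is an alternative sketch that stays in native Pig (0.11+, which provides RANK): pair consecutive rows by rank and collapse each pair, relying on MAX to prefer the non-empty value in each column. This assumes the file is exactly two rows per record with empty fields as shown.

```pig
A = LOAD 'input.txt' USING PigStorage(',')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);

-- Tag each row with an increasing id so consecutive rows can be paired.
B = RANK A;

-- Rows 1 and 2 share pair_id 0, rows 3 and 4 share pair_id 1, etc.
C = FOREACH B GENERATE (rank_A - 1) / 2 AS pair_id, c1, c2, c3, c4;
D = GROUP C BY pair_id;

-- Within each pair, MAX keeps the non-empty value of every column:
-- c1,c2 come from the first row and c3,c4 from the second.
E = FOREACH D GENERATE
    MAX(C.c1) AS c1, MAX(C.c2) AS c2, MAX(C.c3) AS c3, MAX(C.c4) AS c4;
DUMP E;
```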

Parse Complex JSON String in Pig

Submitted by 吃可爱长大的小学妹 on 2019-12-23 19:28:41
Question: I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. When using JsonLoader, I can do this easily by specifying the schema, as in this question. Is there any way to either have Pig figure out my schema for me, or to specify it when Pig is parsing a string? I've been using JsonStringToMap, but I can't find a way to specify a schema, or to have it properly understand that my JSON array is an array and not a single…
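For comparison, the schema-at-load approach the question already mentions can be sketched like this with Pig's built-in JsonLoader; the field names here are hypothetical. The array is declared as a bag of tuples in the schema string, which is exactly what makes Pig treat it as a bag rather than a chararray:

```pig
-- Declare the JSON array as a bag of tuples in the loader schema.
-- 'input.json' and the field names are assumptions for illustration.
data = LOAD 'input.json'
    USING JsonLoader('id:int, tags:{t:(tag:chararray)}');

-- The bag can now be flattened like any other Pig bag.
flat = FOREACH data GENERATE id, FLATTEN(tags);
DUMP flat;
```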

How do I read in a list of bags in Pig?

Submitted by 拥有回忆 on 2019-12-23 19:06:14
Question: How do I read in a list of bags in Pig? I tried:

grunt> cat sample.txt
{a,b},{},{c,d}
grunt> data = LOAD 'sample.txt' AS (a:bag{}, b:bag{}, c:bag{});
grunt> DUMP data
({},,)

Answer 1: The default load function in Pig is PigStorage('\t'), which assumes your data is tab-separated; yours is comma-separated, so you should write LOAD 'sample.txt' USING PigStorage(',') AS .... However, your data is not in proper Pig bag format. Remember that a bag is a collection of tuples. If you…
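Completing that answer's reasoning as a sketch: the elements inside each bag must be tuples, and the delimiter between bags must be one that cannot appear inside them, since PigStorage splits on it naively and the bags themselves contain commas. With the file rewritten tab-separated and the tuples parenthesized, declared bag schemas load cleanly:

```pig
-- sample.txt rewritten as tab-separated bags of one-field tuples:
-- {(a),(b)}	{}	{(c),(d)}
data = LOAD 'sample.txt' USING PigStorage('\t')
    AS (a:bag{t:(x:chararray)},
        b:bag{t:(x:chararray)},
        c:bag{t:(x:chararray)});
DUMP data;
```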