apache-pig

Reading a file in javascript via Apache Pig UDF

删除回忆录丶 submitted on 2019-12-25 11:56:36
Question: I have some (very simplified) Node.js code here:

    var fs = require('fs');
    var derpfile = String(fs.readFileSync('./derp.txt', 'utf-8'));
    var derps = derpfile.split('\n');
    for (var i = 0; i < derps.length; ++i) {
        // do something with my derps here
    }

The problem is, I cannot use Node in Pig UDFs (that I am aware of; if I can do this, please let me know!). When I look at 'file io' in JavaScript, all the tutorials I see concern the browser sandbox. I need to read a file off the filesystem,
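If the goal is just to iterate over the lines of the file, a hedged alternative is to let Pig load it and pass each line to a JavaScript UDF; Pig can register Rhino (not Node) JavaScript UDFs natively. A minimal sketch, where the file name 'udfs.js' and the function process are assumed, not from the original question:

    -- register a JavaScript UDF file (executed by Rhino, not Node)
    REGISTER 'udfs.js' USING javascript AS myudfs;
    -- let Pig read the file: each tuple holds one line of derp.txt
    derps = LOAD 'derp.txt' USING PigStorage() AS (line:chararray);
    -- hand each line to the (hypothetical) UDF instead of doing file io inside it
    processed = FOREACH derps GENERATE myudfs.process(line);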

ElephantBird ERROR 1070 --> class not getting read

江枫思渺然 submitted on 2019-12-25 08:28:17
Question: My problem is similar to this unanswered question: https://stackoverflow.com/questions/42140344/elephantbird-dependency-jars. I have registered all the jars mandatory for ElephantBird to function:

    REGISTER '/MyJARS/elephant-bird-hadoop-compat-4.1.jar';
    REGISTER '/MyJARS/json-simple-1.1.jar';
    REGISTER '/MyJARS/elephant-bird-pig-4.1.jar';
    REGISTER '/MyJARS/elephant-bird-core-4.10.jar';
    REGISTER '/MyJARS/google-collections-1.0.jar';

The following links tell me this: 1: Loading data from HDFS does
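For reference, a hedged sketch of how these jars are typically used once registered (the input path is an assumption). Note also that the list above mixes 4.1 compat/pig jars with a 4.10 core jar; mismatched ElephantBird versions are a common source of class-resolution failures such as ERROR 1070, so a matching 4.1 core jar is assumed below:

    REGISTER '/MyJARS/elephant-bird-hadoop-compat-4.1.jar';
    REGISTER '/MyJARS/elephant-bird-core-4.1.jar';
    REGISTER '/MyJARS/elephant-bird-pig-4.1.jar';
    REGISTER '/MyJARS/json-simple-1.1.jar';
    REGISTER '/MyJARS/google-collections-1.0.jar';
    -- '-nestedLoad' tells the loader to descend into nested JSON objects
    data = LOAD 'input.json'
           USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');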

Apache Pig - nested FOREACH over same relation

那年仲夏 submitted on 2019-12-25 08:21:17
Question: I have a number of bags and I want to compute the pairwise similarities between the bags.

    sequences = FOREACH raw GENERATE gen_bag(logs);

The relation is described as follows:

    sequences: {t: (type: chararray, value: chararray)}

The similarity is computed by a Python UDF that takes two bags as arguments. I have tried to do a nested FOREACH over the sequences relation, but I can't loop over the same relation twice. I've also tried to define the sequences twice, but I can't access the copy in the
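A hedged sketch of one common workaround: materialize a second alias over the same data and CROSS the two relations, then hand each pair to the UDF (the namespace myfuncs and the function similarity are assumed names, not from the question):

    -- a second alias, since a relation cannot be crossed with itself
    copy  = FOREACH sequences GENERATE *;
    -- every pairwise combination, including self-pairs and both orderings
    pairs = CROSS sequences, copy;
    sims  = FOREACH pairs GENERATE myfuncs.similarity($0, $1);

Self-pairs and duplicate orderings would still need filtering, for example by carrying an id through both aliases and adding a FILTER. CROSS is also expensive: on large inputs it effectively squares the data size.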

Trouble running pig in both local and mapreduce modes

家住魔仙堡 submitted on 2019-12-25 03:58:27
Question: I already have Hadoop 1.2 running on my Ubuntu VM, which runs on a Windows 7 machine. I recently installed Pig 0.12.0 on the same Ubuntu VM, downloaded as pig-0.12.0.tar.gz from the Apache website. I have all the variables such as JAVA_HOME, HADOOP_HOME and PIG_HOME set correctly. When I try to start Pig in local mode, this is what I see:

    chandeln@ubuntu:~$ pig -x local
    pig: invalid option -- 'x'
    usage: pig
    chandeln@ubuntu:~$ echo $JAVA_HOME
    /usr/lib/jvm/java7
    chandeln@ubuntu:

Apache Pig: Dynamic columns

强颜欢笑 submitted on 2019-12-25 03:19:12
Question: I have a dataset (CSV) with three value columns (v1, v2 and v3). The description of each value is stored as a comma-separated string in the column 'keys'.

    | v1 | v2 | v3 | keys  |
    | A  | C  | E  | X,Y,Z |

Using Pig I would like to load this information into an HBase table where the column family is C and the column qualifier is the key:

    | C:X | C:Y | C:Z |
    | A   | C   | E   |

Has anyone done this before and would like to share this knowledge? Another option is to store a map (key#value) in a
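A hedged sketch of the map-based route (assuming each row also carries a row key, which the sample data omits): the builtins STRSPLIT and TOMAP can pair the split keys with the values, and HBaseStorage accepts a wildcard column spec so that the map keys become column qualifiers:

    raw    = LOAD 'data.csv' USING PigStorage(',')
             AS (rk:chararray, v1:chararray, v2:chararray, v3:chararray, keys:chararray);
    -- STRSPLIT turns 'X,Y,Z' into a tuple; TOMAP pairs each key with its value
    keyed  = FOREACH raw GENERATE rk, STRSPLIT(keys, ',') AS k, v1, v2, v3;
    mapped = FOREACH keyed GENERATE rk, TOMAP(k.$0, v1, k.$1, v2, k.$2, v3) AS cols;
    -- 'C:*' writes every map key as a qualifier under column family C
    STORE mapped INTO 'hbase://mytable'
          USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('C:*');

This sketch assumes exactly three keys per row; a variable number of keys would need a small custom UDF to build the map.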

hadoop pig joining on any matching tuple values

前提是你 submitted on 2019-12-25 02:38:11
Question: I'm new to Pig and trying to use it to process a dataset. I have a set of records that looks like:

    id  elements
    --------------
    1   ["a","b","c"]
    2   ["a","f","g"]
    3   ["f","g","h"]

The idea is that I want to create tuples of elements that have any overlapping elements. If elements were just a single item instead of an array, I could do a simple join like:

    A = LOAD 'mydata' ...
    B = FOREACH A GENERATE id AS id_2, elements AS elements_2;
    C = JOIN A BY elements, B BY elements_2;

But since elements is an
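A hedged sketch of the usual approach (schema and file names are assumptions): FLATTEN each elements bag so that a standard join on single values can find any overlap:

    A  = LOAD 'mydata' AS (id:int, elements:bag{t:(e:chararray)});
    A1 = FOREACH A GENERATE id, FLATTEN(elements) AS e;
    B1 = FOREACH A1 GENERATE id AS id_2, e AS e_2;
    -- rows join whenever two ids share at least one element
    C  = JOIN A1 BY e, B1 BY e_2;
    -- drop self-matches; DISTINCT the id pairs if one match per pair is enough
    D  = FILTER C BY id != id_2;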

Error from Json Loader in Pig

血红的双手。 submitted on 2019-12-25 02:05:56
Question: I got the error below while writing a JSON-loading script. Please let me know how to write a JSON loader script in Pig.

Script:

    x = LOAD 'hdfs://user/spanda20/pig/phone.dat'
        USING JsonLoader('id:chararray, phone:(home:{(num:chararray, city:chararray)})');

Data set:

    {
      "id": "12345",
      "phone": {
        "home": [
          { "zip": "23060", "city": "henrico" },
          { "zip": "08902", "city": "northbrunswick" }
        ]
      }
    }

Error:

    2015-03-18 14:24:10,917 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
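A hedged guess at the fix: the schema handed to JsonLoader must name the fields exactly as they appear in the JSON ('zip', not 'num'), and the builtin JsonLoader expects one JSON record per line rather than pretty-printed multi-line records:

    x = LOAD 'hdfs://user/spanda20/pig/phone.dat'
        USING JsonLoader('id:chararray, phone:(home:{(zip:chararray, city:chararray)})');

For arbitrary nested JSON like the sample above, ElephantBird's com.twitter.elephantbird.pig.load.JsonLoader is often suggested instead.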

Pig Store the file with custom row/record delimiter

可紊 submitted on 2019-12-24 17:23:03
Question: I have a file with Ctrl-B as the record delimiter. I was able to read the file in Pig by overriding the LoaderInputFormat class and the getInputFormat() method in PigStorage, but I was not able to store the file with Ctrl-B as the record delimiter.

Answer 1: Read a Ctrl-B delimited record:

    SET textinputformat.record.delimiter '\n'
    x = LOAD 'xyz' USING PigStorage('\u0002');

Write a Ctrl-B delimited record:

    STORE x INTO 'y' USING PigStorage('\u0002');

Source: https://stackoverflow.com/questions/38776692/pig-store-the
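A hedged aside on the answer above: PigStorage('\u0002') treats Ctrl-B as the field delimiter within each newline-terminated record. If Ctrl-B must separate whole records instead, the Hadoop record delimiter itself can in principle be overridden; a sketch (whether the \u escape is interpreted here may depend on the Pig version):

    -- assumption: split records on Ctrl-B rather than newline
    SET textinputformat.record.delimiter '\u0002';
    x = LOAD 'xyz' USING PigStorage();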

Is it possible to detect and handle string collisions among grouped values when grouping in Hadoop Pig?

风格不统一 submitted on 2019-12-24 15:32:03
Question: Assume I have lines of data like the following that show user names and their favorite fruits:

    Alice\tApple
    Bob\tApple
    Charlie\tGuava
    Alice\tOrange

I'd like to create a Pig query that shows the favorite fruit of each user. If a user appears multiple times, I'd like to show "Multiple". For example, the result with the data above should be:

    Alice\tMultiple
    Bob\tApple
    Charlie\tGuava

In SQL, this could be done something like this (although it wouldn't necessarily perform very well): select
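Since the excerpt's SQL is cut off, here is a hedged Pig sketch of the same logic (the file name is an assumption): GROUP by user, count the distinct fruits in each group, and use a bincond to substitute 'Multiple':

    raw     = LOAD 'fruits.tsv' USING PigStorage('\t') AS (name:chararray, fruit:chararray);
    grouped = GROUP raw BY name;
    result  = FOREACH grouped {
                uniq = DISTINCT raw.fruit;
                -- MAX is just a way to pick the single fruit when there is only one
                GENERATE group AS name,
                         (COUNT(uniq) > 1L ? 'Multiple' : MAX(raw.fruit)) AS fruit;
              };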

How compatible is Hadoop 3.0.0 with older versions of Hive, Pig, Sqoop and Spark?

我的未来我决定 submitted on 2019-12-24 14:24:58
Question: We are currently using Hadoop 2.8.0 on a 10-node cluster and are planning to upgrade to the latest Hadoop 3.0.0. I want to know whether there will be any issues if we use Hadoop 3.0.0 with older versions of Spark and other components such as Hive, Pig and Sqoop.

Answer 1: The latest Hive version does not support Hadoop 3.0. It seems that Hive may be built on Spark or other compute engines in the future.

Source: https://stackoverflow.com/questions/47920005/how-is-hadoop-3-0-0-s-compatibility-with