apache-pig | 易学教程

Is there an apache pig equivalent of “SHOW TABLES”?

Question: I have a Hadoop data store that I'm accessing in Pig, with little documentation on it, and I'm new to Pig, so I am looking for the Pig equivalent of "SHOW TABLES". When I have a connection to a MySQL db I can do this and get a general sense of what data is in there; I have found several tutorials but nothing on point. If there is no equivalent, is there some other way to orient myself to a Hadoop data store I know nothing about? ETA: This would be when running Pig in interactive mode, rather than loading a

count on group by on multiple columns and getting the original dataset

Question:
    2, cornflakes, Regular, General Mills, 12
    3, cornflakes, Mixed Nuts, Post, 14
    4, chocolate syrup, Regular, Hersheys, 5
    5, chocolate syrup, No High Fructose, Hersheys, 8
    6, chocolate syrup, Regular, Ghirardeli, 6
    7, chocolate syrup, Strawberry Flavor, Ghirardeli, 7
Script:
    data_grp = GROUP data BY (item, type);
    data_cnt = FOREACH data_grp GENERATE FLATTEN(group) AS (item, type), COUNT(data) AS total;
    filter_data = FILTER data_cnt BY total < 2;
I now need the original data with the filter applied

how to find the pathing flow and rank them using pig or hive?

Question: Below is the example for my use case.
Answer 1: You can refer to this question, where the OP was asking something similar. If I understand your problem correctly, you want to remove duplicates from the path, but only when they occur next to each other, so 1 -> 1 -> 2 -> 1 would become 1 -> 2 -> 1. If this is correct, then you can't just group and distinct (as I'm sure you have noticed), because that would remove all duplicates. An easy solution is to write a UDF to remove those duplicates while
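The adjacent-duplicate removal described in the answer above can be sketched in plain Java. This is only an illustration of the core logic, not the answer's actual UDF: the class name `AdjacentDuplicates` and the method `collapse` are hypothetical, and the wrapping into a Pig `EvalFunc` is deliberately left out so the snippet runs without the Pig jar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical helper (not from the original answer): the logic such a UDF could wrap.
final class AdjacentDuplicates {

    // Returns a new list in which every run of equal neighbouring elements
    // is reduced to a single element; non-adjacent repeats are kept.
    static <T> List<T> collapse(List<T> path) {
        List<T> result = new ArrayList<>();
        for (T step : path) {
            // Keep the step only if it differs from the last step we kept.
            if (result.isEmpty() || !Objects.equals(result.get(result.size() - 1), step)) {
                result.add(step);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The example from the answer: 1 -> 1 -> 2 -> 1
        System.out.println(collapse(List.of(1, 1, 2, 1))); // prints [1, 2, 1]
    }
}
```

Unlike a group-and-distinct approach, this keeps the second `1` because it is not adjacent to the first run, and it preserves the order of the path.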

Could not load class when executed with -cp option

Question: Java is not able to find the class file when executed with the -cp option, as below:
    javac -cp ~/softwares/pig-0.12.0/pig-0.12.0.jar PR.java
Compilation is successful. However, when I run the generated class I get an error:
    java -cp ~/softwares/pig-0.12.0/pig-0.12.0.jar PR
    Error: Could not find or load main class PR
If I remove -cp I get the error below, which is expected:
    java PR
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/pig/PigServer at PR.runPigScript(PR

Apache Pig - ERROR 6007: “Unable to check name” message

Question: Environment: hadoop 1.0.3, hbase 0.94.1, pig 0.11.1. I am running a Pig script from a Java program and I get the following error sometimes, but not every time. The program loads a file from HDFS, does some transformation, and stores the result into HBase. My program is multi-threaded. I have already made PigServer thread-safe, and I have the "/user/root" directory created in HDFS. Here is the snippet of the program and the exception I got. Please advise. pigServer = PigFactory.getServer(); URL

Declare a comma-separated string constant

Question: Objective: declare a comma-separated string constant.
test.csv:
    a b c d e f
Pig script:
    %declare ACTIVE_VALUES 'a', 'b','c' ;
    -- Declaring the constant like this, using "" (double quotes) or even escape characters (\), results in a WARN message as below
    -- WARN org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for ACTIVE_VALUES
    A = LOAD 'test.csv' using PigStorage(',') AS (value:chararray);
    B = FILTER A BY value in ($ACTIVE_VALUES);
    dump B;

How to project an alias using a wildcard?

Question: Once I do a join A by id, B by id, I get an alias with fields A::f..., B::f... Is there a way to project it onto only the A fields?
    C = join A by id, B by id;
    D = filter C by B::n < 1000;
    E = foreach D generate A::*;
I get Unexpected character '*'. What I want is E with a schema identical to A (i.e., describe E and describe A should print exactly the same thing). How do I do that?
Answer 1: You can use a project-range expression to get part of the way there. Unfortunately, there is no way to

How to write a Pig UDF in Scala

Question: I am trying to write a Pig UDF in Scala (using Eclipse). I have added pig.jar as a library in the Java build path, which seems to resolve the two imports below:
    import org.apache.pig.EvalFunc
    import org.apache.pig.data.Tuple
However, I get two errors that I cannot resolve:
    org.apache.pig.EvalFunc[T] does not have a constructor
    value get is not a member of org.apache.pig.data.Tuple (though I am sure that Tuple has the get method)
Here is the full code: package datesUDFs import org.apache.pig

Pig Udf in displaying result

Question: I am new to Pig. I have written a UDF in Java that includes a System.out.println statement, and I need to know where its output is printed when the UDF runs in Pig.
Answer 1: If you register and use this UDF in your Pig script, the output is stored in a Pig log file such as stdoutlogs.
Answer 2: Assuming your UDF extends EvalFunc, you can use the Logger returned from EvalFunc.getLogger(). The log output should be visible in the associated Map / Reduce task that Pig executes (if

What is the difference between GROUP and COGROUP in PIG?

Question: I understood that GROUP didn't work with multiple tuples, and that this was why we had COGROUP in Pig. However, when I checked today, the GROUP command worked for me. I am using Pig 0.12.0. My commands and outputs are as follows.
    grunt> grpvar = GROUP C by $2, B by $2;
    grunt> cogrpvar = COGROUP C by $2, B by $2;
    grunt> describe grpvar;
    grpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: