apache-pig

Is there an Apache Pig equivalent of “SHOW TABLES”?

只愿长相守 posted on 2019-12-14 03:52:45
Question: I have a Hadoop data store that I'm accessing in Pig, with little documentation on it, and I'm new to Pig, so I am looking for the Pig equivalent of "SHOW TABLES". When I have a connection to a MySQL db I can run this and get a general sense of what data is in there; I have found several tutorials but nothing on point. If there is no equivalent, is there some other way to orient myself in a Hadoop data store I know nothing about? ETA: This would be when running Pig in interactive mode, rather than loading a

Count on group by on multiple columns and getting the original dataset

风流意气都作罢 posted on 2019-12-14 03:32:26
Question:
    2, cornflakes, Regular, General Mills, 12
    3, cornflakes, Mixed Nuts, Post, 14
    4, chocolate syrup, Regular, Hersheys, 5
    5, chocolate syrup, No High Fructose, Hersheys, 8
    6, chocolate syrup, Regular, Ghirardeli, 6
    7, chocolate syrup, Strawberry Flavor, Ghirardeli, 7
Script:
    data_grp = GROUP data BY (item, type);
    data_cnt = FOREACH data_grp GENERATE FLATTEN(group) AS (item, type), COUNT(data) AS total;
    filter_data = FILTER data_cnt BY total < 2;
I now need the original data with the filter applied

How to find the pathing flow and rank them using Pig or Hive?

|▌冷眼眸甩不掉的悲伤 posted on 2019-12-14 03:25:36
Question: Below is the example for my use case. Answer 1: You can refer to this question, where the OP was asking something similar. If I understand your problem correctly, you want to remove duplicates from the path, but only when they occur next to each other. So 1 -> 1 -> 2 -> 1 would become 1 -> 2 -> 1. If this is correct, then you can't just group and distinct (as I'm sure you have noticed), because that will remove all duplicates. An easy solution is to write a UDF to remove those duplicates while
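The adjacent-duplicate rule described in the answer (1 -> 1 -> 2 -> 1 becomes 1 -> 2 -> 1) can be sketched in plain Java. This is only a minimal illustration of the core logic, not the answerer's UDF: the class name `PathDedup` and the method `collapseAdjacent` are hypothetical, and the Pig wiring is deliberately left out so the snippet runs without pig.jar. A real UDF would presumably extend `org.apache.pig.EvalFunc` and apply the same loop to the tuples of an ordered bag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper: the names here are illustrative and do not come from the original answer.
final class PathDedup {
    private PathDedup() {}

    // Returns a new list in which every run of equal neighbouring steps is reduced to one step.
    // Non-adjacent repeats are kept, which is what a plain DISTINCT cannot do.
    public static <T> List<T> collapseAdjacent(List<T> path) {
        List<T> out = new ArrayList<>();
        for (T step : path) {
            // Keep the step only if it differs from the last step that was kept.
            if (out.isEmpty() || !Objects.equals(out.get(out.size() - 1), step)) {
                out.add(step);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collapseAdjacent(Arrays.asList(1, 1, 2, 1))); // prints [1, 2, 1]
    }
}
```

The input must already be in path order for this to be meaningful, so in Pig the bag would need to be sorted (for example inside a nested FOREACH) before it reaches the UDF.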

Could not load class when executed with -cp option

喜夏-厌秋 posted on 2019-12-14 02:33:40
Question: Java cannot find the class file when run with the -cp option, as below:
    javac -cp ~/softwares/pig-0.12.0/pig-0.12.0.jar PR.java
Compilation is successful. However, when I run the generated class, I get an error:
    java -cp ~/softwares/pig-0.12.0/pig-0.12.0.jar PR
    Error: Could not find or load main class PR
If I remove -cp, I get the error below, which is expected:
    java PR
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/pig/PigServer
        at PR.runPigScript(PR

Apache Pig: ERROR 6007 “Unable to check name” message

我们两清 posted on 2019-12-13 22:50:04
Question: Environment: hadoop 1.0.3, hbase 0.94.1, pig 0.11.1. I am running a Pig script from a Java program and get the following error sometimes, but not every time. The program loads a file from HDFS, applies some transformations, and stores the result in HBase. My program is multi-threaded. I have already made PigServer thread-safe, and I have created the "/user/root" directory in HDFS. Here is a snippet of the program and the exception I get. Please advise. pigServer = PigFactory.getServer(); URL

Declare a comma-separated string constant

女生的网名这么多〃 posted on 2019-12-13 18:16:12
Question: Objective: declare a comma-separated string constant.
test.csv
=========
a b c d e f
Pig script:
    %declare ACTIVE_VALUES 'a', 'b','c' ;
    -- Declaring the constant like this, using "" (double quotes), or even using escape characters (\) results in a WARN message as below
    -- WARN org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for ACTIVE_VALUES
    A = LOAD 'test.csv' using PigStorage(',') AS (value:chararray);
    B = FILTER A BY value in ($ACTIVE_VALUES);
    dump B;

How to project an alias using a wildcard?

二次信任 posted on 2019-12-13 16:35:10
Question: Once I do a join A by id, B by id, I get an alias with fields A::f..., B::f... Is there a way to project it onto only the A fields?
    C = join A by id, B by id;
    D = filter C by B::n < 1000;
    E = foreach D generate A::*;
I get "Unexpected character '*'". What I want is E with a schema identical to A (i.e., describe E and describe A should print exactly the same thing). How do I do that? Answer 1: You can use a project-range expression to get part of the way there. Unfortunately, there is no way to

How to write a Pig UDF in Scala

自闭症网瘾萝莉.ら posted on 2019-12-13 16:13:00
Question: I am trying to write a Pig UDF in Scala (using Eclipse). I have added pig.jar as a library in the Java build path, which seems to resolve the two imports below:
    import org.apache.pig.EvalFunc
    import org.apache.pig.data.Tuple
However, I get two errors which I cannot resolve:
    org.apache.pig.EvalFunc[T] does not have a constructor
    value get is not a member of org.apache.pig.data.Tuple (though I am sure that Tuple has the get method)
Here is the full code: package datesUDFs import org.apache.pig

Pig UDF in displaying result

允我心安 posted on 2019-12-13 12:16:20
Question: I am new to Pig. I have written a UDF in Java that includes a System.out.println statement, and I need to know where this output gets printed when the UDF runs in Pig. Answer 1: If you register and use this UDF in your Pig script, the output is stored in a Pig log file such as stdoutlogs. Answer 2: Assuming your UDF extends EvalFunc, you can use the Logger returned from EvalFunc.getLogger(). The log output should be visible in the associated Map/Reduce task that Pig executes (if

What is the difference between GROUP and COGROUP in PIG?

China☆狼群 posted on 2019-12-13 11:47:25
Question: My understanding was that GROUP does not work with multiple tuples, which is why Pig has COGROUP. However, when I checked today, the GROUP command worked for me. I am using PIG-0.12.0. My commands and outputs are as follows.
    grunt> grpvar = GROUP C by $2, B by $2;
    grunt> cogrpvar = COGROUP C by $2, B by $2;
    grunt> describe grpvar;
    grpvar: {group: chararray,C: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: int)},B: {(pid: int,pname: chararray,drug: chararray,gender: chararray,tot_amt: