Run pig in java without embedding pig script

Submitted by 早过忘川 on 2019-12-18 05:13:44

Question


I am new to Pig, Hadoop, and HBase. I want to run a Pig script from Java without embedding the script in my Java program: I would like to pass the script file and its parameters (possibly a parameter file) to one of Pig's execution methods. Does the core Pig library, or any other library, provide a way to execute a Pig script like this? I already tried Java's Runtime.exec method, but some of the parameters I pass are space-separated strings, so I dropped the idea of calling the pig grunt command through Runtime.exec; it does not seem to be the proper way to execute Pig commands.


Answer 1:


You can use org.apache.pig.PigServer to run Pig scripts from Java programs.

PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
pigServer.registerScript("scripts/test.pig");

This requires a 'pig.properties' file on the classpath, containing:

fs.default.name=hdfs://<namenode-hostname>:<port>
mapred.job.tracker=<jobtracker-hostname>:<port>

Alternatively, pass an instance of java.util.Properties to the PigServer constructor:

Properties props = new Properties();
props.setProperty("fs.default.name", "hdfs://<namenode-hostname>:<port>");
props.setProperty("mapred.job.tracker", "<jobtracker-hostname>:<port>");
PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);



Answer 2:


I am not sure I understand what you are asking. Do you want to know how to run a Pig script from a Java program?

If so, we use the class org.apache.pig.PigRunner for this.

PigStats pigStats = PigRunner.run(args, null);
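As far as I can tell, the args array takes the same tokens as the pig command line, so the script path and its parameters can stay outside the Java source. Below is a minimal sketch of assembling that array, assuming the standard -param and -param_file options. The class and method names (PigArgs.build) are hypothetical and not part of Pig, and the PigRunner call is left as a comment because it needs the Pig jar on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig: builds the argument array for PigRunner.run.
class PigArgs {

    // params: name/value pairs emitted as "-param name=value"
    // paramFile: optional path emitted as "-param_file path" (null to skip)
    static String[] build(String scriptPath, Map<String, String> params, String paramFile) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            args.add("-param");
            args.add(e.getKey() + "=" + e.getValue());
        }
        if (paramFile != null) {
            args.add("-param_file");
            args.add(paramFile);
        }
        // The script path goes last, as on the command line.
        args.add(scriptPath);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("in1", "file1.txt");
        params.put("outdirectory", "outdirectory");
        String[] args = build("myFirstPigScript.pig", params, null);
        System.out.println(String.join(" ", args));
        // With Pig on the classpath (assumption):
        // PigStats pigStats = PigRunner.run(args, null);
    }
}
```

Because every token is its own array element, no string is ever re-split on whitespace on the Java side.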

Its Javadoc states:

A utility to help run PIG scripts within a Java program.

However, in my experience, Pig is not really intended to be used this way (at least in version 0.8). We have had problems such as file streams that are left open and temporary files that are not deleted.




Answer 3:


Since others have already explained how to run Pig by embedding it in Java, let me just add how to run a parameterized Pig script without Java.

In this scenario, all you need is your Pig code saved as a Pig file, say myFirstPigScript.pig.

The next thing you need is the parameters. Here is how to run myFirstPigScript.pig with three input parameters:

pig -p in1=file1.txt -p in2=file2.txt -p outdirectory=outdirectory myFirstPigScript.pig 

Your Pig script will look like this:

A = load '$in1' USING PigStorage(',') AS (id_one:chararray,file1field1:chararray); 
B = load '$in2' USING PigStorage(',') AS (id_two:chararray,file2field1:chararray); 
C = join A by id_one, B by id_two;
store C into '$outdirectory' USING PigStorage(',');

Each sample input file is a two-column CSV file.

The output 'part' files will be written to the outdirectory.
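If this command line does have to be launched from Java, the whitespace problem described in the question comes from Runtime.exec(String), which splits its single command string on whitespace. A minimal sketch of the alternative is to hand ProcessBuilder a list with one element per token. The PigCommand class and its methods are hypothetical, and the sketch assumes the pig launcher is on the PATH.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles and launches the pig command as separate tokens.
class PigCommand {

    // Each "name=value" string becomes its own argument after a "-p" flag.
    static List<String> build(String script, String... keyValuePairs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pig"); // assumption: the pig launcher is on the PATH
        for (String kv : keyValuePairs) {
            cmd.add("-p");
            cmd.add(kv);
        }
        cmd.add(script);
        return cmd;
    }

    // Starts the process, forwards its output to this JVM's console,
    // and returns the exit code of the pig process.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }
}
```

A call such as PigCommand.run(PigCommand.build("myFirstPigScript.pig", "in1=file1.txt", "in2=file2.txt", "outdirectory=outdirectory")) would correspond to the command shown above. Pig's own parameter parser may still need values with spaces to be quoted; that part is not covered by this sketch.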



Source: https://stackoverflow.com/questions/11152068/run-pig-in-java-without-embedding-pig-script
