Question
I want to copy the output of a job from an EMR cluster to Amazon S3 programmatically.
How can I use S3DistCp from Java code to do this?
Answer 1:
Hadoop's ToolRunner can run it, since S3DistCp implements the Tool interface.
Here is a usage example:
import java.util.Arrays;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.util.ToolRunner;
import com.amazon.external.elasticmapreduce.s3distcp.S3DistCp;

public class CustomS3DistCP {
    private static final Log log = LogFactory.getLog(CustomS3DistCP.class);

    public static void main(String[] args) throws Exception {
        // Arrays.toString logs the argument values instead of the array's identity
        log.info("Running with args: " + Arrays.toString(args));
        // ToolRunner handles the generic Hadoop options, then runs S3DistCp with the rest
        System.exit(ToolRunner.run(new S3DistCp(), args));
    }
}
You need the S3DistCp JAR on your classpath. You can then call this program from a shell script.
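If you would rather build the arguments in Java than pass them in from a shell script, a minimal sketch could look like the one below. The `--src` and `--dest` options are the ones S3DistCp documents. The class name `S3DistCpArgs`, the helper `build`, and both paths are hypothetical, made up for illustration. The actual `ToolRunner.run` call is left as a comment so the sketch compiles without the S3DistCp JAR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles the argument array that S3DistCp expects.
class S3DistCpArgs {

    // Returns {"--src", src, "--dest", dest}.
    static String[] build(String src, String dest) {
        List<String> args = new ArrayList<>();
        args.add("--src");
        args.add(src);
        args.add("--dest");
        args.add(dest);
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        // Both locations are placeholders; substitute your own job output path and bucket.
        String[] distCpArgs = build("hdfs:///output", "s3://my-bucket/output/");
        System.out.println(Arrays.toString(distCpArgs));

        // With the S3DistCp JAR on the classpath, the copy would then be started with:
        // int exitCode = ToolRunner.run(new S3DistCp(), distCpArgs);
    }
}
```

The array returned by `build` can be passed to `ToolRunner.run(new S3DistCp(), ...)` in place of the `args` taken from the command line in the example above.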
Hope that helps!
Source: https://stackoverflow.com/questions/18124845/how-to-use-s3distcp-in-java-code