I want to run a Hadoop job remotely from a Windows machine. The cluster is running on Ubuntu.
Basically, I want to do two things:
Welcome to a world of pain. I've just implemented this exact use case, but using Hadoop 2.2 (the current stable release) patched and compiled from source.
What I did, in a nutshell, was:
1. Run `sudo ldconfig` on the Linux node(s); see this post.
2. Deploy the build output from `hadoop-2.2.0-src/hadoop-dist/target` on the server node(s) and configure it. I can't help you with that, since you need to tweak it to your cluster topology.
3. Install a JDK on the Windows machine, e.g. in `c:\java\jdk1.7` (avoid paths with spaces).
4. Set the `JAVA_HOME`, `HADOOP_HOME` and `PATH` environment variables as described in these instructions.
5. Run `unix2dos` (from Cygwin or standalone) to convert all `.cmd` files in the `bin` and `etc\hadoop` directories, otherwise you'll get weird errors about labels when running them.
6. Configure the client-side `*-site.xml` files with your cluster's connection details: `fs.default.name`, `mapreduce.jobtracker.address`, `yarn.resourcemanager.hostname` and the like.

If you've managed all of that, you can start your Linux Hadoop cluster and connect to it from your Windows command prompt. Joy!
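For the client-side configuration step, a minimal sketch of `core-site.xml` and `yarn-site.xml` on the Windows machine might look like this — the hostname `master.cluster.local` and the port are placeholders I've made up, so substitute your own cluster's addresses:

```xml
<!-- core-site.xml: point the Windows client at the cluster's HDFS NameNode.
     Hostname and port below are placeholders, not values from a real cluster. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.cluster.local:9000</value>
  </property>
</configuration>

<!-- yarn-site.xml: point the client at the YARN ResourceManager. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master.cluster.local</value>
  </property>
</configuration>
```

With files like these in the client's configuration directory (e.g. `%HADOOP_HOME%\etc\hadoop`), a quick `hadoop fs -ls /` from the Windows command prompt should reach the remote HDFS and is a good smoke test before submitting an actual job.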