I have created a small Python script using paramiko that lets me run MapReduce jobs without opening PuTTY or a cmd window to initiate each job. This works great, except that I don't see any stdout until the job completes. How can I set this up so that I see each line of stdout as it is generated, just as I would in a cmd window?
Here is my script:
```python
import paramiko

# Define connection info
host_ip = 'xx.xx.xx.xx'
user = 'xxxxxxxxx'
pw = 'xxxxxxxxx'

# Commands
list_dir = "ls /nfs_home/appers/cnielsen -l"
MR = "hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar -files /nfs_home/appers/cnielsen/product_lookups.xml -file /nfs_home/appers/cnielsen/Mapper.py -file /nfs_home/appers/cnielsen/Reducer.py -mapper '/usr/lib/python_2.7.3/bin/python Mapper.py test1' -file /nfs_home/appers/cnielsen/Process.py -reducer '/usr/lib/python_2.7.3/bin/python Reducer.py' -input /nfs_home/appers/extracts/*/*.xml -output /user/loc/output/cnielsen/test51"
getmerge = "hadoop fs -getmerge /user/loc/output/cnielsen/test51 /nfs_home/appers/cnielsen/test_010716_0.txt"

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(host_ip, username=user, password=pw)

##stdin, stdout, stderr = client.exec_command(list_dir)
##stdin, stdout, stderr = client.exec_command(getmerge)
stdin, stdout, stderr = client.exec_command(MR)

print "Executing command..."
for line in stdout:
    print '... ' + line.strip('\n')
for l in stderr:
    print '... ' + l.strip('\n')
client.close()
```
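One approach I've seen suggested is to read from the underlying paramiko `Channel` directly (via `stdout.channel`), polling `recv_ready()` / `recv_stderr_ready()` and draining data as it arrives instead of iterating the file-like `stdout` after the fact. Below is a minimal sketch of that idea (written in Python 3 for brevity; the `stream_output` helper name, the chunk size, and the 0.1 s polling interval are my own choices, not part of my script):

```python
import sys
import time

def stream_output(channel, chunk_size=1024):
    """Print stdout/stderr from a paramiko Channel as data arrives,
    rather than waiting for the remote command to exit."""
    collected = []
    while True:
        # Drain whatever stdout bytes are currently buffered.
        while channel.recv_ready():
            chunk = channel.recv(chunk_size).decode('utf-8', 'replace')
            sys.stdout.write(chunk)
            collected.append(chunk)
        # Drain any stderr bytes the same way.
        while channel.recv_stderr_ready():
            sys.stderr.write(
                channel.recv_stderr(chunk_size).decode('utf-8', 'replace'))
        # Stop only once the command has exited and both streams are empty.
        if (channel.exit_status_ready()
                and not channel.recv_ready()
                and not channel.recv_stderr_ready()):
            break
        time.sleep(0.1)  # avoid busy-waiting between polls
    return ''.join(collected)
```

With the script above, this would presumably be called as `stream_output(stdout.channel)` right after `client.exec_command(MR)`, replacing the two `for` loops.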