I'm trying to learn to use Yelp's Python API for MapReduce, MRJob. Their simple word counter example makes sense, but I'm curious how one would handle an application invo…
The actual answer to your question is that mrjob does not yet support the Hadoop streaming join pattern, which is to read the map_input_file environment variable (which exposes the map.input.file property) to determine, from its path and/or name, which type of file you are dealing with.
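Outside mrjob, a plain Hadoop streaming mapper can read that variable directly. A minimal sketch of the pattern (the "users" directory name, the "U"/"T" tags, and the tab-separated record layout are my assumptions for illustration):

```python
import os
import sys


def tag_for(input_file):
    """Pick a record tag from the input file path (hypothetical naming)."""
    # Hadoop streaming exposes map.input.file to the mapper as the
    # map_input_file environment variable (mapreduce_map_input_file
    # on newer Hadoop versions).
    if "users" in input_file:
        return "U"
    return "T"


def main():
    input_file = os.environ.get(
        "mapreduce_map_input_file",
        os.environ.get("map_input_file", ""),
    )
    tag = tag_for(input_file)
    for line in sys.stdin:
        key, rest = line.rstrip("\n").split("\t", 1)
        # Emit the join key plus a tag so the reducer can tell which
        # side of the join each record came from.
        print("%s\t%s\t%s" % (key, tag, rest))


if __name__ == "__main__":
    main()
```

The reducer then groups by key and combines the "U" and "T" records it sees for each group.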
You might still be able to pull it off if you can easily detect, just by reading a record, which type it belongs to, as demonstrated in this article:
http://allthingshadoop.com/2011/12/16/simple-hadoop-streaming-tutorial-using-joins-and-keys-with-python/
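If the two record layouts differ in shape, the mapper can classify each line without knowing which file it came from. A minimal sketch, assuming hypothetical two-field user records (`id<TAB>name`) and three-field transaction records (`id<TAB>amount<TAB>date`):

```python
def classify(record):
    """Guess which dataset a tab-separated record came from by its shape.

    Assumed (hypothetical) layouts:
      user:        id <TAB> name
      transaction: id <TAB> amount <TAB> date
    """
    fields = record.split("\t")
    if len(fields) == 2:
        return ("user", fields)
    return ("transaction", fields)
```

A mapper would call this on each input line and yield the record's key together with the detected type, so the reducer can join the two sides.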
However that's not always possible...
Otherwise mrjob looks fantastic, and I hope they add support for this in the future. Until then, this is pretty much a deal breaker for me.