Multiple Inputs with MRJob


失恋的感觉 2020-12-29 14:58

I'm trying to learn to use Yelp's Python API for MapReduce, MRJob. Their simple word counter example makes sense, but I'm curious how one would handle an application invo

5 Answers

Happy的楠姐 (original poster)

2020-12-29 15:29

The short answer to your question is that mrjob does not quite support the Hadoop Streaming join pattern yet. In that pattern, the mapper reads the map_input_file environment variable (which exposes the map.input.file property) and uses the file's path and/or name to determine which type of input it is dealing with.
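
For illustration, here is a minimal sketch of that pattern as it might look in a plain Hadoop Streaming mapper written in Python, outside of mrjob. Hadoop Streaming exports job configuration properties as environment variables with the dots replaced by underscores; newer Hadoop versions name the property mapreduce.map.input.file, so the sketch checks both. The file name prefixes ("customers", "orders") and the helper names are hypothetical and only stand in for whatever naming convention your inputs follow.

```python
import os


def current_input_file(environ=None):
    """Return the path of the file this map task is reading, or None."""
    if environ is None:
        environ = os.environ
    # Hadoop Streaming turns map.input.file into map_input_file;
    # newer Hadoop versions use mapreduce.map.input.file instead.
    for name in ("mapreduce_map_input_file", "map_input_file"):
        path = environ.get(name)
        if path:
            return path
    return None


def source_tag(path):
    """Map an input path to a record type (hypothetical naming convention)."""
    if path is None:
        return "unknown"
    name = os.path.basename(path)
    if name.startswith("customers"):
        return "customer"
    if name.startswith("orders"):
        return "order"
    return "unknown"


def map_lines(lines, environ=None):
    """Yield (join_key, tag, rest) for tab-separated lines keyed on field 0."""
    tag = source_tag(current_input_file(environ))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield fields[0], tag, fields[1:]
```

In a real streaming job the mapper would call map_lines(sys.stdin) and print each tuple as a tab-separated line, so the reducer can tell the two sides of the join apart by the tag.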

You might still be able to pull it off if you can easily tell which type a record belongs to just by reading the data itself, as shown in this article:

http://allthingshadoop.com/2011/12/16/simple-hadoop-streaming-tutorial-using-joins-and-keys-with-python/
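
As a rough sketch of that idea (not the article's exact code), a mapper can classify each record by its shape. The layouts here are assumptions made up for the example: a "country" line has two tab-separated fields (code, name) and a "customer" line has three (person, type, country code). Both are emitted under the country code so they meet in the same reducer, with an "A"/"B" marker so the country record sorts first.

```python
def classify_record(line):
    """Guess the record type from the number of tab-separated fields.

    The two layouts are hypothetical: 2 fields = country, 3 = customer.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 2:
        return "country", fields
    if len(fields) == 3:
        return "customer", fields
    return "unknown", fields


def map_record(line):
    """Return (join_key, value) for a line, or None if it is unrecognized."""
    kind, fields = classify_record(line)
    if kind == "country":
        code, name = fields
        return code, ("A", name)
    if kind == "customer":
        person, customer_type, code = fields
        return code, ("B", person, customer_type)
    return None
```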

However, that's not always possible.

Otherwise mrjob looks fantastic, and I hope they add support for this in the future. Until then, this is pretty much a deal breaker for me.
