Hadoop Map Reduce reference static objects

Posted by 旧时模样 on 2019-12-19 04:15:23

Question


I have a static object in my map reduce job class that I want to initialize once (in the main method), and then call a function on it in every map call. The object is of type MyObject, and I declare it as a static field:

static MyObject obj;

And in my main function, before I start the job I call:

obj = new MyObject();
obj.init();

And then in my map function I want to call:

obj.execute();

But for some reason I get a null pointer exception when I try this (it says obj is null). If I initialize it in my main function, shouldn't the mapper see it as initialized? Does the mapper see static variables?


Answer 1:


A static object lives in the memory of a single JVM. Your system is distributed, so the object you created exists only in the JVM where main() ran (the machine that submitted the job), not in the JVMs that run the map tasks on other nodes.

You cannot pass the object itself from the job to the mapper, because the job configuration is written out as XML and holds only string key/value pairs. There is a workaround: serialize your object to JSON, put it in the configuration as a string, and deserialize it in the mapper.

In the job:

job.getConfiguration().set("some key", "json string");

In the mapper:

Configuration conf = context.getConfiguration();
String json = conf.get("some key");
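The round trip described above can be sketched without Hadoop on the classpath. This is only an illustration under stated assumptions: it uses Java's built-in serialization plus Base64 instead of JSON (so no third-party library is needed), a plain HashMap stands in for the Hadoop Configuration, and the MyObject class with its setting field is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class ConfigRoundTrip {

    // Hypothetical stand-in for the question's MyObject; the field is invented.
    public static class MyObject implements Serializable {
        private static final long serialVersionUID = 1L;
        final String setting;

        public MyObject(String setting) {
            this.setting = setting;
        }

        public String execute() {
            return "executed with " + setting;
        }
    }

    // Driver side: turn the object into a string that fits in a config value.
    // Assumption: Java serialization + Base64 replaces the JSON step.
    public static String encode(MyObject obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mapper side: rebuild the object from the string read out of the config.
    public static MyObject decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (MyObject) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A HashMap stands in for the job's Configuration (string keys and values).
        Map<String, String> conf = new HashMap<>();
        conf.put("some key", encode(new MyObject("demo")));

        // In a real job this part would run inside the mapper, in another JVM.
        MyObject restored = decode(conf.get("some key"));
        System.out.println(restored.execute());
    }
}
```

In an actual job, the encode call would sit next to job.getConfiguration().set(...) in the driver, and the decode call next to conf.get(...) in the mapper. This only suits objects that are small and cheap to serialize.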



Answer 2:


Your main() doesn't get invoked on every node; it runs only where you start the job. To have access to your static object in the mapper, you need to initialize it when the mapper is instantiated. That way the initialization happens on every node that runs a map task.

But there may be another way to do what you're trying to accomplish, so the question is, what does this static object do?
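Initializing the object when the mapper is instantiated might look like the sketch below. It is not Hadoop code: MyMapper is a plain class standing in for a subclass of org.apache.hadoop.mapreduce.Mapper, its setup() method plays the role of Mapper.setup(Context), which Hadoop calls once per task before the first map() call, and MyObject is a hypothetical stand-in for the class in the question.

```java
public class PerTaskInitDemo {

    // Hypothetical stand-in for the question's MyObject.
    public static class MyObject {
        private boolean ready = false;

        public void init() {
            ready = true;
        }

        public String execute(String input) {
            if (!ready) {
                throw new IllegalStateException("init() was not called");
            }
            return "executed: " + input;
        }
    }

    // Stand-in for a Hadoop Mapper subclass (assumption: no Hadoop dependency here).
    public static class MyMapper {
        // Static, so it is shared by mapper instances in the same JVM,
        // but it is never shared with the JVM that ran main().
        private static MyObject obj;

        // Plays the role of Mapper.setup(Context): runs in the task's own JVM.
        public void setup() {
            synchronized (MyMapper.class) {
                if (obj == null) {
                    obj = new MyObject();
                    obj.init();
                }
            }
        }

        // Plays the role of map(): the object is guaranteed to be initialized.
        public String map(String value) {
            return obj.execute(value);
        }
    }

    public static void main(String[] args) {
        MyMapper mapper = new MyMapper();
        mapper.setup();
        System.out.println(mapper.map("hello"));
    }
}
```

The null check keeps the initialization from running more than once if several mapper instances end up in the same JVM.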




Answer 3:


Since my object was really just loading a library, I ended up using the distributed cache and instantiating the object in the map/reduce methods.



Source: https://stackoverflow.com/questions/12935774/hadoop-map-reduce-reference-static-objects
