How to run Hadoop in a multithreaded way in a single JVM?

穿精又带淫゛_ submitted on 2019-12-22 20:43:10

Question


I have a 4-core desktop and want to use all my cores for local data processing with Hadoop (i.e., sometimes I have enough power to process data locally, and sometimes I submit the same jobs to a cluster).

By default, Hadoop local mode runs only one mapper and one reducer, so my local jobs are really slow. I do not want to set up a cluster on a single machine: first, because of the "painful" configuration, and second, because I would have to build a jar each time. So the perfect solution would be a way to run embedded Hadoop on a single machine.

PS: pseudo-distributed mode is a bad option, since it creates a cluster with a single node, so I will get only one mapper, and I would have to spend time on additional configuration.


Answer 1:


You need to use MultithreadedMapRunner: just set it with JobConf's setMapRunnerClass method, and don't forget to set mapred.map.multithreadedrunner.threads to the desired concurrency level.
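To make the idea behind a multithreaded map runner concrete, here is a plain-Java sketch with no Hadoop dependency: one thread reads records in order and hands each one to a fixed pool of worker threads that apply the map function. The class PooledMapRunnerSketch and its run method are hypothetical, the threads argument merely stands in for mapred.map.multithreadedrunner.threads, and this illustrates the pattern rather than reproducing Hadoop's actual implementation.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

// Hypothetical sketch: one reader thread, a fixed pool of worker threads running map().
class PooledMapRunnerSketch {

    // 'threads' plays the role of mapred.map.multithreadedrunner.threads (assumption)
    static long run(Iterator<String> input, ToLongFunction<String> map, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong total = new AtomicLong(); // thread-safe stand-in for an output collector
        while (input.hasNext()) {
            String line = input.next(); // records are read sequentially by this thread
            pool.execute(() -> total.addAndGet(map.applyAsLong(line)));
        }
        pool.shutdown(); // accept no new tasks; already queued ones still run
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return total.get();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a b c", "hadoop local mode", "four cores");
        // "map" each line to its word count and sum the results
        long words = run(lines.iterator(), l -> l.split("\\s+").length, 4);
        System.out.println(words); // prints 8
    }
}
```

The pool's task queue in this sketch is unbounded, so a fast reader can enqueue every record before any is mapped; a production runner would normally bound that queue so reading cannot race far ahead of the workers.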

There is also another way. You should:

  • set MultithreadedMapper as the mapper class on your Job object
  • call MultithreadedMapper.setMapperClass with your actual mapper class
  • call MultithreadedMapper.setNumberOfThreads with the desired concurrency level

But be careful: your mapper class must be thread-safe, and its setup and cleanup methods will be called several times (once per thread), so it isn't a good idea to mix MultithreadedMapper with MultipleOutputs unless you implement your own MultithreadedMapper-inspired class.
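The caveat about setup and cleanup follows from the wrapper giving each worker thread its own mapper instance. The plain-Java sketch below models that behavior; the LifecycleMapper interface and PerThreadMapperSketch class are hypothetical stand-ins, not Hadoop's Mapper API. Worker threads pull records from one shared, synchronized iterator, and the demo counts how often setup() runs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of the per-thread-mapper model: each worker thread builds its
// own mapper instance, so setup() and cleanup() run once per thread.
class PerThreadMapperSketch {

    // Minimal stand-in for a mapper's lifecycle; NOT Hadoop's Mapper API
    interface LifecycleMapper {
        void setup();
        void map(String value);
        void cleanup();
    }

    static void run(Iterator<String> input, Supplier<LifecycleMapper> factory, int threads) {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                LifecycleMapper mapper = factory.get(); // one instance per thread
                mapper.setup();
                while (true) {
                    String next;
                    synchronized (input) { // a plain Iterator is not thread-safe
                        if (!input.hasNext()) {
                            break;
                        }
                        next = input.next();
                    }
                    mapper.map(next); // map() itself runs concurrently across threads
                }
                mapper.cleanup();
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger setups = new AtomicInteger();
        AtomicInteger records = new AtomicInteger();
        run(List.of("r1", "r2", "r3", "r4", "r5").iterator(), () -> new LifecycleMapper() {
            public void setup() { setups.incrementAndGet(); }   // e.g. opening a side output
            public void map(String value) { records.incrementAndGet(); }
            public void cleanup() { }                            // e.g. closing that side output
        }, 3);
        System.out.println(setups.get() + " setups, " + records.get() + " records");
        // prints "3 setups, 5 records"
    }
}
```

With three threads, setup() runs three times even though there is a single input. If setup() created something like a MultipleOutputs instance, you would end up with three independent instances, one per thread, which is exactly the situation the answer warns about.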




Answer 2:


Hadoop purposely does not run more than one task at a time in one JVM, for isolation purposes. In stand-alone (local) mode, only one JVM is ever used. If you want to make use of your four cores, you should run in pseudo-distributed mode and increase the maximum number of concurrent tasks to four. You can do this with the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties.




Answer 3:


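    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    Configuration conf = new Configuration();
    Job job = new Job(conf, "SolerRandomHit");
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // MultithreadedMapper is the job's mapper; it runs the real mapper on several threads
    job.setMapperClass(MultithreadedMapper.class);
    // YourMapper is a hypothetical placeholder for your own thread-safe Mapper subclass
    MultithreadedMapper.setMapperClass(job, YourMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 4); // e.g. one thread per core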
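Because the wrapped mapper runs on several threads at once, any state it shares across threads (for example, a static map used for aggregation) has to be thread-safe. Here is a minimal plain-Java sketch, assuming a hypothetical word-count style map function; ThreadSafeWordCountSketch is an illustrative name, and the only shared structure is a ConcurrentHashMap updated with merge(), which the JDK performs atomically.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a word-count style map function that is safe to call from many
// threads because its only shared state is a ConcurrentHashMap updated with merge().
class ThreadSafeWordCountSketch {

    static Map<String, Integer> count(List<String> lines, int threads) {
        Map<String, Integer> counts = new ConcurrentHashMap<>(); // shared by all workers
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String line : lines) {
            pool.execute(() -> {
                for (String word : line.split("\\s+")) {
                    counts.merge(word, 1, Integer::sum); // atomic read-modify-write per key
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(List.of("a b a", "b a", "c"), 4);
        System.out.println(counts.get("a")); // prints 3
    }
}
```

Replacing the ConcurrentHashMap with a plain HashMap would make this code unsafe, since concurrent updates can be lost or corrupt the map's internal structure.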


Source: https://stackoverflow.com/questions/12504690/how-to-run-hadoop-multithread-way-in-single-jvm
