Hadoop HPROF profiling no CPU SAMPLES written

Posted by 自古美人都是妖i on 2020-01-04 03:49:26

Question


I want to use HPROF to profile my Hadoop job. The problem is that I get TRACES, but there are no CPU SAMPLES in the profile.out file. The code I am using inside my run method is:

    /** Get configuration */
    Configuration conf = getConf();
    conf.set("textinputformat.record.delimiter","\n\n");
    conf.setStrings("args", args);

    /** JVM PROFILING */
    conf.setBoolean("mapreduce.task.profile", true);
    conf.set("mapreduce.task.profile.params", "-agentlib:hprof=cpu=samples," +
       "heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
    conf.set("mapreduce.task.profile.maps", "0-2");
    conf.set("mapreduce.task.profile.reduces", "");

    /** Job configuration */
    Job job = Job.getInstance(conf, "HadoopSearch"); // new Job(conf, name) is deprecated in Hadoop 2.x
    job.setJarByClass(Search.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    /** Set Mapper and Reducer, use identity reducer*/
    job.setMapperClass(Map.class);
    job.setReducerClass(Reducer.class);

    /** Set input and output formats */
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    /** Set input and output path */
    FileInputFormat.addInputPath(job, new Path("/user/niko/16M"));  
    FileOutputFormat.setOutputPath(job, new Path(cmd.getOptionValue("output")));

    job.waitForCompletion(true);

    return 0;

How do I get the CPU SAMPLES to be written in the output?

I also get a strange error message on stderr, but I think it is unrelated, since it also appears when profiling is set to false or the code that enables profiling is commented out. The error is:

 log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Answer 1:


YARN (or MRv1) kills the container right after your job finishes. HPROF writes the CPU SAMPLES section at JVM exit, so the samples cannot be written to your profile file before the process is killed. In fact, your traces are probably truncated as well.

You have to add the following options (or their equivalents for your Hadoop version):

yarn.nodemanager.sleep-delay-before-sigkill.ms = 30000
# No. of ms to wait between sending a SIGTERM and SIGKILL to a container

yarn.nodemanager.process-kill-wait.ms = 30000
# Max time to wait for a process to come up when trying to cleanup a container

mapreduce.tasktracker.tasks.sleeptimebeforesigkill = 30000
# The same setting for MRv1?

(30 seconds seems to be enough.)
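
To check whether a longer kill delay actually had the intended effect, one option is to scan the resulting profile.out for the CPU SAMPLES section markers. The following is a minimal sketch, assuming HPROF's text output delimits that section with lines starting with `CPU SAMPLES BEGIN` and `CPU SAMPLES END`. The class name `ProfileCheck` and the method `hasCompleteCpuSamples` are hypothetical names chosen for illustration and are not part of Hadoop or HPROF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: reports whether an HPROF text profile contains
// a complete CPU SAMPLES section (opening marker followed by closing marker).
class ProfileCheck {

    /** Returns true only if a "CPU SAMPLES BEGIN" line is later followed by a "CPU SAMPLES END" line. */
    static boolean hasCompleteCpuSamples(List<String> lines) {
        boolean begun = false;
        for (String line : lines) {
            if (line.startsWith("CPU SAMPLES BEGIN")) {
                begun = true;
            } else if (begun && line.startsWith("CPU SAMPLES END")) {
                return true;
            }
        }
        // Either the section never started or the file was cut off before the end marker.
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ProfileCheck <path to profile.out>");
            return;
        }
        // ISO-8859-1 maps every byte to a character, so a truncated file cannot cause a decoding error.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        System.out.println(hasCompleteCpuSamples(lines)
                ? "CPU SAMPLES section is complete"
                : "CPU SAMPLES section is missing or truncated");
    }
}
```

Running it against a task's profile.out before and after changing the delay settings gives a quick yes/no answer without reading the whole file by hand.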




Answer 2:


This is probably caused by https://issues.apache.org/jira/browse/MAPREDUCE-5465, which is fixed in newer Hadoop versions.

So solutions seem to be:

  • use the settings mentioned in ALSimon's answer, OR
  • upgrade to Hadoop >= 2.8.0
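
To decide which of the two options applies to a given cluster, the version string can be compared against 2.8.0. The following is a rough sketch, assuming the version string has the usual `major.minor.patch` shape, optionally followed by a suffix such as `-SNAPSHOT` (for example, the string printed by the `hadoop version` command). The class name `HadoopVersionCheck` and the method `isAtLeast` are hypothetical and not part of Hadoop.

```java
// Hypothetical helper: compares a Hadoop-style version string against a minimum version.
class HadoopVersionCheck {

    /** Returns true if version is at least major.minor.patch; missing or non-numeric components count as 0. */
    static boolean isAtLeast(String version, int major, int minor, int patch) {
        String[] parts = version.split("[.-]");
        int[] want = {major, minor, patch};
        for (int i = 0; i < want.length; i++) {
            int have = (i < parts.length && parts[i].matches("\\d+"))
                    ? Integer.parseInt(parts[i])
                    : 0;
            if (have != want[i]) {
                // The first differing component decides the comparison.
                return have > want[i];
            }
        }
        return true; // all three components are equal
    }

    public static void main(String[] args) {
        // The default value is only an example; pass the real version string as the first argument.
        String version = args.length > 0 ? args[0] : "2.7.3";
        System.out.println(isAtLeast(version, 2, 8, 0)
                ? version + ": includes the release that the answer says fixes MAPREDUCE-5465"
                : version + ": older than 2.8.0, so the kill-delay settings are the workaround");
    }
}
```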


Source: https://stackoverflow.com/questions/25983999/hadoop-hprof-profiling-no-cpu-samples-written
