Setting Hadoop parameters with boto?

猫巷女王i 2020-12-31 18:48

I am trying to enable bad input skipping on my Amazon Elastic MapReduce jobs. I am following the wonderful recipe described here:

http://devblog.factual.com/practica

1 answer
  • 2020-12-31 19:15

    After many hours of struggling, reading code, and experimentation, here is the answer:

    You need to add a new BootstrapAction, like so:

    from boto.emr.bootstrap_action import BootstrapAction
    from boto.emr.connection import EmrConnection
    from boto.emr.step import StreamingStep

    # Each '-s', 'key=value' pair is handed to the configure-hadoop bootstrap script.
    params = ['-s', 'mapred.skip.mode.enabled=true',
              '-s', 'mapred.skip.map.max.skip.records=1',
              '-s', 'mapred.skip.attempts.to.start.skipping=2',
              '-s', 'mapred.map.max.attempts=5',
              '-s', 'mapred.task.timeout=100000']
    config_bootstrapper = BootstrapAction(
        'Enable skip mode',
        's3://elasticmapreduce/bootstrap-actions/configure-hadoop',
        params)

    conn = EmrConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
    step = StreamingStep(name='My Step', ...)
    conn.run_jobflow(..., bootstrap_actions=[config_bootstrapper], steps=[step], ...)
    

    Of course, if you already have other bootstrap actions, just add this one to the bootstrap_actions list alongside them.
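If the list of settings grows, writing the alternating '-s', 'key=value' entries by hand gets error-prone. The following is only a sketch of one way to generate the params list from key/value pairs. The helper name build_configure_hadoop_args is hypothetical and not part of boto, and the sketch assumes every setting is passed with the same '-s' flag used in the answer.

```python
def build_configure_hadoop_args(settings, flag='-s'):
    """Flatten (key, value) pairs into [flag, 'key=value', ...].

    Hypothetical helper, not part of boto. It only builds the argument
    list that the answer passes to BootstrapAction as params.
    """
    args = []
    for key, value in settings:
        # Hadoop expects lowercase 'true'/'false', not Python's 'True'/'False'.
        if isinstance(value, bool):
            value = 'true' if value else 'false'
        args.extend([flag, '%s=%s' % (key, value)])
    return args


# A list of pairs (rather than a dict) keeps the order explicit.
skip_settings = [
    ('mapred.skip.mode.enabled', True),
    ('mapred.skip.map.max.skip.records', 1),
    ('mapred.skip.attempts.to.start.skipping', 2),
    ('mapred.map.max.attempts', 5),
    ('mapred.task.timeout', 100000),
]

params = build_configure_hadoop_args(skip_settings)
```

The resulting params can then be passed to BootstrapAction exactly as in the answer.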
