Use bootstrap to replace default jar on EMR


With EMR 4.0 the Hadoop installation path changed, so the manual update to guava-14.0.1.jar becomes:

cd /usr/lib/hadoop/lib
sudo wget http://central.maven.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.jar
sudo rm guava-11.0.2.jar
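To sanity-check the manual swap, it can help to confirm afterwards that only the new jar remains in Hadoop's lib directory (the path is the EMR 4.0 one from the commands above):

# Run on the node after the swap; only guava-14.0.1.jar should be listed.
ls /usr/lib/hadoop/lib/ | grep guava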

The bootstrap action in the answer from Sandesh doesn't work for us.

Edit:

Now we have a solution for EMR 4.0. You have to provide a spark-config.json in S3 that sets the extra classpath for both the Spark executor and the driver. In the "Edit software settings (optional)" section you can point to the location of this config file so it is loaded from S3.

spark-config.json

[
  {
    "classification": "spark",
    "properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.extraClassPath": "/home/hadoop/lib/guava-14.0.1.jar",
      "spark.driver.extraClassPath": "/home/hadoop/lib/guava-14.0.1.jar"
    }
  }
]

The guava-14.0.1.jar needs to be downloaded via a bootstrap script, guava_download.sh:

#!/bin/bash
# Download guava-14.0.1 into a local lib directory so Spark can pick it up
# via the extraClassPath entries in spark-config.json.
mkdir -p /home/hadoop/lib/
cd /home/hadoop/lib/
wget https://repo1.maven.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.jar
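For reference, here is a rough sketch of how the config file and the bootstrap script can be wired together with the AWS CLI when creating the cluster; the bucket name my-bucket, the key pair and the instance settings are placeholders, not values from our setup:

# Sketch only: launch an EMR 4.x cluster that loads spark-config.json from S3
# and runs guava_download.sh on every node before Spark starts.
aws emr create-cluster \
  --release-label emr-4.0.0 \
  --applications Name=Spark \
  --configurations https://s3.amazonaws.com/my-bucket/spark-config.json \
  --bootstrap-actions Path=s3://my-bucket/guava_download.sh,Name="Download guava 14" \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --ec2-attributes KeyName=my-key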

Yes, you can add a bootstrap script to do this. Create a shell script, upload it to S3, and then use that S3 path as the script path in a bootstrap action for the EMR cluster.

For example, you can keep guava-14.0.1.jar in an S3 bucket and download it during bootstrap:

#!/bin/bash
# Copy guava-14.0.1 from S3 into Hadoop's common lib directory and remove the
# bundled guava-11.0.2 so only one version is on the classpath.
hadoop fs -copyToLocal s3n://rootbucket/myjars/guava-14.0.1.jar /home/hadoop/share/hadoop/common/lib/
rm -f /home/hadoop/share/hadoop/common/lib/guava-11.0.2.jar
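If you are on a pre-4.0 AMI release (which is where the /home/hadoop/share/hadoop/common/lib/ path applies), the script is attached the same way; a sketch, assuming the script above was uploaded as s3://rootbucket/myjars/replace_guava.sh (a hypothetical name):

# Sketch: attach the script as a bootstrap action on a pre-4.0 EMR cluster.
# The script name, AMI version and instance settings are placeholders.
aws emr create-cluster \
  --ami-version 3.11.0 \
  --bootstrap-actions Path=s3://rootbucket/myjars/replace_guava.sh,Name="Replace guava" \
  --instance-type m3.xlarge \
  --instance-count 3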

I assume you are doing this because your MapReduce code has a dependency on the 14.0.1 jar. Alternatively, you can build a fat jar with guava-14.0.1.jar bundled in and upload it as your custom jar to run the job.
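A rough sketch of that fat-jar route using only the JDK's jar tool follows; myjob.jar, the output name and com.example.MyJob are placeholders, and in practice a build plugin such as maven-shade or sbt-assembly is the usual way to do this (this crude merge ignores manifest and duplicate-resource handling):

# Unpack the job jar and guava-14.0.1.jar into one directory, repack them,
# and run the result with an explicit main class.
mkdir fatjar
cd fatjar
jar -xf ../myjob.jar
jar -xf ../guava-14.0.1.jar
cd ..
jar -cf myjob-with-guava.jar -C fatjar .
hadoop jar myjob-with-guava.jar com.example.MyJob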
